System Administrator, HPC (4-Year Term)

Edmonton, AB, Canada

Job Description

"Be part of something bigger. At Amii, our work pushes the boundaries of what's possible in artificial intelligence. As our HPC System Administrator, you'll design, build, and manage the high-performance computing systems that power groundbreaking research and innovation. This is your chance to shape the future of AI infrastructure and make a lasting impact. Apply today and help us drive discovery forward!"


Robert Craig | Director, IT

About Amii





The Alberta Machine Intelligence Institute (Amii) is one of Canada's three main institutes for artificial intelligence (AI) and machine learning. Our world-renowned researchers drive fundamental and applied research at the University of Alberta (and other academic institutions), training some of the world's top scientific talent. Our cross-functional teams work collaboratively with Alberta-based businesses and organizations to build AI capacity and translate scientific advancement into industry adoption and economic impact.

About the Role





We are seeking an HPC System Administrator for a full-time, 4-year term position. The System Administrator, High Performance Cluster (HPC), is critical to maintaining the stability, security, and performance of our mission-critical infrastructure, enabling our AI researchers and engineers to focus on pioneering innovations that advance the mission of 'AI for good and for all'.



Reporting to the Director, IT, the System Administrator, HPC is responsible for the day-to-day operation and maintenance of the data center's systems infrastructure. This includes servers, storage, network devices, and related software. This role requires a proactive approach to problem-solving, a commitment to best practices, and the ability to work effectively in a fast-paced environment.



The position focuses on achieving excellence in four main accountabilities:

  • System Maintenance & Optimization
  • Security Management
  • Technical Support
  • HPC Administration & Support



Required Skills / Expertise




Key Responsibilities:




  • Assists in expanding HPC resources based on user needs and growth projections, and maintains capacity planning models for scalability and performance
  • Builds, configures, and maintains high-performance computing clusters, including compute, storage, and networking components
  • Oversees daily operations and maintenance of the High-Performance Computing (HPC) cluster running on Linux and Slurm, including monitoring system health and performance, and managing job queues and Slurm configurations for optimized scheduling and resource allocation
  • Designs, configures, and troubleshoots high-speed networking (InfiniBand, Ethernet, VLANs, etc.) to optimize cluster performance
  • Manages and maintains Linux-based and Windows servers, ensuring high availability, performance, and security by performing regular updates, patches, and backups, while also configuring and managing essential network services such as DNS, DHCP, NFS, and SNMP
  • Assists in the development and maintenance of comprehensive documentation for systems, configurations, procedures, and policies
  • Collaborates with other departments to align IT initiatives with organizational goals
  • Plans, tests, and deploys system upgrades and patches, keeping systems updated with security and performance enhancements, and coordinating maintenance to minimize user impact
  • Monitors system logs and performance metrics to proactively resolve issues, troubleshoots problems with vendors and support teams, and manages monitoring tools for real-time system health
  • Implements and maintains virtualization and containerization solutions (e.g., VMware) to optimize resource use and ensure secure, efficient operation
  • Recommends and updates standard tech packages for staff, considering job requirements, latest technology, and budget, and deploys tech packages to new staff
  • Monitors system performance, usage, and resource availability, proactively identifying and resolving issues that could impact performance or user experience
  • Collaborates with Researchers and Machine Learning Scientists to understand computational needs and implement solutions that enhance usability, throughput, and system efficiency
  • Administers workload management and scheduling systems (Slurm) to enable efficient resource allocation and job execution across the cluster
  • Drives continuous improvement of HPC services and support models, identifying opportunities to enhance efficiency, usability, and researcher experience
  • Prepares regular reports on system performance, security incidents, and project status for management review
  • Provides technical support to users by assisting with job submissions, troubleshooting issues, and resolving problems
  • Collaborates with researchers, developers, and users to optimize HPC applications through performance tuning strategies
  • Evaluates and recommends new hardware and software solutions to enhance HPC capabilities



Qualifications:




  • Post-secondary degree in Computer Science, Information Technology, or a related field (nice to have); equivalent experience will be considered
  • 3+ years of experience in system administration, preferably in an HPC environment
  • Strong understanding of Linux (e.g., CentOS, RHEL, Ubuntu) and Windows Server operating systems
  • Experience with virtualization technologies (e.g., VMware, Hyper-V) (nice to have)
  • Knowledge of scripting languages (e.g., Bash, Python, PowerShell) (nice to have)
  • Insight into HPC hardware components (CPUs, GPUs, memory, interconnects) and how to optimize their use
  • Familiarity with storage systems (SAN, NAS) and backup/recovery solutions
  • Strong understanding of networking concepts and protocols
  • Ability to assist users in optimizing workflows and provide training on HPC resources and tools



What you'll love about us




  • A professional yet casual work environment that encourages the growth and development of your skills
  • Opportunities to participate in professional development activities
  • Access to the Amii community and events
  • A chance to learn from amazing teammates who support one another to succeed
  • Competitive compensation, including paid time off and flexible health benefits
  • A modern office located in downtown Edmonton, Alberta



How to Apply





If this sounds like the opportunity you've been waiting for, please don't wait for the closing date of September 25, 2025 to apply. We're excited to add a new member to the Amii team, and the posting may come down before the closing date if we find the right candidate. When sending your application, please include your resume and a cover letter indicating why you think you'd be a fit for Amii. In your cover letter, please include one professional accomplishment you are most proud of and why.

Applicants must be legally eligible to work in Canada at the time of application.





Amii is an equal opportunity employer and values a diverse workforce. We encourage applications from all qualified individuals without regard to ethnicity, religion, gender identity, sexual orientation, age or disability. Accommodations for disability-related needs throughout the recruitment and selection process are available upon request. Any information provided by you for accommodations will be kept confidential and won't be used in the selection process.



Job Detail

  • Job Id
    JD2735229
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Edmonton, AB, Canada
  • Education
    Not mentioned