HPC Systems Administrator

Argonne National Laboratory

Lemont, IL, United States of America. Posted 9 hours, 46 minutes ago

$106,455 - $166,069/year

Job Description

We are seeking a highly skilled and motivated HPC Systems Administrator to manage and support our high-performance computing (HPC) environment. The role involves maintaining and optimizing four unique HPC clusters, Globus data transfer nodes, GPU nodes, monitoring systems, IBM ESS storage appliances, GPFS (General Parallel File System), and the PBS Pro scheduler. It also involves ensuring compliance with security and identity management standards, including LDAP integration, Multi-Factor Authentication (MFA), and HSPD-12. The ideal candidate will ensure the reliability, performance, and scalability of our HPC infrastructure to support advanced computational workloads.

Key Responsibilities

HPC Cluster Management:

  • Administer and maintain four unique HPC clusters, ensuring optimal performance and uptime.
  • Perform system upgrades, patching, and configuration management.
  • Troubleshoot and resolve hardware and software issues.

Data Transfer Nodes & Globus:

  • Manage Globus data transfer nodes to facilitate efficient and secure data movement.
  • Monitor and optimize data transfer performance across the clusters.

GPU Nodes Administration:

  • Configure and maintain GPU nodes for computational workloads.
  • Optimize GPU utilization for machine learning, AI, and other GPU-intensive applications.

Monitoring & Visualization:

  • Implement and maintain monitoring tools such as Grafana to track system health and performance.
  • Develop dashboards and alerts for proactive issue resolution.

Storage Management:

  • Administer IBM ESS storage appliances and GPFS (Spectrum Scale) to ensure high availability and performance.
  • Monitor storage usage and plan for capacity upgrades as needed.

Job Scheduling:

  • Manage and optimize PBS Pro scheduler for efficient job queuing and resource allocation.
  • Troubleshoot scheduling issues and implement policies to improve throughput.

Identity & Access Management:

  • Implement and manage LDAP integration for centralized authentication and directory services.
  • Administer Linux account management, including user provisioning, permissions, and access controls.
  • Configure and support Multi-Factor Authentication (MFA) solutions to enhance system security.
  • Ensure compliance with HSPD-12 standards for identity verification and access control.

Documentation & Reporting:

  • Maintain detailed documentation of system configurations, processes, and procedures.
  • Generate regular reports on system performance, utilization, and incidents.

Collaboration & Support:

  • Work closely with researchers, developers, and other stakeholders to understand their computational needs.
  • Provide technical support and training to users of the HPC systems.

Security & Compliance:

  • Implement security best practices to protect sensitive data and computational resources.
  • Ensure compliance with organizational policies, industry standards, and government regulations such as HSPD-12.

May be required to perform other duties as assigned.

Position Requirements

  • Minimum Education and Experience Requirements: Bachelor's degree and 6+ years' experience, Master's degree and 4+ years' experience, or equivalent.
  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 5+ years of experience in HPC systems administration or a similar role.
  • Proficiency in Linux/Unix system administration.
  • Experience with Globus, GPU nodes, and HPC cluster management.
  • Strong knowledge of IBM ESS storage appliances and GPFS.
  • Familiarity with PBS Pro scheduler and job queuing systems.
  • Expertise in LDAP integration, Linux account management, and Multi-Factor Authentication (MFA).
  • Hands-on experience with monitoring tools like Grafana.
  • Knowledge of HSPD-12 compliance requirements and implementation.
  • Excellent problem-solving and analytical skills.
  • Ability to work independently and manage multiple priorities.
  • Attention to detail and commitment to quality.
  • Ability to model Argonne’s core values of impact, safety, respect, integrity, and teamwork.
  • Interpersonal skills, oral and written communication skills, and ability to interact with people at all levels both within and outside the laboratory.

Preferred Knowledge, Skills, and Experience

  • Master's degree in a relevant field.
  • Certifications in HPC, Linux, or storage technologies.
  • Experience with scripting languages (e.g., Python, Bash) for automation.
  • Knowledge of networking protocols and security practices.

Work Environment

  • Office-based with occasional on-site work at HPC facilities.
  • May require after-hours support for critical issues.

Job Family

Professional Technical (PT)

Job Profile

Systems Integration Admin/Support 4

Worker Type

Regular

Time Type

Full time

The expected hiring range for this position is $106,455.00 - $166,069.80.

Please note that the pay range information is a general guideline only. The pay offered to a selected candidate will be determined based on factors such as, but not limited to, the scope and responsibilities of the position, the qualifications of the selected candidate, business considerations, internal equity, and external market pay for comparable jobs. Additionally, comprehensive benefits are part of the total rewards package.

As an equal employment opportunity employer, and in accordance with our core values of impact, safety, respect, integrity and teamwork, Argonne National Laboratory is committed to a safe and welcoming workplace that fosters collaborative scientific discovery and innovation. Argonne encourages everyone to apply for employment. Argonne is committed to nondiscrimination and considers all qualified applicants for employment without regard to any characteristic protected by law.

Argonne employees, and certain guest researchers and contractors, are subject to particular restrictions related to participation in Foreign Government Sponsored or Affiliated Activities, as defined and detailed in United States Department of Energy Order 486.1A. You will be asked to disclose any such participation in the application phase for review by Argonne's Legal Department.

All Argonne offers of employment are contingent upon a background check that includes an assessment of criminal conviction history conducted on an individualized and case-by-case basis. Please be advised that Argonne positions require upon hire (or may require in the future) that the individual obtain a government access authorization that involves additional background check requirements. Failure to obtain or maintain such government access authorization could result in the withdrawal of a job offer or future termination of employment.

About Argonne National Laboratory

Argonne National Laboratory is a multidisciplinary science and engineering research center sponsored by the U.S. Department of Energy, conducting research in energy, environment, and national security.

Industry: Scientific Research & National Laboratories