Senior HPC Storage Engineer

Nvidia

Actively hiring
Santa Clara, CA, US Posted 67 days ago $184,000–$287,500 / year

At a glance

AI generated

TL;DR

As a Senior Storage Architect on the HW Infrastructure Storage Strategy team, you will lead the research and design of innovative storage solutions to support high-performance computing (HPC) workloads. Your day-to-day responsibilities include analyzing existing internal distributed storage services, designing scalable next-gen storage systems, developing automation tooling for large-scale infrastructure management, and collaborating with cross-functional teams to capture infrastructure requirements. You will also influence methodologies for efficient application deployment and optimize deep learning workflows on our clusters. The ideal candidate has over 8 years of experience in large-scale storage infrastructure design and operation, proficiency in Linux distributions like CentOS/RHEL and Ubuntu, Python programming, bash scripting, and container technologies such as Docker and Enroot. Expertise in distributed filesystems (Ceph, Weka.io, Vast, Lustre, GPFS), NVIDIA GPUs, CUDA programming, high-performance networking, and deep learning frameworks like PyTorch and TensorFlow is essential for this role that addresses the scaling and performance challenges of our expanding cloud infrastructure.

Skills

Python Docker Ceph Weka.io Vast Lustre GPFS CUDA NCCL PyTorch TensorFlow Bash CentOS RHEL Ubuntu SDN MLPerf NVIDIA GPUs HDDs SSDs NVMe

What you'll do

  • Research and design scalable distributed storage services for high performance computing workloads.
  • Implement tooling to automate management and monitoring of large-scale infrastructure environments.
  • Conduct technology evaluations related to distributed file systems and storage solutions.
  • Analyze existing internal distributed storage services to optimize performance and cost-effectiveness.
  • Perform root cause analysis and suggest corrective actions for storage-related issues.
  • Influence methodologies for building, testing, and deploying applications for efficient resource utilization.

What we're looking for

  • 8+ years of experience designing and operating large-scale storage infrastructure.
  • Expertise in analyzing and tuning storage performance across various workloads.
  • Proficiency with container technologies like Docker and Enroot.
  • Extensive experience with parallel and distributed filesystems (e.g., Ceph, Weka.io).
  • Deep understanding of GPU technology, including CUDA programming and NCCL.
  • Knowledge of advanced networking concepts for AI/HPC clusters.
  • Practical experience with deep learning frameworks such as PyTorch and TensorFlow.

Market check

Salary context

This $184,000–$287,500 range sits above 88% of similar postings on FindRole.

Peer median band

$144,000–$228,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$142,400–$219,218

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads. Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Senior HPC Storage Architect & Engineer

Lam Research

Fremont, CA, US 135 days ago $114,000–$253,000
Lustre GPFS/Spectrum Scale VAST Data WEKA NetApp ONTAP FlexCache AWS Azure GCP InfiniBand RoCE NVMe-over-Fabrics SLURM xCAT Warewulf Ansible Terraform Python YAML Kubernetes CSI S3 IaC CI/CD

Senior HPC Cluster Engineer

Nvidia

Santa Clara, CA, US 78 days ago $152,000–$241,500
Slurm Kubernetes Python Bash Docker Enroot Prometheus Grafana Linux RHEL Ubuntu MPI NCCL CUDA NVIDIA_GPUs InfiniBand RDMA RoCE Lustre GPFS Ansible MLPerf

Senior Storage Engineer

Pacific Life

Newport Beach Ca-700, US 17 days ago $137,610–$168,190
PURE NetApp VMware Brocade SAN fabric switches AWS Azure Google Cloud Hyper Converged Infrastructure CI/CD Linux Windows SAN/NAS architectures Docker Kubernetes

Senior HPC Performance Engineer

Nvidia

Remote (Us, Or, Remote, US) 41 days ago $184,000–$287,500
Fortran C C++ OpenACC OpenMP MPI CUDA Performance_analysis Parallel_programming Linear_algebra Numerical_methods Assembly_language Debugging Porting

Senior HPC Solutions Architect

Nvidia

Remote (Santa Clara, CA, US) 48 days ago $184,000–$287,500
Python C++ CUDA SLURM Linux BMC PCIe Network_Adapters InfiniBand DPU RoCE ARM Linux_Kernel Drivers SDN C

Senior HPC and LSF Operations Engineer

Nvidia

Santa Clara, CA, US 78 days ago $152,000–$241,500
LSF Slurm Linux CentOS RHEL Docker Singularity Podman HPC Reliability Engineering Metrics Collection Monitoring Pipelines Alerting Strategies Performance Dashboards Container Technologies Job Scheduling Systems