Senior HPC Support Engineer, Compute and GPU Platform

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $108,000–$172,500 / yr
Posted: 3 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Below market

How this pay compares to similar roles

[Salary comparison chart: scale $92k to $258k; similar roles midpoint $201k; this role midpoint $140k; shaded band marks where most similar roles pay]

This role pays less than 89% of similar roles. Most pay $167,224–$235,750 (the shaded band in the salary chart). At the midpoint, this role pays about $140k (the middle of its $108,000–$172,500 range) versus about $201k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 942 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 931 roles with salary data.


At a glance

TL;DR · Senior HPC Support Engineer, Compute and GPU Platform


As a Senior HPC Support Engineer specializing in Compute/GPU (DGX Platform) within our Technical Support team, you will be responsible for resolving complex customer issues related to AI hardware and software products. Your day-to-day tasks include debugging and troubleshooting technical problems via various communication channels, ensuring high customer satisfaction through meticulous research and problem-solving. You will collaborate closely with engineering, marketing, and support teams to provide feedback on product requirements and improve support processes. The ideal candidate has over five years of experience in customer support and debugging, proficiency in Linux system administration across multiple distributions like Red Hat Enterprise Linux and Ubuntu, and expertise in AI tools and technologies such as Docker, Kubernetes, and deep learning frameworks. Additionally, a strong background in networking, data centers, and distributed systems is essential for this role, which operates at the intersection of cutting-edge technology and customer service excellence.

Skills

Linux RedHatEnterpriseLinux Ubuntu Docker Kubernetes Python Bash InfiniBand RDMA RoCEv2 GPU MPI NCCL ShellScripting Ethernet DistributedFileSystemStorage NetworkSwitchRouter OpenPlatforms HPC Clustering

What you'll do

Resolve complex customer issues on DGX Platform through detailed research and problem-solving.
Debug and respond to user-reported technical problems via various communication channels.
Develop and document standard methodologies for internal teams based on customer issue analysis.
Regularly interact with engineering, marketing, and support teams to provide feedback on product requirements.
Apply AI tools efficiently to share debugging results and create knowledge base articles.

What we're looking for

5+ years of experience in customer support and debugging for hardware and software products.
Proven use of AI technologies in daily job responsibilities.
Deep understanding of Linux system administration on Red Hat Enterprise Linux and Ubuntu distributions.
Strong knowledge of at least two of the following areas: data centers, servers, distributed systems, virtualization, deep learning frameworks, and containers/containerization.
Proficiency in scripting (Bash, Python), networking technologies including InfiniBand and RDMA/RoCEv2, and GPU technology.
Experience with clustering and HPC data-center technologies, upper-layer protocols such as MPI and NCCL, and Ethernet and distributed file system storage technologies.

Similar roles

HPC Operations Engineer

Nvidia

Santa Clara, CA 81 days ago $124,000–$195,500

CentOS RHEL Docker Python Bash Ansible NFS LDAP DNS TCP/IP SLURM FlexLM Perl InfiniBand RDMA RoCE Lustre GPFS

Hybrid


Senior HPC Cluster Engineer

Nvidia

Santa Clara, CA +2 13 days ago $152,000–$241,500

Slurm Kubernetes Python Bash Docker Enroot Prometheus Grafana Linux RHEL Ubuntu NVIDIA_GPUs CUDA NCCL MPI InfiniBand RDMA RoCE Lustre GPFS Ansible MLPerf


Senior Solutions Architect, AI Compute

Nvidia

Remote (CA) +4 22 days ago $184,000–$287,500

Linux Bash Python Ansible SLURM Kubernetes InfiniBand MPI Lustre GPFS HPL NCCL MLPerf NVIDIA BCM CI/CD

Remote


Senior AI Compute Engineer

Nvidia

Remote (Santa Clara, CA) 76 days ago $148,000–$235,750

Linux Bash Python Ansible SLURM LSF UGE Kubernetes HPL NCCL MLPerf InfiniBand MPI Lustre GPFS BCM Terraform CI/CD

Remote


Software DevOps Engineer, Networking

Nvidia

Santa Clara, CA 23 days ago $148,000–$235,750

Linux Docker Python C++ Bash GitLab CI/CD Ubuntu RHEL InfiniBand Ethernet gRPC gNMI REST JSON High-Speed Communication GPU Networking Firmware Drivers


Senior HPC Performance Engineer

Nvidia

Remote (OR) +3 72 days ago $184,000–$287,500

Fortran C C++ OpenACC OpenMP MPI CUDA Performance_analysis Parallel_programming Linear_algebra Numerical_methods Assembly_language Debugging Porting

Remote
