Senior Data Center Performance Engineer - Benchmarking and Optimization

Nvidia

Remote, USA · Santa Clara, CA · Posted 10 days ago · $184,000–$287,500 / year

At a glance

AI generated

TL;DR

Join NVIDIA as a Senior Performance Engineer to lead benchmarking and optimization for data center platforms, including DGX and HGX systems. You will design performance strategies, characterize large-scale workloads, and build automation tools for monitoring and analysis. You will work with cross-functional teams to identify and resolve bottlenecks across subsystems, driving improvements through system tuning and architectural recommendations. Ideal candidates have an M.S. or Ph.D.; 8+ years of experience in performance engineering; expertise in GPU computing and HPC networking; and proficiency in Python, C++, and shell scripting. Experience with AI/ML frameworks, containerization tools, and cloud provisioning is a plus. The role supports solutions that power NVIDIA’s enterprise and cloud provider businesses.

Skills

Python CUDA C++ Linux_perf Nsight_Systems PyTorch TensorFlow JAX MPI NCCL Docker Kubernetes SLURM NVIDIA_DGX NVIDIA_HGX InfiniBand RoCE NVLink

What you'll do

  • Design and execute comprehensive benchmarking strategies for data center platforms.
  • Characterize real-world AI training, inference, and HPC workloads at scale.
  • Define, track, and report key performance indicators for system efficiency.
  • Build automation tools for performance monitoring and analysis.
  • Identify and resolve performance bottlenecks across various subsystems.

What we're looking for

  • M.S. or Ph.D. in Computer Science, Electrical Engineering, or related field.
  • 8+ years of experience in performance engineering or system architecture.
  • Deep understanding of computer architecture and hardware-software interaction.
  • Proficiency in performance profiling tools like Linux perf and NVIDIA Nsight Systems.
  • Strong background in GPU computing, parallel programming (CUDA), and HPC networking technologies.
  • Programming skills in Python, C++, and shell scripting; excellent analytical and problem-solving abilities.
  • Experience with AI/ML frameworks (PyTorch, TensorFlow) and distributed training/inference.

Market check

Salary context

This $184,000–$287,500 range sits above 86% of similar postings on FindRole.

Peer median band

$139,700–$225,850

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$142,400–$226,800

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.


More like this

Similar roles

Senior Data Center System Architect

Nvidia

Santa Clara, CA, US · 141 days ago · $184,000–$287,500
NVIDIA GPU AI HPC Datacenter Linux CI/CD Python C++ CUDA Docker Kubernetes AWS Azure Google Cloud Terraform Ansible Prometheus Grafana PostgreSQL MESOS Kafka

Data Center Engineering - Post Silicon Power and Performance Engineer

Qualcomm

Austin, TX, US · 22 days ago · $122,500–$183,700
Python C/C++ ARMv8 ARMv9 SMMU GIC Coresight-PMU Timer_architecture Performance_analysis_tools Power_analysis_methods DAQs Protocol_logic_analyzers Oscilloscopes Data_center_servers CPU_memory_hierarchy System_interconnects

Data Center Engineer, Senior

Qualcomm

Atlanta, GA, US · 162 days ago
Commscope iMvision Linux Windows OSX Microsoft Office Suite Outlook Word Excel iLO Terraform Ansible Puppet Chef Jira Confluence Prometheus Grafana Python Bash Cisco Ruckus VMware Docker Kubernetes AWS Azure Google Cloud Platform PostgreSQL MySQL MongoDB Git GitHub Bitbucket CI/CD

Senior HPC Performance Engineer

Nvidia

Remote (OR, US) · 43 days ago · $184,000–$287,500
Fortran C C++ OpenACC OpenMP MPI CUDA Performance_analysis Parallel_programming Linear_algebra Numerical_methods Assembly_language Debugging Porting

Data Center Operating Engineer

JLL (Jones Lang LaSalle)

Remote (USA - Client, Totowa, NJ) · 31 days ago · $100,380
Universal CFC certification HVAC electronics building automation systems UPS systems preventative maintenance programs SOP-driven operations volt meters drain augers plumbing tools safety goggles ear protection fire extinguishers algebra geometry load balancing practical problem solving