Senior Datacenter Performance Model Engineer

Nvidia

Remote Actively hiring Posted this week

Santa Clara, CA Posted 4 days ago $152,000–$241,500 / year

At a glance

AI generated

TL;DR

Join our dynamic software team as a senior software engineer, where you will develop datacenter-scale performance modeling and prediction tools for AI workloads in GPU clusters. Your responsibilities include building production tools used by multiple teams within NVIDIA and its customers, automating workflows to find optimal configurations across millions of parameters, and collaborating with hardware and software architects to enhance features based on real-world use cases. Ideal candidates possess a BS+ in Computer Science or equivalent experience, along with 5+ years of software development expertise in C++ and Python, strong knowledge of deep learning frameworks like PyTorch and TensorFlow, and familiarity with GPU cluster job scheduling systems such as Slurm or Kubernetes. Additionally, you should have hands-on experience with NVIDIA GPUs, CUDA programming, Linux device drivers, and large-scale AI job performance analysis.

Skills

Python, C++, PyTorch, TensorFlow, Kubernetes, Slurm, CUDA, NVIDIA GPUs, Linux, Distributed Systems, Performance Analysis, Data Center Scale Deployment, GPU Architecture, CPU Architecture, Compiler Implementation, Deep Learning Frameworks

What you'll do

Build datacenter-scale performance modeling tools for AI workloads.
Develop production workflows for efficient configuration search across millions of parameters.
Create tools used by multiple teams within NVIDIA and its customers.
Collaborate with HW and SW architects to enhance features based on real-world use cases.
Analyze large AI job performance for training and inference workloads.

What we're looking for

BS+ in Computer Science or equivalent with 5+ years of software development experience.
Strong skills in C++, Python, and deep learning frameworks like PyTorch and TensorFlow.
Experience with NVIDIA GPUs, CUDA programming, and large-scale GPU cluster job scheduling.
Proven ability to deploy software at datacenter scale and analyze AI job performance.
Knowledge of Linux device drivers, compiler implementation, and general computer architecture.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 825 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.
