AI Inference Performance Engineer
Nvidia
Quick summary
Market check
How this pay compares to similar roles
This role pays less than 86% of similar roles. Most comparable roles pay $174,920–$246,150. At the midpoint, this role pays about $160k versus about $211k for comparable roles.
Based on 240 similar postings.
Employer
Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.
Industry: Semiconductors & AI Computing
Nvidia currently has 563 open roles on FindRole.
Listed pay typically runs $168,000–$264,500 across 556 roles with salary data.
At a glance
As a senior performance engineer on NVIDIA’s DL Architecture team, you will drive industry benchmark results by optimizing and integrating quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM. You’ll define cutting-edge workloads, architect distributed inference from a single GPU to rack-scale clusters, establish performance methodologies using roofline analysis, and contribute to open-source projects while influencing GPU roadmaps based on real workload data. This role requires 2+ years of software development experience with Python or C++, expertise in deep learning frameworks like PyTorch, a proven track record of delivering measurable performance improvements, and extensive knowledge of LLM/VLM architectures and inference mechanics. Additionally, you should have prior experience with DL compilers, scale-out inference orchestration, kernel development, and leading high-impact technical programs under tight deadlines.