AI Inference Performance Engineer - New College Grad 2026
Nvidia
Quick summary
Market check
How this pay compares to similar roles
This role pays less than 65% of similar roles. Most similar roles pay $175,740–$246,150. At the midpoint, this role pays about $197k versus about $211k for comparable roles.
Based on 240 similar postings.
Employer
Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.
Industry: Semiconductors & AI Computing
Nvidia currently has 563 open roles on FindRole.
Listed pay typically runs $168,000–$264,500 across 556 roles with salary data.
Most-posted roles
At a glance
As a senior performance engineer on NVIDIA’s DL Architecture team, you will drive industry benchmark results by optimizing end-to-end inference pipelines for TensorRT-LLM, SGLang, and vLLM, focusing on quantization, scheduling, memory management, and distributed inference. You’ll define cutting-edge benchmarks, collaborate with framework teams to enhance performance on large-scale models, architect distributed systems from single-GPU to rack-scale clusters, and establish robust performance methodologies using roofline analysis and profiling tools. Additionally, you will contribute to open-source projects, influence GPU roadmaps, and lead a high-impact technical team under tight deadlines. This role requires expertise in Python or C++, deep learning frameworks like PyTorch, experience with large language models and vision-language workloads, and proficiency in CUDA programming and kernel development.
Skills
What you'll do
What we're looking for
More like this
Nvidia
Booz Allen Hamilton
Apple Inc
Nvidia
Lam Research
Booz Allen Hamilton