Engineering Manager, LLM Performance

Nvidia

Quick summary

Work type: Hybrid
Location: Santa Clara, CA
Salary: $224,000–$356,500 / yr
Posted: 5 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $143k to $379k, with a shaded band where most similar roles pay; similar roles marked at $207k, this role at $290k]

This role pays more than 94% of similar roles. Most similar roles pay $169,780–$244,070 (the shaded band above). At the midpoint, this role pays about $290k, versus about $207k for comparable roles.

Based on 239 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 942 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 931 roles with salary data.

At a glance

TL;DR · Engineering Manager, LLM Performance

Join NVIDIA as an Engineering Manager leading a high-impact team focused on accelerating large language model (LLM) and vision language model (VLM) inference across open-source frameworks such as TensorRT-LLM, vLLM, and SGLang. You will architect and guide the development of performance-critical features for current and future NVIDIA datacenter products, collaborating closely with researchers and GPU architects to deliver software that sets global standards in AI performance. The role requires a strong background in C++ or Python, expertise in LLM inference, and deep knowledge of GPU architecture and CUDA programming. Ideal candidates have 7+ years of software engineering experience, including 3+ years in technical leadership roles, and a proven record of managing distributed teams and delivering production-quality software libraries.

Skills

Python, C++, CUDA, TensorRT-LLM, vLLM, SGLang, GPU, Linux, CI/CD, MLOps, Docker, Kubernetes, Git, Jenkins, Prometheus, Grafana

What you'll do

Lead and grow a team to enhance LLM inference performance across multiple frameworks.
Drive design and optimization of critical features for LLM inference performance.
Improve LLM inference performance on current and future NVIDIA datacenter GPUs.
Collaborate with benchmark teams to optimize performance on key workloads.
Integrate cutting-edge technologies to deliver an intuitive developer experience for LLM deployment.

What we're looking for

MS, PhD, or equivalent experience in Computer Science, AI, or a related technical field
7+ years of software engineering experience, including 3+ years in technical leadership roles
Proven ability to lead and scale high-performing distributed engineering teams
Expertise in C++ or Python for software design and building production-quality libraries
Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning
Background in LLM inference and experience with frameworks such as TensorRT-LLM, vLLM, and SGLang

Similar roles

Manager, Large Language Model Inference

Nvidia

Remote (Canada) +1 · 98 days ago · $184,000–$287,500

C++, Python, TensorRT-LLM, vLLM, SGLang, CUDA, GPU Architecture, CI/CD, Docker, Kubernetes, GitHub, NVIDIA GPUs, PostgreSQL, MongoDB, Git, JIRA, Confluence, Slack, Zoom

Manager, Deep Learning Algorithms

Nvidia

Santa Clara, CA · 170 days ago · $224,000–$356,500

Python, TensorFlow, PyTorch, Large Language Models (LLMs), Large Visual-Language Models (VLMs), TensorRT-LLM, vLLM, SGLang, JIRA, Microsoft Project, Git, GitHub, CI/CD, Docker, Kubernetes, AWS, NVIDIA GPU, CUDA, C++, Linux

Engineering Manager, Inference Benchmarking

Nvidia

Remote (Santa Clara, CA) +4 · 31 days ago · $224,000–$356,500

Kubernetes, vLLM, TRT-LLM, SGLang, DCGM, PyNVML, Prometheus, ZMQ, Helm, CI/CD, Python, Linux, GPU, TensorRT, MLPerf, OpenSource, Docker, Git, GitHub, MLOps
