AI Inference Performance Engineer

Nvidia

Hybrid

Quick summary

Work type
Hybrid
Location
Santa Clara, CA
Salary
$152,000–$241,500 / yr
Posted
89 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Pay comparison chart, axis $139k–$271k: similar roles $211k, this role $197k; shaded band marks where most similar roles pay]

This role pays less than 65% of similar roles. Most similar roles pay $175,740–$246,150 (the shaded band above). At the midpoint, this role pays about $197k versus about $211k for comparable roles.

Based on 240 similar postings.
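The midpoint figures above can be checked by hand. A minimal sketch, assuming "midpoint" means the simple average of each range's low and high ends (the page does not state its method):

```python
def midpoint(low: float, high: float) -> float:
    """Arithmetic mean of a pay range's endpoints (assumed definition of 'midpoint')."""
    return (low + high) / 2


this_role = midpoint(152_000, 241_500)      # listed range for this role
similar_roles = midpoint(175_740, 246_150)  # band where most similar roles pay
gap = similar_roles - this_role             # how far this role sits below the comparison

print(f"this role ~${this_role / 1000:.0f}k, "
      f"similar ~${similar_roles / 1000:.0f}k, gap ~${gap / 1000:.0f}k")
```

Under that assumption the script prints "this role ~$197k, similar ~$211k, gap ~$14k", matching the rounded figures in the summary.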

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and systems on a chip (SoCs), powering gaming, professional visualization, data center, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 563 open roles on FindRole.

Listed pay typically runs $168,000–$264,500 across 556 roles with salary data.


At a glance

TL;DR · AI Inference Performance Engineer

As a senior performance engineer on NVIDIA’s DL Architecture team, you will drive industry benchmark results by optimizing end-to-end inference pipelines for TensorRT-LLM, SGLang, and vLLM, with a focus on quantization, scheduling, memory management, and distributed inference. You’ll define next-generation benchmarks, collaborate with framework teams to improve performance on large-scale models, architect distributed systems that scale from a single GPU to rack-scale clusters, and establish rigorous performance methodologies using roofline analysis and profiling tools. You will also contribute to open-source projects, influence GPU roadmaps, and lead high-impact technical programs across teams under tight deadlines. The role requires 5+ years of Python or C++ development, expertise in deep learning frameworks such as PyTorch, experience with large language model and vision-language workloads, and proficiency in CUDA programming and kernel development.

What you'll do

  • Drive the end-to-end optimization pipeline for GenAI inference on NVIDIA accelerators.
  • Define and optimize next-generation AI workloads and benchmarks across various models.
  • Architect distributed inference systems from single-GPU to rack-scale clusters.
  • Apply roofline analysis and profiling to identify performance bottlenecks in CUDA kernels.
  • Contribute to open-source projects like TensorRT-LLM, vLLM, and SGLang for GPU optimization.
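
The roofline analysis named in the duties above bounds a kernel's attainable throughput by the lower of two ceilings: the compute peak, and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved). A minimal Python sketch, assuming hypothetical accelerator figures (1,000 TFLOP/s peak, 4 TB/s memory bandwidth) and illustrative intensity values that do not describe any specific NVIDIA part or workload:

```python
# Minimal roofline sketch. PEAK_TFLOPS, MEM_BW_TBPS, and the kernel
# intensities below are hypothetical placeholders, not real hardware specs.

def ridge_point(peak_tflops: float, mem_bw_tbps: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the memory and compute ceilings meet."""
    return peak_tflops / mem_bw_tbps


def attainable_tflops(intensity: float, peak_tflops: float, mem_bw_tbps: float) -> float:
    """Roofline bound: min(compute peak, bandwidth * arithmetic intensity).

    TB/s multiplied by FLOP/byte yields TFLOP/s, so the units line up.
    """
    return min(peak_tflops, mem_bw_tbps * intensity)


def bound_type(intensity: float, peak_tflops: float, mem_bw_tbps: float) -> str:
    """Classify a kernel as memory-bound (left of the ridge) or compute-bound."""
    if intensity < ridge_point(peak_tflops, mem_bw_tbps):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    PEAK_TFLOPS = 1000.0  # hypothetical dense compute peak
    MEM_BW_TBPS = 4.0     # hypothetical memory bandwidth
    for name, ai in [("batch-1 decode GEMV", 1.0), ("large prefill GEMM", 500.0)]:
        perf = attainable_tflops(ai, PEAK_TFLOPS, MEM_BW_TBPS)
        print(f"{name}: {perf:.0f} TFLOP/s ({bound_type(ai, PEAK_TFLOPS, MEM_BW_TBPS)})")
```

In a batch-1 LLM decode step, each FP16 weight (2 bytes) feeds roughly one multiply-add (2 FLOPs), so intensity is about 1 FLOP/byte, far left of the 250 FLOP/byte ridge in this sketch. That is why such steps tend to be memory-bound, and why quantization and batching, which raise effective intensity, matter for inference performance.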

What we're looking for

  • 5+ years of software development experience with Python or C++
  • Expertise in deep learning frameworks like PyTorch or JAX
  • Proven ability to deliver measurable performance improvements in DL inference
  • Deep understanding of LLM/VLM architectures and inference mechanics
  • Experience with large-scale GPU clusters and scale-out inference orchestration
  • Expertise in kernel development for GPUs (CUDA, CUTLASS) and compiler/runtime paths
  • Track record of leading high-impact technical programs across teams under tight deadlines

More like this

Similar roles

AI Inference Performance Engineer - New College Grad 2026

Nvidia

Santa Clara, CA 3 days ago $124,000–$195,500
Python C++ PyTorch JAX TensorRT-LLM vLLM SGLang CUDA CUTLASS cuteDSL tilelang OpenAI_Triton torch.compile MPI NCCL K8s roofline_analysis performance_profiling GPU_programming deep_learning_inference

Applied AI Engineer

Booz Allen Hamilton

Fort Belvoir, VA 22 days ago $99,000–$225,000
Python FastAPI Flask Streamlit Gradio React TypeScript Kubernetes CI/CD Prometheus Grafana MLOps Docker PostgreSQL AWS Azure Google Cloud Platform

Applied AI Engineer

Apple Inc

Cupertino, CA 24 days ago $181,100–$272,100
Python FastAPI LangChain LLMs GenAI RESTful APIs Vector databases Async programming Pipeline orchestration Prometheus OpenTelemetry Redis RabbitMQ Kafka Docker CI/CD

Senior AI Inference Compiler Engineer

Nvidia

Remote (Santa Clara, CA) 102 days ago $152,000–$241,500
MLIR XLA LLVM PyTorch GPU CUDA C++ Compiler Technologies Deep Learning Models LLM Inference Optimizations High Performance Computing Fast Build Time Kernel Generation Neural Networks Software Engineering
Remote

AI/ML Engineer

Lam Research

Fremont, CA 64 days ago $119,000–$261,000
Python C++ PostgreSQL SQLite MySQL Git Domain-Driven Design Test-Driven Development CI/CD
Hybrid

AI/ML Engineer

Booz Allen Hamilton

Norfolk, VA 3 days ago
Spark Hadoop Databricks Python Java Scala R TensorFlow Keras PyTorch CI/CD MLOps Git Jupyter Notebook PostgreSQL MongoDB AWS Azure Google Cloud Platform Kubernetes Docker