AI Inference Performance Engineer - New College Grad 2026

Nvidia

Quick summary

Work type: On-site
Location: Santa Clara, CA
Salary: $124,000–$195,500 / yr
Posted: 3 days ago

Market check

Salary context

Below market

How this pay compares to similar roles

[Chart: this role at about $160k versus about $211k for similar roles, on a scale from $109k to $265k; the shaded band marks where most similar roles pay.]

This role pays less than 86% of similar roles. Most pay $174,920–$246,150 (the shaded band above). At the midpoint of its listed range, this role pays about $160k versus about $211k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 563 open roles on FindRole.

Listed pay typically runs $168,000–$264,500 across 556 roles with salary data.

At a glance

TL;DR · AI Inference Performance Engineer - New College Grad 2026

As a senior performance engineer on NVIDIA’s DL Architecture team, you will drive industry benchmark results by optimizing and integrating quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM. You’ll define cutting-edge workloads, architect distributed inference from single-GPU to rack-scale clusters, establish performance methodologies using roofline analysis, and contribute to open-source projects while influencing GPU roadmaps based on real workload data. This role requires 2+ years of software development experience with Python or C++, expertise in deep learning frameworks like PyTorch, a proven track record of delivering measurable performance improvements, and extensive knowledge of LLM/VLM architectures and inference mechanics. You should also have prior experience with DL compilers, scale-out inference orchestration, kernel development, and leading high-impact technical programs under tight deadlines.

What you'll do

  • Drive the end-to-end optimization pipeline for GenAI inference on NVIDIA accelerators.
  • Define and optimize cutting-edge workloads for large-scale LLM-MoE models.
  • Architect distributed inference systems from single-GPU to rack-scale clusters.
  • Establish performance methodology using roofline analysis and systematic profiling.
  • Contribute to open-source projects like TensorRT-LLM and influence GPU roadmaps based on real workload data.

What we're looking for

  • 2+ years of software development experience with Python or C++.
  • Expertise in deep learning frameworks like PyTorch or JAX.
  • Proven ability to deliver measurable performance improvements in DL inference.
  • Deep understanding of LLM/VLM architectures, including attention mechanisms and batching strategies.
  • Experience with large-scale GPU clusters and scale-out inference orchestration tools.
  • Strong background in kernel development for GPUs (CUDA) and compiler/runtime paths.
  • Track record of leading high-impact technical projects across multiple teams under tight deadlines.

More like this

Similar roles

AI Inference Performance Engineer

Nvidia

Santa Clara, CA · 89 days ago · $152,000–$241,500
Python, C++, PyTorch, JAX, TensorRT-LLM, vLLM, SGLang, CUDA, MPI, NCCL, K8s, CUTLASS, cuteDSL, tilelang, OpenAI_Triton, torch.compile, GPU, FPGA, roofline_analysis, performance_profiling
Hybrid

Senior AI Machine Learning Engineer

The Hartford

Chicago, IL · 18 days ago · $117,200–$175,800
AWS, GCP, SageMaker, Streamlit, Python, Java, C#, Hadoop, Spark, Redshift, Snowflake, BigQuery, Jenkins, Terraform, GitHub, GitHub Actions, Apache Airflow, Kubernetes, Docker, SQL, CI/CD, MLOps
Hybrid

Applied AI Engineer

Booz Allen Hamilton

Fort Belvoir, VA · 22 days ago · $99,000–$225,000
Python, FastAPI, Flask, Streamlit, Gradio, React, TypeScript, Kubernetes, CI/CD, Prometheus, Grafana, MLOps, Docker, PostgreSQL, AWS, Azure, Google Cloud Platform

Applied AI Engineer

Apple Inc

Cupertino, CA · 24 days ago · $181,100–$272,100
Python, FastAPI, LangChain, LLMs, GenAI, RESTful APIs, Vector databases, Async programming, Pipeline orchestration, Prometheus, OpenTelemetry, Redis, RabbitMQ, Kafka, Docker, CI/CD