AI Inference Performance Engineer - New College Grad 2026

Nvidia

Quick summary

Work type: On-site
Location: Santa Clara, CA
Salary: $124,000–$195,500 / yr
Posted: 3 days ago

Market check

Salary context

Below market

How this pay compares to similar roles

[Chart: pay scale from $109k to $265k, marking this role at $160k and similar roles at $211k, with a shaded band where most similar roles pay.]

This role pays less than 86% of similar roles. Most similar roles pay $174,920–$246,150. At the midpoint of its listed range, this role pays about $160k, versus about $211k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 563 open roles on FindRole.

Listed pay typically runs $168,000–$264,500 across 556 roles with salary data.

At a glance

TL;DR · AI Inference Performance Engineer - New College Grad 2026

As a senior performance engineer on NVIDIA’s DL Architecture team, you will drive industry benchmark results by optimizing and integrating quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM. You’ll define cutting-edge workloads, architect distributed inference from single-GPU to rack-scale clusters, establish performance methodologies using roofline analysis, and contribute to open-source projects while influencing GPU roadmaps based on real workload data. This role requires 2+ years of software development experience with Python or C++, expertise in deep learning frameworks like PyTorch, a proven track record of delivering measurable performance improvements, and extensive knowledge of LLM/VLM architectures and inference mechanics. You should also have prior experience with DL compilers, scale-out inference orchestration, and kernel development, and with leading high-impact technical programs under tight deadlines.

Skills

Python C++ PyTorch JAX TensorRT-LLM vLLM SGLang CUDA CUTLASS cuteDSL tilelang OpenAI_Triton torch.compile MPI NCCL K8s roofline_analysis performance_profiling GPU_programming deep_learning_inference

What you'll do

Drive the end-to-end optimization pipeline for GenAI inference on NVIDIA accelerators.
Define and optimize cutting-edge workloads for large-scale LLM-MoE models.
Architect distributed inference systems from single-GPU to rack-scale clusters.
Establish a performance methodology using roofline analysis and systematic profiling.
Contribute to open-source projects like TensorRT-LLM and influence GPU roadmaps based on real workload data.

What we're looking for

2+ years of software development experience with Python or C++.
Expertise in deep learning frameworks like PyTorch or JAX.
Proven ability to deliver measurable performance improvements in DL inference.
Deep understanding of LLM/VLM architectures, including attention mechanisms and batching strategies.
Experience with large-scale GPU clusters and scale-out inference orchestration tools.
Strong background in kernel development for GPUs (CUDA) and compiler/runtime paths.
Track record of leading high-impact technical projects across multiple teams under tight deadlines.

Similar roles

AI Inference Performance Engineer

Nvidia

Santa Clara, CA 89 days ago $152,000–$241,500

Python C++ PyTorch JAX TensorRT-LLM vLLM SGLang CUDA MPI NCCL K8s CUTLASS cuteDSL tilelang OpenAI_Triton torch.compile GPU FPGA roofline_analysis performance_profiling

Hybrid

Research Scientist, Fundamental Generative AI - New College Grad 2026

Nvidia

Santa Clara, CA 123 days ago $168,000–$264,500

Python PyTorch CUDA C++ DeepLearning GenerativeAI MolecularDesign ProteinDesign RNADesign ScientificDataAnalysis MachineLearning ResearchPublication CollaborationTools

Senior AI Machine Learning Engineer

The Hartford

Chicago, IL 18 days ago $117,200–$175,800

AWS GCP SageMaker Streamlit Python Java C# Hadoop Spark Redshift Snowflake BigQuery Jenkins Terraform GitHub GitHub Actions Apache Airflow Kubernetes Docker SQL CI/CD MLOps

Hybrid

Artificial Intelligence and Machine Learning Engineer, Mid

Booz Allen Hamilton

McLean, VA 11 days ago $77,600–$176,000

Python scikit-learn PyTorch TensorFlow Databricks Palantir Spark AWS GovCloud API-first event-driven CI/CD MLOps explainability fairness DevSecOps FISMA ATO Amazon Bedrock

Applied AI Engineer

Booz Allen Hamilton

Fort Belvoir, VA 22 days ago $99,000–$225,000

Python FastAPI Flask Streamlit Gradio React TypeScript Kubernetes CI/CD Prometheus Grafana MLOps Docker PostgreSQL AWS Azure Google Cloud Platform

Applied AI Engineer

Apple Inc

Cupertino, CA 24 days ago $181,100–$272,100

Python FastAPI LangChain LLMs GenAI RESTful APIs Vector databases Async programming Pipeline orchestration Prometheus OpenTelemetry Redis RabbitMQ Kafka Docker CI/CD
