Senior Inference Engineer, AIConfigurator for Dynamo

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $184,000–$287,500 / yr
Posted: 5 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $153k to $302k, marking similar roles at $206k and this role at $236k; a shaded band shows where most similar roles pay.

This role pays more than 72% of similar roles, most of which pay between $167,449 and $245,112. At the midpoint, this role pays about $236k, versus about $206k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 980 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 966 roles with salary data.

At a glance

TL;DR · Senior Inference Engineer, AIConfigurator for Dynamo

NVIDIA is seeking a Senior Inference Engineer to join the AIConfigurator team and enhance its system for discovering high-performance deployment configurations for large-scale LLM inference. The role involves building and evolving the core optimization engine, creating production-quality APIs and SDKs in Python and Rust, and developing backend-specific artifacts for various NVIDIA platforms. Engineers will collaborate with multiple teams to ensure simulated performance matches real-world deployments on GPUs like H100 and H200, while also improving model support through integration of profiling data and validation tools. Ideal candidates have extensive experience in GPU computing, distributed systems, and ML infrastructure, along with strong Python/Rust skills and a deep understanding of LLM inference concepts such as batching and parallelism strategies.

Skills

Python, Rust, Kubernetes, TensorRT-LLM, vLLM, SGLang, Triton Inference Server, Dynamo, CI/CD, GPU computing, Distributed systems, ML infrastructure, High-performance model serving, Data-driven performance analysis, Benchmarking, Optimization, NVIDIA GPUs, Disaggregated serving, Prefill/decode separation, KV cache management, NCCL, NIXL, NVSHMEM, Expert-parallel MoE inference

What you'll do

Build and evolve AIConfigurator's core optimization engine for LLM serving.
Develop Python/Rust APIs and CLIs to help users generate strong deployment configurations.
Emit backend-specific artifacts for Dynamo, Kubernetes, TensorRT-LLM, vLLM, and SGLang deployments.
Ensure simulated results match actual deployment performance on NVIDIA platforms.
Improve model, hardware, and backend support by integrating various databases and tools.
Convert complex inference ideas into reliable software abstractions.

What we're looking for

10+ years of relevant software engineering experience in production-quality Python/Rust development.
Strong background in GPU computing and distributed systems for high-performance model serving.
Deep understanding of LLM inference concepts including batching, latency, efficiency, and parallelism strategies.
Experience with data-driven performance analysis, benchmarking, simulation, and optimization.
Hands-on experience with TensorRT-LLM, vLLM, SGLang, Triton Inference Server, or comparable platforms.
Ability to collaborate across research, runtime, platform, and customer-facing engineering teams.

Similar roles

Senior Software Engineer - AI Inference

Nvidia

Remote (Santa Clara, CA) · 64 days ago · $152,000–$241,500

Python, C++, CUDA, vLLM, SGLang, PyTorch, Triton, NCCL, Dynamo, CI/CD, GPU, InfiniBand, Profiling, Flamegraphs, Microbenchmarks, Concurrency, Multi-threading, Multi-process, Kubernetes, Docker, PostgreSQL

Remote

Senior System Software Engineer - Dynamo-Triton Inference Server

Nvidia

Remote (Santa Clara, CA) +1 · 51 days ago · $152,000–$241,500

Rust, C++, Python, TensorRT, PyTorch, ONNX, OpenVINO, vLLM, TRT-LLM, GPU, Distributed Systems, GitHub, CI/CD, Kubernetes, Prometheus, Grafana, NVIDIA Triton Inference Server

Remote

Senior Software Engineer, AI Inference Systems

Nvidia

Santa Clara, CA · 50 days ago · $184,000–$287,500

Python, C/C++, CUDA, Kubernetes, Docker, Triton, PyTorch, vLLM, SGLang, MLIR, Linux, Go, Rust, CI/CD, AWS, GCP, Azure, Prometheus, Grafana, GitHub, MLOps

Hybrid

Senior AI Machine Learning Engineer

The Hartford

Chicago, IL +2 · 29 days ago · $117,200–$175,800

AWS, GCP, SageMaker, Streamlit, Python, Java, C#, Hadoop, Spark, Redshift, Snowflake, BigQuery, Jenkins, Terraform, GitHub, GitHub Actions, Apache Airflow, Kubernetes, Docker, SQL, CI/CD, MLOps

Hybrid

Senior Machine Learning Engineer, AI Platform

Adobe

San Jose · 36 days ago · $211,800–$306,625

Python, Java, C++, Cloud Infrastructure, Distributed Computing, Deep Learning, Virtual Reality, Augmented Reality, Artificial Intelligence, Robotics, Interactive Experiences, Large-Scale Computing Frameworks, Data Analysis, Systems Modeling Environments

Senior Machine Learning Engineer (AI Foundations)

Capital One Financial

McLean, VA +1 · 8 days ago · $161,800–$184,600

Python, Scala, Java, scikit-learn, PyTorch, Dask, Spark, TensorFlow, Kubernetes, AWS, CI/CD, PostgreSQL, Redis, Git, Jupyter Notebook, S3, Snowflake, Hadoop, Docker
