Senior Software Engineer, Quantized Inference

Nvidia

Quick summary

Work type: On-site
Location: Redmond, WA · Santa Clara, CA
Salary: $152,000–$241,500 / yr
Posted: 99 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $128k to $254k, with a shaded band marking where most similar roles pay. Similar roles: $181k. This role: $197k.

This role pays more than 69% of similar roles. Most similar roles pay $145,487–$216,025 (the shaded band above). At the midpoint, this role pays about $197k, versus about $181k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 855 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 843 roles with salary data.

At a glance

TL;DR · Senior Software Engineer, Quantized Inference

NVIDIA is seeking a Senior Software Engineer to join its team focused on accelerating the deployment of efficient inference recipes for large language models (LLMs). The role involves translating recipe specifications into high-performance code within inference engines like vLLM, TRT-LLM, and SGLang, ensuring that quantized checkpoints serialize correctly for downstream serving. Key responsibilities include implementing quantized and sparse recipes, building benchmarking tools, developing data analysis tooling, and improving developer productivity through CI systems and training infrastructure. The ideal candidate is proficient in Python and C++, with a strong background in software engineering fundamentals, experience with ML accelerators, and familiarity with PyTorch internals or equivalent frameworks. Additionally, candidates should have a track record of contributing to large open-source projects and debugging numerical issues across mixed-precision boundaries.

What you'll do

  • Implement quantized and sparse recipes in inference engines like vLLM and TRT-LLM.
  • Ensure correct serialization of quantized checkpoints for downstream serving.
  • Build prototypes to evaluate recipe performance before full optimization.
  • Develop data analysis tools for debugging numerical issues in mixed precision.
  • Improve developer productivity by enhancing CI, build systems, and training infrastructure.

What we're looking for

  • Proficiency in Python and familiarity with C++.
  • Experience implementing quantized and sparse recipes in inference engines.
  • Strong background in ML accelerators and understanding of execution time impacts.
  • Familiarity with PyTorch internals or equivalent framework experience.
  • 4+ years in relevant software engineering roles, including open-source contributions.
  • Ability to debug numerical issues across mixed-precision boundaries.
  • Deep expertise in model compression techniques like PTQ, QAT, sparsity.

More like this

Similar roles

Senior Software Engineer - AI Inference

Nvidia

Remote (Santa Clara, CA) · 51 days ago · $152,000–$241,500
Python, C++, CUDA, vLLM, SGLang, PyTorch, Triton, NCCL, Dynamo, CI/CD, GPU, InfiniBand, Profiling, Flamegraphs, Microbenchmarks, Concurrency, Multi-threading, Multi-process, Kubernetes, Docker, PostgreSQL
Remote

Senior Software Engineer, AI Inference Systems

Nvidia

Santa Clara, CA · 37 days ago · $184,000–$287,500
Python, C/C++, CUDA, Kubernetes, Docker, Triton, PyTorch, vLLM, SGLang, MLIR, Linux, Go, Rust, CI/CD, AWS, GCP, Azure, Prometheus, Grafana, GitHub, MLOps
Hybrid

Senior Deep Learning Software Engineer, Inference

Nvidia

Remote (Santa Clara, CA) · 30 days ago · $184,000–$287,500
C++, Python, CUDA, NCCL, NVSHMEM, OAI_TRITON, CUTLASS, PyTorch, vLLM, SGLang, FlashInfer, Multi-GPU Communications, Deep Learning Frameworks, Performance Optimization, GPU Acceleration
Remote

Senior Deep Learning Software Engineer

Nvidia

Santa Clara, CA · 41 days ago · $224,000–$356,500
Python, PyTorch, JAX, CUDA, TensorRT, NVIDIA_TensorRT_LLM, GPU optimization, CUTLASS, Triton, Deep learning frameworks, Performance analysis, GPU architecture, High performance computing, Model inference, Inference optimization
Hybrid

Senior Software Engineer, Machine Learning Inference

Nvidia

Santa Clara, CA · 55 days ago · $152,000–$241,500
C++, Python, CUDA, Rust, TensorRT, TensorRT-LLM, vLLM, SGLang, PyTorch, JAX, Deep Learning Frameworks, GPU Programming, Performance Analysis, Optimization Techniques, CI/CD
Hybrid