Senior Software Engineer, Quantized Inference

Nvidia

Quick summary

Work type: On-site
Location: Redmond, WA · Santa Clara, CA
Salary: $152,000–$241,500 / yr
Posted: 99 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: salary comparison on a scale from $128k to $254k. Similar roles: $181k. This role: $197k. The shaded band marks where most similar roles pay.]

This role pays more than 69% of similar roles. Most pay $145,487–$216,025 — the shaded band above. At the midpoint, this role pays about $197k versus about $181k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 855 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 843 roles with salary data.

At a glance

TL;DR · Senior Software Engineer, Quantized Inference

NVIDIA is seeking a Senior Software Engineer to join its team focused on accelerating the deployment of efficient inference recipes for large language models (LLMs). The role involves translating recipe specifications into high-performance code within inference engines like vLLM, TRT-LLM, and SGLang, ensuring that quantized checkpoints serialize correctly for downstream serving. Key responsibilities include implementing quantized and sparse recipes, building benchmarking tools, developing data analysis tooling, and improving developer productivity through CI systems and training infrastructure. The ideal candidate is proficient in Python and C++, with a strong background in software engineering fundamentals, experience with ML accelerators, and familiarity with PyTorch internals or equivalent frameworks. Additionally, candidates should have a track record of contributing to large open-source projects and debugging numerical issues across mixed-precision boundaries.

Skills

Python, C++, PyTorch, vLLM, TRT-LLM, SGLang, CI, HuggingFace, Megatron-LM, Triton, PyTorch custom ops, autograd, ML accelerators, model compression, PTQ, QAT, structured sparsity, unstructured sparsity

What you'll do

Implement quantized and sparse recipes in inference engines like vLLM and TRT-LLM.
Ensure correct serialization of quantized checkpoints for downstream serving.
Build prototypes to evaluate recipe performance before full optimization.
Develop data analysis tools for debugging numerical issues in mixed precision.
Improve developer productivity by enhancing CI, build systems, and training infrastructure.

What we're looking for

Proficiency in Python and familiarity with C++.
Experience implementing quantized and sparse recipes in inference engines.
Strong background in ML accelerators and an understanding of what affects execution time on them.
Familiarity with PyTorch internals or equivalent framework experience.
4+ years in relevant software engineering roles, including open-source contributions.
Ability to debug numerical issues across mixed-precision boundaries.
Deep expertise in model compression techniques such as post-training quantization (PTQ), quantization-aware training (QAT), and sparsity.
