Senior Software Engineer - AI Inference

Nvidia

Remote Actively hiring
Remote, USA · Santa Clara, CA Posted 46 days ago $152,000–$241,500 / year

At a glance

AI generated

TL;DR

NVIDIA is hiring a Senior Software Engineer – AI Inference to contribute to open-source LLM serving engines such as vLLM and SGLang and improve their performance on NVIDIA GPUs and systems. Day-to-day work includes writing features, optimizations, and tests for these engines, implementing efficient inference-runtime capabilities, profiling hot paths across layers from Python orchestration down to CUDA kernels, improving multi-GPU performance, and building regression tests. Ideal candidates have 5+ years of experience in production software development with a focus on systems engineering, strong programming skills in Python, C++, and CUDA, proficiency with profiling techniques such as microbenchmarks and flame graphs, and familiarity with distributed systems concepts. Open-source contributions to projects such as vLLM, SGLang, or PyTorch are highly valued, as is a background in building benchmarking infrastructure for latency and throughput.

Skills

Python, C++, CUDA, vLLM, SGLang, PyTorch, Triton, NCCL, Dynamo, CI/CD, GPU, InfiniBand, Profiling, Flamegraphs, Microbenchmarks, Concurrency, Multi-threading, Multi-process, Kubernetes, Docker, PostgreSQL

What you'll do

  • Contribute features, fixes, and optimizations upstream to vLLM/SGLang by authoring PRs and participating in reviews.
  • Implement and optimize inference-runtime capabilities like batching policies, streaming, and KV-cache efficiency.
  • Profile and improve hot paths across layers using data-driven optimization techniques.
  • Enhance multi-GPU inference performance through parallelism strategies and communication patterns.
  • Build and maintain regression tests to ensure stable behavior and prevent slowdowns in production.

What we're looking for

  • 5+ years of experience building production software with proven performance or reliability improvements.
  • Strong programming skills in Python, C++, and CUDA, including debugging and optimizing critical code.
  • Experience with LLM inference/serving stacks like vLLM and SGLang, and an understanding of the trade-offs that affect real-world performance.
  • Proficiency in profiling tools and techniques (microbenchmarks, flame graphs, GPU profiling) with a data-driven mindset.
  • Familiarity with distributed systems concepts, concurrency, and multi-GPU and multi-node scaling strategies.
  • Open-source contributions to projects like vLLM, SGLang, PyTorch, Triton, NCCL, or similar serving/runtime initiatives.

Market check

Salary context

This $152,000–$241,500 range sits above 45% of similar postings on FindRole.

Peer median band

$168,000–$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$162,000–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Principal Software Engineer - AI Inference

Nvidia

Remote (US, CA, Santa Clara, US) 96 days ago $272,000–$431,250
Rust, C++, Python, CUDA, vLLM, SGLang, GPU, Distributed Systems, Concurrency, Profiling, Microbenchmarking, Triton, NCCL, PyTorch, InfiniBand, CI/CD, Open Source Contribution

Senior Software Engineer, AI Inference Systems

Nvidia

US, CA, Santa Clara, US 32 days ago $184,000–$287,500
Python, C/C++, CUDA, Kubernetes, Docker, Triton, PyTorch, vLLM, SGLang, MLIR, Linux, Go, Rust, CI/CD, AWS, GCP, Azure, Prometheus, Grafana, GitHub, MLOps

Senior Software Engineer - AI Applications

Plaid

San Francisco HQ, US 43 days ago $209,880–$289,080
HTML, CSS, JavaScript, LLM, GenAI, SSE, Vector Databases, Embeddings, Agent Orchestration Frameworks, Prompt Engineering, RAG, Semantic Search, CI/CD, Python, Node.js, React, Docker, Kubernetes, AWS, PostgreSQL

AI Software Engineer, Senior

Booz Allen Hamilton

Laurel, Maryland, US 43 days ago $86,800–$198,000
Python, Java, C++, JavaScript, TypeScript, LLM-powered developer tools, CI/CD, DevOps, VS Code, Kubernetes, Docker, GitHub, GitLab, Jenkins, Agentic AI frameworks, Orchestration systems, Cloud services, PostgreSQL, MongoDB

AI Software Engineer, Senior

Booz Allen Hamilton

US 43 days ago $86,800–$198,000
Python, Rust, Go, Scala, Java, GitLab CI, Jenkins, Git, Linux, Docker, Podman, AWS, LocalStack, ESXi, Ansible, Kubernetes, SIEM, Security+, Linux+

Senior Software Engineer - AI Core Engineering

The Walt Disney Company

Remote (USA - CA - 1200 Grand Central Ave, US) 94 days ago $141,900–$190,300
Python, LLM APIs, AWS Bedrock, Azure AI Foundry, LangChain, LangGraph, APIs, SDKs, OpenAI, Anthropic Claude, Observability, Tracing, Latency and cost dashboards, Drift detection, Multi-agent orchestration, Synthetic data, Enterprise governance, Security, Compliance, Audit, Policy enforcement