Principal Software Engineer - AI Inference

Nvidia

Remote Actively hiring
Remote, USA · Santa Clara, CA Posted 96 days ago $272,000–$431,250 / year

At a glance

AI generated

TL;DR

NVIDIA is seeking a Principal Software Engineer to lead the advancement of open-source LLM serving technologies like vLLM and SGLang, ensuring they excel on NVIDIA GPUs. This role involves hands-on development to enhance high-throughput, low-latency inference at scale by building features that improve efficiency and tail behavior, optimizing core hot paths, and improving multi-GPU and multi-node performance. The ideal candidate will have extensive experience in systems engineering, particularly with LLM inference/serving systems, and strong programming skills in Rust, C++, Python, and CUDA. They should also possess expertise in GPU performance analysis tools, distributed systems, and open-source contributions to projects like vLLM or SGLang. This position requires a deep understanding of the challenges in large-scale AI infrastructure and the ability to mentor senior engineers while raising the technical bar within NVIDIA.

Skills

Rust C++ Python CUDA vLLM SGLang GPU Distributed Systems Concurrency Profiling Microbenchmarking Triton NCCL PyTorch InfiniBand CI/CD Open Source Contribution

What you'll do

  • Drive upstream-first engineering in vLLM/SGLang by authoring and landing PRs.
  • Build features to improve inference runtime efficiency, latency, and tail behavior.
  • Optimize core hot paths across the stack for better GPU performance.
  • Enhance multi-GPU and multi-node inference through improved communication patterns.
  • Strengthen system correctness, robustness, and operability with observability hooks.
  • Mentor senior engineers and establish guidelines for upstream contribution workflows.

What we're looking for

  • 15+ years of experience in production software systems engineering with a track record of solving complex technical problems.
  • Expertise in LLM inference/serving systems like vLLM and SGLang, understanding the trade-offs affecting real-world performance.
  • Proficiency in Rust, C++, Python, and CUDA for reading, modifying, and optimizing performance-critical code across layers.
  • Experience with GPU performance analysis tools and methodologies including profiling, microbenchmarking, and memory/communication analysis.
  • Strong background in distributed systems and concurrency, including queues/schedulers, RPC/streaming, and multi-process/multi-threaded runtime behavior.
  • Substantial open-source contributions to vLLM, SGLang, PyTorch, Triton, NCCL, or related GPU/inference infrastructure; maintainer experience preferred.

Market check

Salary context

This $272,000–$431,250 range sits above 99% of similar postings on FindRole.

Peer median band

$153,600–$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$164,625–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Software Engineer - AI Inference

Nvidia

Remote (US, CA, Santa Clara, US) 46 days ago $152,000–$241,500
Python C++ CUDA vLLM SGLang PyTorch Triton NCCL Dynamo CI/CD GPU InfiniBand Profiling Flamegraphs Microbenchmarks Concurrency Multi-threading Multi-process Kubernetes Docker PostgreSQL

AI Software Engineer

Broadcom

USA-GA-Atlanta - Perimeter, US 38 days ago $108,000–$172,800
Java Spring GitHub Git GitHubActions CI/CD Micrometer OpenTelemetry LargeLanguageModels LLMs VectorDatabases Langchain4J Embabel Anthropic OpenAI AmazonBedrock GoogleGenAI AzureOpenAI TanzuPlatform10 Bitnami SpringAI

AI Software Engineer

Booz Allen Hamilton

Arlington, Virginia, US 59 days ago $86,800–$198,000
Python Rust Go Scala Java RESTful APIs CI/CD GitLab CI Jenkins Agentic AI solutions Linux Docker AWS LocalStack ESXi Ansible Kubernetes SIEMs Security+ Linux+

Software Engineer Lead - Core AI

PNC

Dallas Innovation Center - Luna Rd (Tx270), US 68 days ago
Python Java AWS CI/CD OCP NLP Conversational AI GenAI Docker Kubernetes

Senior Software Engineer, AI Inference Systems

Nvidia

US, CA, Santa Clara, US 32 days ago $184,000–$287,500
Python C/C++ CUDA Kubernetes Docker Triton PyTorch vLLM SGLang MLIR Linux Go Rust CI/CD AWS GCP Azure Prometheus Grafana GitHub MLOps

Principal Software Development Engineer (AI/ML)

Abbott

US 44 days ago $130,700–$261,300
AWS Python TensorFlow LangChain Hugging Face CI/CD MLOps DevOps Docker Kubernetes PostgreSQL NoSQL Mobile App Development Relational Databases AI ML Cloud Platforms DevSecOps