Senior Deep Learning Software Engineer, LLM Performance

Nvidia

Hybrid | Actively hiring
Santa Clara, CA | Posted 46 days ago | $184,000–$287,500 / year

At a glance

AI generated

TL;DR

NVIDIA seeks a Senior Deep Learning Software Engineer to join its rapidly expanding research and development team focused on optimizing Large Language Model (LLM) inference performance. The role involves working with the deep learning community to implement cutting-edge algorithms for public release in frameworks such as TensorRT-LLM, vLLM, and SGLang, and in LLM benchmarks, as well as identifying and executing performance optimizations across NVIDIA accelerators. The engineer will deploy and serve models using CUDA kernels and Triton, contribute features and code to NVIDIA’s open-source projects, and collaborate with teams working on performance modeling, performance analysis, and kernel development. Ideal candidates have a strong background in computer engineering or a related field, at least 8 years of software development experience, proficiency in Python/C/C++, and expertise with DL frameworks such as PyTorch, JAX, or TensorFlow. Experience with LLM frameworks, GPU programming (CUDA), and performance optimization is essential.

Skills

Python, C++, CUDA, TensorRT, Triton, PyTorch, JAX, TensorFlow, vLLM, SGLang, DL compiler, Performance modeling, Profiling, Debugging, Code optimization, GPU programming, Deep learning framework, CI/CD

What you'll do

  • Optimize the inference performance of LLM, VLM, and GenAI models.
  • Scale model performance across various NVIDIA accelerator architectures.
  • Develop and contribute features to NVIDIA and open-source (OSS) LLM frameworks and TensorRT.
  • Implement efficient serving algorithms using TensorRT-LLM, vLLM, and SGLang.
  • Collaborate on innovative solutions with cross-functional teams in AI fields.

What we're looking for

  • At least 8 years of relevant software development experience.
  • Expertise in Python/C/C++ programming, software design, and engineering.
  • Experience with deep learning frameworks like PyTorch, JAX, TensorFlow.
  • Prior work with LLM frameworks or DL compilers for inference and deployment.
  • Proficiency in performance modeling, profiling, debugging, and code optimization.
  • Knowledge of CPU and GPU architectures, including GPU programming (CUDA/OpenCL).
  • Collaborative experience across diverse teams in generative AI and related fields.

Market check

Salary context

This $184,000–$287,500 range sits above 75% of similar postings on FindRole.

Peer median band

$161,800–$257,100

Median floor and ceiling across peers.

Typical range (25th–75th percentile)

$184,668–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 825 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.


More like this

Similar roles

Senior Deep Learning Software Engineer

Nvidia

Santa Clara, CA | 38 days ago | $224,000–$356,500
Python, PyTorch, JAX, CUDA, TensorRT, NVIDIA TensorRT-LLM, GPU optimization, CUTLASS, Triton, Deep learning frameworks, Performance analysis, GPU architecture, High-performance computing, Model inference, Inference optimization
Hybrid

Senior Deep Learning Software Engineer, Inference

Nvidia

Remote (Santa Clara, CA) | 27 days ago | $184,000–$287,500
C++, Python, CUDA, NCCL, NVSHMEM, OpenAI Triton, CUTLASS, PyTorch, vLLM, SGLang, FlashInfer, Multi-GPU communications, Deep learning frameworks, Performance optimization, GPU acceleration
Remote

Deep Learning Software Engineer, TensorRT Performance - New College Grad 2026

Nvidia

Remote (Santa Clara, CA) | 59 days ago | $124,000–$195,500
C++, Python, TensorRT, PyTorch, CUDA, ONNX, JAX, TensorFlow, performance analysis, GPU architecture, Transformers, Recommenders, ASR, TTS, Visual Understanding, graph compilers, Jetson systems, deep learning inference, low-latency systems, resource-constrained systems
Remote