Senior DL Algorithms Engineer - Inference Performance

Nvidia

Remote · Actively hiring

Remote, US · Santa Clara, CA · Posted 24 days ago · $152,000–$241,500 / year

At a glance

AI generated

TL;DR

Join our dynamic team as a Senior DL Algorithms Engineer, focusing on optimizing large language models and omni models for peak performance. You will enable state-of-the-art open-source models like Nemotron and Cosmos on NVIDIA’s accelerated inference software stack, contributing to frameworks such as TRT-LLM and vLLM by profiling bottlenecks and delivering production code. The role also involves benchmarking competitive offerings and collaborating with partner teams to develop next-generation AI services. Ideal candidates have a PhD in CS, EE, or CSEE, 3+ years of experience, expertise in deep learning inference and performance optimization for GPU-based applications, and proficiency in PyTorch or similar frameworks. Strong skills in computer architecture, GPU programming (CUDA/OpenCL), and algorithm fundamentals are essential for this fast-paced environment at the forefront of AI innovation.

Skills

PyTorch NVIDIA_TRT-LLM vLLM SGLang FlashInfer GPU_architecture CUDA OpenCL Deep_Learning Neural_Networks Performance_profiling HPC Computer_Architecture Python C++

What you'll do

Enable and optimize state-of-the-art open models on NVIDIA’s accelerated inference software stack.
Contribute new features, fix bugs, and deliver production code to open-source deep learning frameworks.
Profile and analyze bottlenecks across the full inference stack to enhance performance.
Benchmark state-of-the-art offerings and perform competitive analysis for NVIDIA’s SW/HW stack.
Co-design next-generation AI models and services with partner teams.

What we're looking for

PhD in CS, EE, CSEE, or equivalent; 3+ years of relevant industry experience.
Expertise in deep learning and neural networks, focusing on inference optimization.
Proficiency in performance profiling, analysis, and optimization for GPU-based applications.
Strong background in computer architecture, including GPU fundamentals.
Experience with processor and system-level performance optimization techniques.
Familiarity with modern LLM/Diffusion architectures and open-source frameworks.

Market check

Salary context

This $152,000–$241,500 range sits above 39% of similar postings on FindRole.

Peer median band

$166,000–$253,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$182,125–$238,250

Middle half of comparable postings.

Based on 239 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads. Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

Similar roles

Senior DL Algorithms Engineer - Inference Performance

Nvidia

Remote (Santa Clara, CA, US) 100 days ago $184,000–$287,500

C++ PyTorch CUDA OpenCL Python DeepLearningFrameworks GPUArchitecture PerformanceProfiling AlgorithmDesign CI/CD

Senior Deep Learning Software Engineer, Inference

Nvidia

Remote (Santa Clara, CA, US) 24 days ago $184,000–$287,500

C++ Python CUDA NCCL NVSHMEM OAI_TRITON CUTLASS PyTorch vLLM SGLang FlashInfer Multi-GPU_Communications Deep_Learning_Frameworks Performance_Optimization GPU_Acceleration

Senior Software Engineer - AI Inference

Nvidia

Remote (Santa Clara, CA, US) 45 days ago $152,000–$241,500

Python C++ CUDA vLLM SGLang PyTorch Triton NCCL Dynamo CI/CD GPU InfiniBand Profiling Flamegraphs Microbenchmarks Concurrency Multi-threading Multi-process Kubernetes Docker PostgreSQL

Senior Software Engineer, Machine Learning Inference

Nvidia

Santa Clara, CA, US 49 days ago $152,000–$241,500

C++ Python CUDA Rust TensorRT TensorRT-LLM vLLM SGLang PyTorch JAX Deep Learning Frameworks GPU Programming Performance Analysis Optimization Techniques CI/CD

Senior ML Infrastructure Engineer, Inference Platform

General Motors (GM)

Remote (GM Automation - Sunnyvale, US) 77 days ago $155,420–$205,900

Python Triton RayServe vLLM C++ Kubernetes Docker CI/CD Prometheus Grafana PostgreSQL Redis AWS Azure Google Cloud Platform Git Jenkins GitHub Slack Confluence Jira

Senior Compiler Engineer, AI Inference Performance

Nvidia

Remote (Santa Clara, CA, US) 94 days ago $152,000–$241,500

MLIR LLVM XLA Triton PyTorch JAX Nsight Compute CUDA C++ Python CI/CD
