Senior Deep Learning Frameworks CUDA Software Engineer

Nvidia

Remote Actively hiring
Remote · Santa Clara, CA · Austin, TX · Posted 15 days ago · $184,000–$287,500 / year

At a glance

AI generated

TL;DR

NVIDIA seeks a senior deep learning engineer to integrate advanced CUDA features into AI frameworks such as PyTorch, TRT-LLM, vLLM, and JAX, working closely with the teams that develop core CUDA technologies. The role involves in-depth analysis of AI workloads, driving improvements in the AI Compiler-Runtime interface, designing fault-tolerant solutions for large-scale workloads, and collaborating across multiple time zones to improve performance and programmability. Ideal candidates have extensive experience with deep learning frameworks, proficiency in Python, C++, and CUDA, and a solid understanding of HPC/AI communication concepts. The role also calls for expertise in deep learning compilers and distributed machine learning techniques, along with hands-on experience with CUDA and specific communication libraries, with the aim of advancing AI technology at scale.

Skills

CUDA, PyTorch, JAX, TRT-LLM, vLLM, SGLang, Python, C++, NCCL, MPI, UCX, Docker, CI/CD, Prometheus, Grafana, Git, GitHub, Linux, NVIDIA Nsight Systems

What you'll do

  • Integrate new CUDA features and Runtime abstractions in AI frameworks from proof-of-concept to production.
  • Analyze AI workloads and frameworks to identify opportunities for innovation in the stack's lower layers.
  • Drive improvements in the AI Compiler-Runtime interface to build high-speed multi-GPU solutions.
  • Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
  • Develop exploratory tools and runtime systems to profile and accelerate new deep learning paradigms.

What we're looking for

  • BS, MS, or PhD degree in Computer Science, Engineering, or related field.
  • 8+ years of industry experience developing with Deep Learning Frameworks like PyTorch and JAX.
  • Proficiency in rapid prototyping using Python, C++, CUDA, or similar DSLs.
  • Expertise in performance internals and execution graphs of major deep learning frameworks.
  • Experience conducting benchmarking on AI clusters and familiarity with profiling tools.

Market check

Salary context

This $184,000–$287,500 range sits above 80% of similar postings on FindRole.

Peer median band

$161,800–$250,100

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$185,906–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Software Engineer, CUDA Deep Learning Systems

Nvidia

Remote (Santa Clara, CA, US) · 15 days ago · $184,000–$287,500
CUDA, Python, C++, PyTorch, JAX, TensorRT, vLLM, Nemo, Megatron, MaxText, Triton, XLA, NCCL, MPI, UCX, Docker, CI/CD, Git, GitHub, Linux, PostgreSQL, Prometheus, Grafana

Senior Deep Learning Tools Engineer – CUDA Tile

Nvidia

Remote (Santa Clara, CA, US) · 23 days ago · $152,000–$241,500
Python, C++, CI/CD, PyTorch, TensorFlow, JAX, TensorRT, LLVM, MLIR, CUDA, Docker, Kubernetes, Prometheus, Grafana, PostgreSQL, Git, GitHub, Linux

Senior GPU Architect, Deep Learning

Nvidia

Santa Clara, CA, US · 140 days ago · $184,000–$287,500
C, C++, Perl, Python, CUDA, TensorFlow, PyTorch, NVIDIA GPU Architecture, Deep Learning, Parallel Computing, Computer Architecture, CI/CD, MESOS, Kubernetes, Docker, Prometheus, Grafana, PostgreSQL, Redis

Senior System Software Engineer - CUDA Chips

Nvidia

Santa Clara, CA, US · 64 days ago · $152,000–$241,500
C, CUDA, Linux, Windows, macOS, C++, Python, Git, CI/CD, NVIDIA, Pre-Silicon, Simulation, Emulation, Kernel Programming, Operating Systems, Virtual Memory, Threads, Process Control, Large Codebases, Documentation