Senior Software Engineer, CUDA Deep Learning Systems

Nvidia

Remote Actively hiring
Remote · Santa Clara, CA · Austin, TX · Posted 15 days ago · $184,000–$287,500 / year

At a glance

AI generated

TL;DR

Join our dynamic, research-oriented team as a Senior Software Engineer specializing in CUDA and Deep Learning Systems to work on pioneering initiatives at the intersection of advanced deep learning architectures and distributed computing. You will explore novel system optimizations for high-level DL frameworks and low-level CUDA through modeling, simulation, and silicon prototyping, while also designing and optimizing custom high-performance CUDA kernels tailored to emerging neural network architectures. Your day-to-day involves analyzing complex hardware-software interactions, collaborating with AI researchers and architects, and developing exploratory tools to profile and accelerate new paradigms in deep learning. The role requires strong proficiency in C++ and Python, a solid background in Deep Learning fundamentals, and experience with CUDA programming and distributed computing principles. Ideal candidates have expertise in major DL frameworks like PyTorch and JAX, hands-on experience with communication libraries such as NCCL, and knowledge of numerical methods for low-precision arithmetic.

Skills

CUDA Python C++ PyTorch JAX TensorRT vLLM NeMo Megatron MaxText Triton XLA NCCL MPI UCX Docker CI/CD Git GitHub Linux PostgreSQL Prometheus Grafana

What you'll do

  • Explore and prototype novel optimizations for advanced deep learning models using CUDA.
  • Design and optimize distributed computing systems for seamless scaling from single nodes to supercomputers.
  • Develop custom high-performance CUDA kernels for emerging neural network architectures.
  • Analyze hardware-software interactions to resolve performance bottlenecks in AI workloads.
  • Collaborate with experts to co-design systems that enhance accelerator compute utilization and efficiency.
  • Create tools and runtime systems to profile and accelerate new paradigms in deep learning.
  • Write maintainable code for exploratory prototypes transitioning into open-source or commercial products.

What we're looking for

  • 8+ years of industry experience in software engineering or equivalent academic experience.
  • Strong proficiency in C++ and Python programming languages.
  • Solid understanding of deep learning fundamentals, particularly transformers.
  • Expertise in distributed computing principles and multi-node scaling challenges.
  • Proven experience in systems programming and low-level performance optimization.
  • Deep knowledge of CUDA programming and kernel optimization for GPUs.
  • Familiarity with major deep learning frameworks’ autograd, training, and inference internals.

Market check

Salary context

This $184,000–$287,500 range sits above 81% of similar postings on FindRole.

Peer median band

$162,500–$250,100

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$196,750–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Deep Learning Frameworks CUDA Software Engineer

Nvidia

Remote (Santa Clara, CA, US) · 15 days ago · $184,000–$287,500
CUDA PyTorch JAX TRT-LLM vLLM SGLang Python C++ NCCL MPI UCX Docker CI/CD Prometheus Grafana Git GitHub Linux NVIDIA_Nsight_Systems

Senior System Software Engineer - CUDA Chips

Nvidia

Santa Clara, CA, US · 64 days ago · $152,000–$241,500
C CUDA Linux Windows macOS C++ Python Git CI/CD NVIDIA Pre-Silicon Simulation Emulation Kernel_Programming Operating_Systems Virtual_Memory Threads Process_Control Large_Codebases Documentation

Senior GPU Architect, Deep Learning

Nvidia

Santa Clara, CA, US · 140 days ago · $184,000–$287,500
C C++ Perl Python CUDA TensorFlow PyTorch NVIDIA_GPU_Architecture Deep_Learning Parallel_Computing Computer_Architecture CI/CD MESOS Kubernetes Docker Prometheus Grafana PostgreSQL Redis

Principal System Software Engineer - CUDA Driver

Nvidia

Santa Clara, CA, US · 16 days ago · $272,000–$431,250
C CUDA HW/SW co-design performance modeling emulation/simulation system level architecture interconnects memory hierarchy interrupts memory-mapped IO driver programming kernel mode development CPU GPU architectures memory coherence consistency models

Senior Deep Learning Tools Engineer – CUDA Tile

Nvidia

Remote (Santa Clara, CA, US) · 23 days ago · $152,000–$241,500
Python C++ CI/CD PyTorch TensorFlow JAX TensorRT LLVM MLIR CUDA Docker Kubernetes Prometheus Grafana PostgreSQL Git GitHub Linux