Senior Deep Learning Tools Engineer – CUDA Tile

Nvidia

Remote · Actively hiring
Remote, US · Santa Clara, CA · Salt Lake City, UT · Redmond, WA · Posted 23 days ago · $152,000–$241,500 / year

At a glance

AI generated

TL;DR

As a Deep Learning Compiler & Tools Engineer at NVIDIA, you will join our advanced compiler technologies team to drive performance improvements for next-generation compilers used in AI workloads. Your day-to-day responsibilities include designing performance testing frameworks, building automated CI/CD pipelines, implementing benchmarking systems, and analyzing performance data to identify optimization opportunities. You will collaborate with various teams to ensure reliable and reproducible performance metrics across diverse GPU environments. The ideal candidate has a strong background in software engineering, particularly in performance analysis and system optimization, with expertise in Python (C++ preferred) and experience with deep learning frameworks like PyTorch or TensorFlow. Familiarity with CI/CD systems, hardware-aware performance analysis, and compiler internals is essential for this role, which plays a crucial part in accelerating AI workloads on NVIDIA's cutting-edge GPU technology.

Skills

Python C++ CI/CD PyTorch TensorFlow JAX TensorRT LLVM MLIR CUDA Docker Kubernetes Prometheus Grafana PostgreSQL Git GitHub Linux

What you'll do

  • Design and develop performance testing frameworks for deep learning compilers and workloads.
  • Build and maintain automated pipelines to continuously track performance across models, hardware, and compiler changes.
  • Implement benchmarking systems to measure latency, throughput, and efficiency of AI and HPC workloads.
  • Analyze performance trends over time to identify regressions, bottlenecks, and optimization opportunities.
  • Develop tools and dashboards for performance visualization, reporting, and insights.

What we're looking for

  • 5+ years of software engineering experience with a focus on performance engineering
  • Strong programming skills in Python; C++ proficiency preferred
  • Experience with CI/CD systems and automation frameworks
  • Familiarity with hardware-aware performance analysis for GPUs or similar systems
  • Background in data analysis, profiling, and regression tracking
  • Understanding of compiler internals (LLVM, MLIR, CUDA compilation flow)
  • Experience building performance dashboards and telemetry systems

Market check

Salary context

This $152,000–$241,500 range sits above 41% of similar postings on FindRole.

Peer median band

$168,000–$261,850

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$188,800–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Deep Learning Frameworks CUDA Software Engineer

Nvidia

Remote (Santa Clara, CA, US) · 15 days ago · $184,000–$287,500
CUDA PyTorch JAX TRT-LLM vLLM SGLang Python C++ NCCL MPI UCX Docker CI/CD Prometheus Grafana Git GitHub Linux NVIDIA_Nsight_Systems

Senior Deep Learning Compiler Engineer - XLA

Nvidia

Remote (Santa Clara, CA, US) · 93 days ago · $152,000–$241,500
C/C++ CUDA JAX PyTorch TensorFlow XLA MLIR LLVM OpenAI_Triton GPU distributed_programming performance_analysis compiler_optimizations clean_software_engineering_practices high_performance_computing

Senior Deep Learning Compiler Engineer

Nvidia

Remote (Santa Clara, CA, US) · 29 days ago · $152,000–$241,500
MLIR XLA TVM LLVM PyTorch CUDA C++ Python GPU CPU Embedded_Systems Cross_Compilation CI/CD

Senior Software Engineer, CUDA Deep Learning Systems

Nvidia

Remote (Santa Clara, CA, US) · 15 days ago · $184,000–$287,500
CUDA Python C++ PyTorch JAX TensorRT vLLM Nemo Megatron MaxText Triton XLA NCCL MPI UCX Docker CI/CD Git GitHub Linux PostgreSQL Prometheus Grafana

Senior GPU Architect, Deep Learning

Nvidia

Santa Clara, CA, US · 140 days ago · $184,000–$287,500
C C++ Perl Python CUDA TensorFlow PyTorch NVIDIA_GPU_Architecture Deep_Learning Parallel_Computing Computer_Architecture CI/CD MESOS Kubernetes Docker Prometheus Grafana PostgreSQL Redis