Senior Deep Learning Framework Communications Engineer

Nvidia

Remote · Actively hiring · Verified listing

Remote, USA · Santa Clara, CA · Westford, MA · Austin, TX · Durham, NC · Posted 11 days ago · $152,000–$241,500 / year

At a glance

AI generated

TL;DR

Join NVIDIA as a Senior Deep Learning Framework Communications Engineer to improve AI stacks by integrating advanced communication technologies into frameworks such as PyTorch, TRT-LLM, vLLM, SGLang, and JAX. You will work closely with the team behind NCCL, NVSHMEM, and GPUDirect to optimize multi-GPU communication for workloads ranging from training on up to 100K GPUs to microsecond-latency inference. Day-to-day work includes analyzing AI workloads, improving compilers, designing fault-tolerant solutions, and authoring custom kernels. Ideal candidates hold a B.S., M.S., or Ph.D. in Computer Science or a related field, have 5+ years of software engineering experience in HPC/AI, are proficient in Python, C++, and CUDA, and are familiar with performance profiling tools such as PyTorch profiler and NVIDIA Nsight Systems.

Skills

PyTorch C++ CUDA Python NCCL NVSHMEM JAX TRT-LLM vLLM SGLang HPC AI MPI TensorRT NVIDIA_Nsight_Systems Performance_Profiling Parallel_Programming Compiler_Technologies Memory_Hierarchy Tensor_Layout Distributed_Inference Mixture_of_Experts Reinforcement_Learning

What you'll do

Integrate new features in AI frameworks like PyTorch and TRT-LLM.
Analyze multi-GPU communication requirements for AI workloads.
Improve AI compilers to optimize communication performance.
Design fault-tolerant solutions for large-scale AI workloads.
Author custom kernels to demonstrate peak performance on NVIDIA platforms.

What we're looking for

B.S., M.S., or Ph.D. in Computer Science or related field with 5+ years of software engineering experience in HPC/AI.
Development/integration experience with Deep Learning Frameworks like PyTorch, JAX, TRT-LLM, vLLM, and SGLang.
Proficiency in Python, C++, CUDA, Triton, and CuTe for rapid prototyping and development.
Solid understanding of AI models, parallelism strategies, compiler technologies, and performance benchmarking tools.
Experience with HPC/AI communication concepts including 1-sided/2-sided communication, elasticity, resiliency, and topology discovery.
Expertise in training, distributed inference, MoE, reinforcement learning, or kernel authoring in CUDA, Triton, or CuTe.

Market check

Salary context

This $152,000–$241,500 range sits above 40% of similar postings on FindRole.

Peer median band

$163,450–$257,300

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$180,025–$246,150

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

Similar roles

Senior Performance Engineer - Deep Learning

Nvidia

Santa Clara, CA, US 86 days ago $152,000–$241,500

Python C++ PyTorch JAX CUDA cuBLAS cuDNN cuSOLVER GPU MLPerf OpenAI_Triton Pallas CI/CD

Senior Deep Learning Software Engineer, Inference

Nvidia

Remote (Santa Clara, CA, US) 24 days ago $184,000–$287,500

C++ Python CUDA NCCL NVSHMEM OpenAI_Triton CUTLASS PyTorch vLLM SGLang FlashInfer Multi-GPU_Communications Deep_Learning_Frameworks Performance_Optimization GPU_Acceleration

Senior Deep Learning Communication Architect

Nvidia

Santa Clara, CA, US 9 days ago $184,000–$287,500

PyTorch TensorRT-LLM vLLM SGLang C++ Python CUDA OpenCL InfiniBand RoCE MPI NCCL UCX UCC NVSHMEM Data Parallelism Pipeline Parallelism Tensor Parallelism Expert Parallelism FSDP Disaggregated Serving Dynamo Triton

Senior Deep Learning Software Engineer

Nvidia

US 85 days ago $224,000–$356,500

Python PyTorch JAX CUDA TensorRT NVIDIA_TensorRT_LLM GPU_optimization CUTLASS Triton Deep_learning_frameworks Performance_analysis GPU_architecture High_performance_computing Model_inference Inference_optimization

Senior Deep Learning Frameworks Sustaining Engineer

Nvidia

Santa Clara, CA, US 140 days ago $152,000–$218,500

C/C++ Python TensorFlow PyTorch Docker Bazel Make GitLab Jenkins GitHub GitLab CI CUDA OpenCL npm Debian pip Git CI/CD

Senior Deep Learning Engineer - Model Evaluation & AI Systems

Nvidia

Santa Clara, CA, US 87 days ago $224,000–$356,500

Python PyTorch TensorFlow Kubernetes Docker CI/CD GitHub NVIDIA_Deep_Learning_Containers PostgreSQL MongoDB AWS Google_Cloud_Platform Azure GitLab Jenkins Prometheus Grafana Open_Source_Contributions