Principal Deep Learning Communication Architect

Nvidia

Remote · Actively hiring
Remote, USA · Santa Clara, CA · Austin, TX · Posted 46 days ago · $272,000–$431,250 / year

At a glance

AI generated

TL;DR

As a senior architect on NVIDIA’s AI infrastructure team, you will define the technical roadmap for communication libraries across next-generation platforms, ensuring that models scale efficiently to clusters with hundreds of thousands of nodes. Your responsibilities include leading the development of advanced communication primitives and collective algorithms optimized for heterogeneous interconnects such as NVLink, Spectrum-X, and Quantum-X, and partnering with application developers to co-design specialized communication solutions. You will collaborate closely with silicon architects to influence hardware specifications that meet the demands of trillion-parameter large language models (LLMs) and agentic AI systems. Essential skills include deep expertise in parallelism strategies such as 3D parallelism and ZeRO variants; proficiency in NCCL, UCX, UCC, NVSHMEM, MPI, RDMA, RoCE, and InfiniBand verbs; and hands-on experience with high-throughput inference engines such as TensorRT-LLM. A strong background in GPU architecture and CUDA programming, along with contributions to open-source projects, is also required.

Skills

NCCL UCX UCC NVSHMEM MPI RDMA RoCE InfiniBand TensorRT-LLM vLLM SGLang NVIDIA Dynamo CUDA Megatron-Core DeepSpeed JAX XLA PyTorch Distributed KServe Ray

What you'll do

  • Define the long-term technical roadmap for communication libraries across NVIDIA’s next-gen platforms.
  • Lead development of next-gen communication primitives and collective algorithms for heterogeneous interconnects.
  • Partner with application developers to architect specialized communication primitives for AI and HPC libraries.
  • Influence hardware specifications for next-generation networking to meet the demands of trillion-parameter LLMs.
  • Develop high-fidelity analytical models and simulators to predict system behavior under emerging workloads.

What we're looking for

  • Ph.D. or M.S. in Computer Science, Electrical Engineering, or related field with 12+ years of industry experience.
  • Deep expertise in parallelism strategies including 3D parallelism and advanced optimization techniques like ZeRO variants.
  • Proficiency in communication libraries such as NCCL, UCX, UCC, NVSHMEM, and MPI, along with RDMA and InfiniBand verbs.
  • Expertise in high-throughput inference engines like TensorRT-LLM and vLLM for efficient model serving.
  • Extensive knowledge of NVIDIA GPU memory hierarchy and CUDA programming models.
  • Significant contributions to major open-source projects related to distributed computing frameworks.

Market check

Salary context

This $272,000–$431,250 range sits above 99% of similar postings on FindRole.

Peer median band

$177,797–$262,400

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$189,925–$246,150

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and systems-on-chip, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.


Similar roles

Senior Deep Learning Communication Architect

Nvidia

Santa Clara, CA, US · 10 days ago · $184,000–$287,500
PyTorch TensorRT-LLM vLLM SGLang C++ Python CUDA OpenCL InfiniBand RoCE MPI NCCL UCX UCC NVSHMEM Data Parallelism Pipeline Parallelism Tensor Parallelism Expert Parallelism FSDP Disaggregated Serving Dynamo Triton

Senior Deep Learning Performance Architect

Nvidia

Santa Clara, CA, US · 25 days ago · $184,000–$287,500
Python C++ GPU ASIC Deep Learning LLM Batching KV-cache Latency/Tuning Multi-node Scaling Memory Hierarchy Scalability System Architecture Performance Tuning Profiling Debugging

Senior Deep Learning Framework Communications Engineer

Nvidia

Remote (Santa Clara, CA, US) · 12 days ago · $152,000–$241,500
PyTorch C++ CUDA Python NCCL NVSHMEM JAX TRT-LLM vLLM SGLang HPC AI MPI TensorRT NVIDIA_Nsight_Systems Performance_Profiling Parallel_Programming Compiler_Technologies Memory_Hierarchy Tensor_Layout Distributed_Inference Mixture_of_Experts Reinforcement_Learning

Principal Architect, AI Networking

Nvidia

Remote (Santa Clara, CA, US) · 37 days ago · $272,000–$431,250
C C++ Rust Python CUDA InfiniBand RoCE RDMA NVLink NIXL NCCL UCX MPI NVSHMEM vLLM SGLang TensorRT-LLM ML systems concepts High-performance networking

Senior Software Architect - Deep Learning and HPC Communications

Nvidia

Remote (Santa Clara, CA, US) · 21 days ago · $184,000–$287,500
C/C++ MPI NCCL NVSHMEM UCX CUDA Linux InfiniBand RoCE NVLink PyTorch TensorFlow HPC Networking Simulation Quantitative_Modeling SHMEM Parallel_Programming Deep_Learning_Pods

Distinguished Software Architect - Deep Learning and HPC Communications

Nvidia

Santa Clara, CA, US · 14 days ago · $320,000–$488,750
HPC MPI NCCL NVSHMEM UCX CUDA InfiniBand Ethernet C C++ PyTorch TensorFlow GPU Networking System_Architecture Parallel_Programming_Models ML_DL_Fundamentals Performance_Optimization Fault_Tolerance Competitive_Assessments