Principal Architect, AI Networking

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA · Austin, TX
Salary: $272,000–$431,250 / yr
Posted: 46 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $151k to $461k. Similar roles midpoint $214k; this role midpoint $352k. A shaded band marks where most similar roles pay.]

This role pays more than 99% of similar roles. Most pay $181,593–$246,150 — the shaded band above. At the midpoint, this role pays about $352k versus about $214k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 985 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 971 roles with salary data.

At a glance

TL;DR · Principal Architect, AI Networking

As a Principal Architect in NVIDIA’s Networking Systems & Software Architecture group, you will lead the research agenda for distributed AI communication systems, focusing on optimizing data movement across GPUs, DPUs, NICs, and storage. Your day-to-day responsibilities include setting long-term technical vision, conducting original research on next-generation networking solutions, driving hardware-software co-optimization, integrating networking capabilities into AI serving stacks, publishing findings, and mentoring senior engineers. The role requires expertise in high-performance networking technologies like InfiniBand, RoCE, RDMA, NVLink, and communication libraries such as NIXL, NCCL, UCX, MPI, and NVSHMEM, along with proficiency in C, C++, Rust, Python, and CUDA programming. You must have a deep understanding of computer architecture, memory hierarchies, DMA engines, OS-level networking, and ML systems concepts to tackle the complex challenges in AI infrastructure at scale.

What you'll do

  • Define long-term technical vision for distributed AI communication systems across GPUs and storage.
  • Conduct original research on next-generation networking solutions using RDMA, NVLink, and GPUDirect.
  • Drive hardware-software co-optimization to address bottlenecks in large-scale AI workloads.
  • Integrate networking capabilities into AI serving stacks like vLLM and TensorRT-LLM.
  • Publish findings and represent NVIDIA in industry forums and standards bodies.

What we're looking for

  • 15+ years of experience in systems software and high-performance networking, with a track record of delivering complex initiatives from concept to production.
  • Deep expertise in InfiniBand, RoCE, RDMA, NVLink, communication libraries (NIXL, NCCL, UCX), and GPU-accelerated systems.
  • MS/PhD or equivalent experience in Computer Science or Engineering, with a focus on computer architecture and OS-level networking.
  • Understanding of ML systems concepts including transformer architectures, KV cache mechanics, model parallelism, and distributed training patterns.
  • Proficiency in C/C++, Rust, Python, CUDA programming, and NVIDIA GPU architecture.
  • Experience integrating networking capabilities into AI serving stacks such as vLLM, SGLang, and TensorRT-LLM.

More like this

Similar roles

Senior Software Architect, AI Systems and Networking

Nvidia

Remote (Santa Clara, CA) · 20 days ago · $224,000–$356,500
C, C++, Rust, RDMA, GPUDirect, NVLink, InfiniBand, RoCE, GPU, DPU, NIC, switch, vLLM, SGLang, TensorRT-LLM, NVMe-oF, GPUDirect Storage, S3, Reinforcement Learning, ML inference frameworks
Remote

AI Enablement Architect Lead

Electronic Arts

Austin, TX · 4 days ago · $141,400–$204,400
Python, AWS, Kubernetes, Terraform, CI/CD, Prompt and context engineering, AI agents, Agentic architectures, Tool calling, Model Context Protocol, MCP, RAG, Vector databases, Model tuning, Evaluation, Benchmarking, Guardrails, Content creation, Data analytics, C#, JavaScript, HTML, CSS
Hybrid

Principal Architect, Express AI Foundations

Adobe

San Jose · 75 days ago · $261,800–$379,100
Python, Java, Go, Kafka, Spark, Flink, LLM orchestration frameworks, Distributed systems, Cloud-native deployment, MLOps pipelines, Feature stores, Model registries, Agentic AI patterns, Caching strategies, Database development, Performance optimization

Principal Architect, Solution Engineering, AI and Architecture

CVS Health

Remote (Chicago) · 13 days ago · $144,200–$288,400
AWS, Azure, GCP, Domain-Driven Design, microservices, APIs, event-driven systems, data modeling, AI, Agentic AI, ML, GenAI, DevSecOps, CI/CD, LangGraph, Terraform, Kubernetes, PostgreSQL, Docker, Prometheus, Grafana
Remote

Principal Engineer, AI Serving Framework Architect (Software)

Samsung Semiconductor

San Jose, CA · 4 days ago · $219,000–$351,000
Python PyTorch C++ vLLM AI Inference System Profiling Kubernetes Docker CI/CD PostgreSQL Prometheus Grafana AWS Azure Google Cloud Platform Samsung SDS Cloud Services Git Jenkins GitHub Bitbucket Slack Zoom Confluence Jira Terraform Ansible Kafka Redis MongoDB RAG Vector DB KVCache Hierarchical Memory Systems

AI Strategy, Emerging Systems & AI Principal Architect

Micron Technology

Boise, ID · 40 days ago
Azure, AWS, GCP, PyTorch, TensorFlow, LangChain, Python, MLOps, CI/CD, Microsoft 365, Vector databases, NIST AI RMF, Data engineering, Generative AI, Agentic systems, Multimodal AI, RAG, Chatbots, Predictive analytics, AI assistants