Senior Software Engineer, AI Networking

Nvidia

Actively hiring
Santa Clara, CA · Seattle, WA · Posted 15 days ago · $152,000–$241,500 / year

At a glance

AI generated

TL;DR

NVIDIA is seeking a senior software engineer to join its AI Networking co-design and benchmark R&D team, where the candidate will design and implement machine learning tools for optimizing large-scale deep learning training and inference across GPU and CPU clusters. This involves developing resource allocation techniques using reinforcement learning and other optimization methods, building scalable data curation pipelines, and collaborating with hardware teams to deliver performance analysis insights. The role requires expertise in PyTorch or TensorFlow, proficiency in Python, Bash, and C++, and a deep understanding of NVIDIA GPUs, CUDA, and networking concepts like NCCL and RDMA protocols. Ideal candidates have 4+ years of experience applying ML techniques to computer architecture optimization problems at the intersection of HPC, networking, and AI applications.

Skills

Python, PyTorch, TensorFlow, JAX, CUDA, NCCL, Reinforcement Learning, Bayesian Optimization, GNNs, Docker, Kubernetes, CI/CD, Prometheus, Grafana, Bash, C++, PostgreSQL, Redis

What you'll do

  • Design and implement resource allocation and combinatorial optimization techniques for LLMs at datacenter scale.
  • Research and develop AI/ML techniques to optimize large-scale Deep Learning training and inference on NVIDIA supercomputers.
  • Build and productionize ML-based tools for performance prediction and optimization with a focus on networking aspects.
  • Develop scalable, reliable data curation pipelines to support the training of high-performance Machine Learning models.
  • Lead performance test planning and establish performance targets for new technologies in distributed systems.
  • Collaborate across hardware and software teams to deliver valuable performance analysis insights.

What we're looking for

  • 4+ years of experience applying ML to computer architecture and system optimization problems.
  • Hands-on experience with reinforcement learning, supervised learning, and other ML algorithms for optimization challenges.
  • Proficiency in PyTorch or TensorFlow for building and deploying ML models.
  • Expertise combining knowledge of NVIDIA GPUs, CUDA, deep learning frameworks, and networking concepts.
  • Strong programming skills in Python, Bash, and C++.
  • Proven ability to apply GNN- or transformer-based optimization to PyTorch model graphs and execution traces.

Market check

Salary context

This $152,000–$241,500 range sits above 54% of similar postings on FindRole.

Peer median band

$150,000–$236,675

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$158,375–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Software Engineer, AI Networking

Nvidia

Austin, TX, US · 68 days ago · $184,000–$287,500
C, C++, RDMA verbs, DPDK, DOCA, NCCL, CUDA, InfiniBand, RoCE, Docker, Kubernetes, AWS, CI/CD, Prometheus, Grafana, Python, PostgreSQL

Senior Software Manager, AI Networking

Nvidia

Remote (Santa Clara, CA, US) · 14 days ago · $272,000–$431,250
BlueField, ConnectX, Spectrum-X, DOCA, RDMA, RoCE, InfiniBand, DPDK, NCCL, CUDA-aware networking, congestion control, telemetry, CI/CD, Kubernetes, AWS, GCP, Azure, Python, Shell scripting, Prometheus, Grafana

Senior Software Engineer (AI Platform)

Smartly

US · 42 days ago
Python, TypeScript, PostgreSQL, Node.js, Docker, Kubernetes, React, AWS, GCP, CI/CD, MLOps, PyTorch, TensorFlow, MLflow, Kubeflow

Senior Software Architect, AI Systems and Networking

Nvidia

Remote (Santa Clara, CA, US) · 10 days ago · $224,000–$356,500
C, C++, Rust, RDMA, GPUDirect, NVLink, InfiniBand, RoCE, GPU, DPU, NIC, switch, vLLM, SGLang, TensorRT-LLM, NVMe-oF, GPUDirect Storage, S3, Reinforcement Learning, ML inference frameworks

AI Software Engineer, Senior

Booz Allen Hamilton

Laurel, Maryland, US · 42 days ago · $86,800–$198,000
Python, Java, C++, JavaScript, TypeScript, LLM-powered developer tools, CI/CD, DevOps, VS Code, Kubernetes, Docker, GitHub, GitLab, Jenkins, Agentic AI frameworks, Orchestration systems, Cloud services, PostgreSQL, MongoDB

AI Software Engineer, Senior

Booz Allen Hamilton

US · 42 days ago · $86,800–$198,000
Python, Rust, Go, Scala, Java, GitLab CI, Jenkins, Git, Linux, Docker, Podman, AWS, LocalStack, ESXi, Ansible, Kubernetes, SIEM, Security+, Linux+