Senior System Software Engineer, NCCL - Partner Enablement

Nvidia

Remote · Actively hiring
Remote, US · Santa Clara, CA · Austin, TX · Posted 19 days ago · $152,000–$241,500 / year

At a glance

AI generated

TL;DR

Join NVIDIA’s GPU Communications Libraries and Networking team as a Partner Enablement Engineer to support key partners and customers using NCCL in deep learning and high-performance computing applications. You’ll engage with clients to diagnose functional and performance issues, conduct performance analysis on groundbreaking GPU clusters, develop tools for issue isolation across various platforms (including cloud environments such as Azure, AWS, and GCP), guide teams on HPC methodologies, document processes, and deliver training sessions. Ideal candidates have a B.S./M.S. in CS/CE (or equivalent experience) and 5+ years of experience with parallel programming and communication runtimes such as MPI, NCCL, UCX, and NVSHMEM. Strong C/C++ skills, Linux fundamentals, scripting (preferably Python), and expertise in high-performance networking are essential, along with familiarity with CUDA, GPUs, and deep learning frameworks such as PyTorch and TensorFlow.

Skills

C/C++ · Python · Linux · Docker · Kubernetes · SLURM · Ansible · MPI · NCCL · UCX · NVSHMEM · InfiniBand · RoCE · Ethernet · RDMA · CUDA · PyTorch · TensorFlow · HPC · CI/CD

What you'll do

  • Engage with partners to resolve functional and performance issues with NCCL.
  • Conduct performance analysis of NCCL on GPU clusters with high-speed networking.
  • Develop tools to isolate issues on new systems and cloud platforms.
  • Guide customers on HPC methodologies for multi-node cluster applications.
  • Write documentation and conduct trainings and webinars for NCCL users and support teams.
  • Collaborate with internal teams across different time zones on infrastructure.

What we're looking for

  • B.S./M.S. in CS/CE or equivalent with 5+ years of relevant HPC/AI experience.
  • Expertise in C/C++ programming and debugging complex systems.
  • Experience with parallel programming and communication runtimes such as MPI and NCCL.
  • Proficiency in Linux fundamentals, scripting languages (Python), and container technologies.
  • Knowledge of high-performance networking, including InfiniBand, RoCE, and Ethernet.
  • Familiarity with container, orchestration, and scheduling tools such as Docker, Kubernetes, and SLURM.
  • Ability to conduct performance benchmarking and develop infrastructure on HPC clusters.

Market check

Salary context

This $152,000–$241,500 range sits above 70% of similar postings on FindRole.

Peer median band

$131,468–$227,125

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$142,400–$217,725

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior Software Engineer, NCCL

Nvidia

Santa Clara, CA, US · 15 days ago · $152,000–$241,500
C · C++ · Linux · CUDA · InfiniBand · iWARP · MPI · OpenSHMEM · NCCL · UCX · PyTorch · TensorFlow · HPC · GPU

Senior System Software Engineer, Holoscan

Nvidia

Remote (Santa Clara, CA, US) · 25 days ago · $184,000–$287,500
C/C++ · Python · Docker · Bash · CMake · AI/ML · LLM-based automation · Cross-compilation · Embedded systems · Linux internals · Security principles · Vulnerability management · Patch processes · Yocto-based distributions · Custom embedded Linux environments · Medical AI applications · Real-time sensor processing pipelines · CI/CD

Senior Systems Software Engineer

Oracle

US · 13 days ago · $79,200–$178,100
Python · Java · JavaScript · HTML · Oracle Cloud Infrastructure (OCI) · LLMs · prompt engineering · model evaluation · Oracle AI Data Platform · CI/CD