Senior Software Engineer, RL Post-Training Frameworks

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA
Salary: $184,000–$287,500 / yr
Posted: 46 days ago
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles: similar roles $182k · this role $236k (chart scale $120k to $305k; the shaded band marks where most similar roles pay)

This role pays more than 88% of similar roles. Most pay $142,450–$222,000 — the shaded band above. At the midpoint, this role pays about $236k versus about $182k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 985 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 971 roles with salary data.

At a glance

TL;DR · Senior Software Engineer, RL Post-Training Frameworks

Join NVIDIA’s RL Frameworks engineering team as a senior engineer to develop the open-source tools and infrastructure that support AI researchers and post-training teams. You will architect and build scalable reinforcement learning infrastructure that spans single-GPU experiments to large-scale production deployments across thousands of nodes, optimizing performance on GPUs, CPUs, and LPUs while contributing to frameworks such as VeRL, Miles, and TorchTitan. Your role includes enhancing distributed runtimes such as Ray and Monarch for fault tolerance and elastic scaling, collaborating with hardware teams to leverage next-generation capabilities, and advocating for the needs of researchers and partners within NVIDIA’s ecosystem. Strong proficiency in Python and C/C++, experience with large-scale distributed systems, and depth in reinforcement learning algorithms or PyTorch internals are essential, along with contributions to open-source projects and hands-on experience with production failures at scale.

Skills

Python, C++, PyTorch, Kubernetes, Ray, VeRL, Miles, TorchTitan, FSDP, Tensor Parallelism, Pipeline Parallelism, NCCL, NVLink, InfiniBand, Megatron-LM, vLLM, SGLang, TensorRT-LLM, Monarch, DeepSpeed-Chat, OpenRLHF, NeMo-Aligner

What you'll do

Design and implement scalable RL infrastructure for efficient experimentation and production.
Optimize the performance of RL training, inference, and rollout loops across diverse hardware.
Contribute to and enhance open-source RL frameworks like VeRL and TorchTitan.
Ensure fault tolerance and elastic scaling in distributed training jobs.
Collaborate with teams to integrate CPU-driven rollout workloads efficiently.
Advocate for RL workload requirements with NVIDIA's networking and compiler teams.

What we're looking for

MS or PhD in Computer Science, Engineering, or a related field with 5+ years of professional experience.
Strong proficiency in Python and C/C++ for building large-scale distributed systems.
Experience contributing to open-source RL frameworks such as VeRL, Miles, and TorchTitan.
Deep understanding of reinforcement learning algorithms and their distributed execution challenges.
Expertise in Kubernetes runtime internals and end-to-end distributed system design.

Similar roles

Senior Software Engineer, Hardware Tools and Methodology Development

Nvidia

Santa Clara, CA · 18 days ago · $136,000–$218,500

C++, Perl, Python, Make, Verilog, RTL, ASIC Design, Clocks/Resets design and verification

Hybrid

Senior Software Engineer, Platform

Anduril Industries

Costa Mesa, CA · 4 days ago · $191,000–$253,000

Go, C++, Python, Rust, AWS, Azure, CI/CD, Terraform, NixOS, Kubernetes, Docker, Prometheus, Grafana, PostgreSQL, MongoDB, Redis, Git, GitHub, Jenkins

Senior Software Engineer, Platform

Anduril Industries

Seattle, WA · 4 days ago · $191,000–$253,000

Go, C++, Python, Rust, Java, JavaScript, TypeScript, AWS, Azure, CI/CD, Terraform, NixOS, Kubernetes, Prometheus, Grafana, PostgreSQL, Docker

Senior Software Engineer, Platform

Anduril Industries

Boston, MA · 4 days ago · $191,000–$253,000

Go, C++, Python, Rust, Java, TypeScript, AWS, Azure, CI/CD, Terraform, NixOS, Kubernetes, Prometheus, Grafana

Senior Software Engineer, Application Security

Anduril Industries

Boston, MA · 4 days ago · $191,000–$253,000

Go, AWS, Python, CI/CD, Terraform, Kubernetes, OPA/Rego, CircleCI, GitHub Actions, Syft, Trivy, Grype, Semgrep, Nix, Prometheus, Grafana

Senior Software Engineer, Infrastructure

Anduril Industries

Washington, District of Columbia · 4 days ago · $220,000–$292,000

Python, Kubernetes, Docker, CI/CD, Java, C++, Rust, Go, JavaScript, AWS, PostgreSQL, Terraform, ML infrastructure, Virtualization, Containerization
