Agent RL Infra Engineer

Nvidia

Actively hiring
Santa Clara, CA, US Posted 60 days ago $224,000–$356,500 / year

At a glance

AI generated

TL;DR

Join NVIDIA’s agent team as a senior engineer advancing reinforcement learning (RL) across the enterprise. The role bridges ML research and production engineering: you will evaluate emerging RL approaches and adapt them into enterprise-ready blueprints for internal developers. Day-to-day work includes designing verifiable reward environments with NeMo Gym, operationalizing NVIDIA training backends as production services in sandboxed execution environments, and integrating with NeMo Microservices to enable end-to-end data flywheel workflows. You will also lead data curation strategies, design RL training loops, and ensure security compliance while collaborating with other teams. Requirements include an MS in CS, ML, or a related field (or equivalent experience), 10+ years of relevant industry experience, proficiency in Python, Go, and Rust, and familiarity with distributed training frameworks such as Megatron, NeMo, and DeepSpeed. Experience with NVIDIA infrastructure and the evolving RL-for-agents ecosystem is highly valued.

Skills

Python, Go, Rust, Megatron, NeMo, DeepSpeed, FSDP, HF Accelerate, Docker, Kubernetes, Terraform, CI/CD, Prometheus, Grafana, GitLab, GitHub, NVIDIA DGX, AI Factory, NVLink, InfiniBand, NeMo Microservices, rLLM, Agent Lightning, HUD, OpenRLHF, SkyRL

What you'll do

  • Evaluate and adapt emerging RL approaches into enterprise-ready blueprints.
  • Design verifiable reward environments using NeMo Gym for internal use cases.
  • Operationalize NVIDIA training backends as production services within Sandbox.
  • Integrate with NeMo Microservices to enable end-to-end data flywheel workflows.
  • Lead data curation strategies to continuously improve the quality of training data.

What we're looking for

  • MS in CS, ML, or related field (or equivalent experience) and 10+ years of relevant industry experience
  • Experience operationalizing RL techniques such as DPO, GRPO, PPO, and RLAIF into reusable workflows
  • Proficiency with Python, Go, and Rust, and familiarity with distributed training frameworks such as Megatron, NeMo, and DeepSpeed
  • Strong background in MLOps, including pipeline automation, job orchestration, and GPU cluster management
  • Experience building RL environments or training recipes for self-service consumption by other teams
  • Familiarity with NVIDIA infrastructure (DGX, AI Factory) and NeMo Microservices preferred

Market check

Salary context

This $224,000–$356,500 range sits above 96% of similar postings on FindRole.

Peer median band

$112,500–$198,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$126,800–$192,556

Middle half of comparable postings.

Based on 239 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Critical Environments Sr. Chief Engineer

JLL (Jones Lang LaSalle)

Remote (Usa-Client Durham Nc-Dell - Durham 4121, US) 9 days ago
CMMS BMS EMS Python SQL Excel Project Management Leadership Communication Problem Solving Decision Making Budget Management Automation Innovation Safety Compliance Energy Efficiency Sustainability Vendor Management Training Development Hiring Evaluation
Remote

Infra Services Analyst Senior Advisor

Elevance Health

Oh-Mason, 4361 Irwin Simpson Rd, US 27 days ago
ServiceNow Postman SOAP API Microservices Apigee Docker DataPower WebSphere MQ Message Broker HTTPS Kafka Splunk Java J2EE AngularJS Siteminder F5 Load Balancer OAuth SSL IP routing firewall technologies VMware ESX Hosts Oracle SQL MongoDB DB2 Redis Cache AWS Google Cloud EKS Lambda S3

Solutions Engineer - Agents & Automation

SHI International

Remote (Us - Tx - Home Office, US) 8 days ago $150,000–$207,000
Microsoft 365 Power Platform Azure Copilot Studio Microsoft Graph Power Apps Power Automate YAML Microsoft Entra Microsoft Purview Microsoft Defender DLP policies Managed environments Environment rules ALM pipelines Agent 365 CI/CD Python JavaScript JSON REST APIs OAuth 2.0 SAML 2.0
Remote

Resiliency Automation Engineer

Carnegie Mellon University

Pittsburgh, Pennsylvania, US 132 days ago
CI/CD Docker Kubernetes SonarQube CppCheck Clang-Tidy C C++ Java Linux RTOS DevOps Python Git Jenkins Mentoring Observability Logging Monitoring

Forward Deployed Engineer, Professional Services

Stripe

US 83 days ago $173,000–$259,600
JavaScript TypeScript React Ruby Go Java Docker Kubernetes CI/CD PostgreSQL AWS GCP Azure Git Swagger GraphQL Python Terraform Ansible Jenkins

Senior Engineer - Asset Protection (Hybrid - Seattle)

Nordstrom

Seattle, WA, US 13 days ago $142,000–$220,500
Java Spring Boot AWS GCP Docker Kubernetes Kafka Confluent SQS GitLab CI/CD Terraform New Relic Splunk Grafana Postman curl OpenAPI/Swagger unit testing integration testing regression testing load testing relational databases NoSQL data stores