Software Engineer 5 – Model Runtime, AI Platform

Netflix

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $466,000–$750,000 / yr
Posted: 46 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: similar roles about $179k vs. this role about $608k, on a scale from $68k to $823k]

This role pays more than 99% of similar roles, most of which pay $142,400–$214,850. At the midpoint, this role pays about $608k versus about $179k for comparable roles.

Based on 240 similar postings.

Employer

About Netflix

Netflix is the world's leading streaming entertainment service, offering a vast library of TV series, films, documentaries, and original content to subscribers in over 190 countries.

Industry: Streaming Entertainment & Media

Netflix currently has 117 open roles on FindRole.

Listed pay typically runs $388,000–$619,000 across 113 roles with salary data.

At a glance

TL;DR · Software Engineer 5 – Model Runtime, AI Platform

As a Software Engineer on Netflix's Model Runtime team, you will work at the cutting edge of machine learning infrastructure, designing systems for reinforcement learning, reward modeling, and preference optimization. You will enable next-generation GenAI workloads by building scalable distributed training frameworks and optimizing GPU pipelines for real-time inference. Your responsibilities include scaling fault-tolerant training across hundreds of GPUs using FSDP and mixed-precision strategies, and profiling PyTorch operators to improve GPU utilization. The role requires expertise in ML systems engineering, hands-on experience with PyTorch internals, and proficiency in cloud computing, particularly AWS. Ideal candidates have a background in distributed training at scale, inference optimization techniques such as quantization, and GPU performance tuning with CUDA and Nsight. This position offers the opportunity to tackle complex AI infrastructure challenges that directly affect Netflix's global streaming service.

Skills

PyTorch DistributedTraining FSDP AWS GPU CUDA NCCL TensorRT Quantization KV-cache MultimodalModels DiffusionModels LLM SFT RLHF GRPO DPO CloudComputing CI/CD

What you'll do

Build alignment and post-training infrastructure for reinforcement learning models.
Enable next-generation GenAI workloads including distributed training and serving.
Scale distributed training systems using FSDP across hundreds of GPUs.
Optimize the full stack, from PyTorch operators to GPU kernels, for efficiency.
Evaluate emerging hardware and frameworks to keep Netflix at the efficiency frontier.

What we're looking for

Experience in ML systems engineering for large-scale training and inference.
Strong skills in systems programming across multiple stack layers, including PyTorch internals.
Hands-on experience with distributed training and system-model codesign at scale.
Comfort with ambiguity and ability to work across business and technical domains.
Expertise with cloud computing providers, preferably AWS.
Excellent written and verbal communication skills for remote environments.

Similar roles

Software Engineer 5 – Training Platform, AI Platform

Netflix

Remote (Usa - Remote, US) 7 days ago $466,000–$750,000

Kubernetes Ray PyTorch AWS SageMaker Databricks OpenAI Python GPU NVIDIA Nsight Systems FSDP TensorParallel PipelineParallel CI/CD Prometheus Grafana

Software Engineer 5 – Model Serving Systems, AI Platform

Netflix

Remote (Usa - Remote, US) 4 days ago $466,000–$750,000

AWS Triton Inference Server TensorRT Docker Java Python Kubernetes CI/CD LLMs Model Serving Infrastructure High Availability Performance Tuning Deployment Management Capacity Planning Observability Logging

Careers

Qualcomm

San Diego, CA 46 days ago

Python C++ C TensorFlow PyTorch ONNX GPU NPU CPU Computer_Vision Audio Generative_AI Linux Windows CI/CD

Software Engineer 4/5 – AI for Member Systems

Netflix

Remote (Usa - Remote, US) 140 days ago $466,000–$750,000

Python Scala Java C++ Spark Flink TensorFlow PyTorch JAX Keras AWS CI/CD

AI Software Engineer

Broadcom

Atlanta, GA +2 51 days ago $108,000–$172,800

Java Spring GitHub Git GitHubActions CI/CD Micrometer OpenTelemetry LargeLanguageModels LLMs VectorDatabases Langchain4J Embabel Anthropic OpenAI AmazonBedrock GoogleGenAI AzureOpenAI TanzuPlatform10 Bitnami SpringAI

AI Software Engineer

Booz Allen Hamilton

Arlington, VA 72 days ago $86,800–$198,000

Python Rust Go Scala Java RESTful APIs CI/CD GitLab CI Jenkins Agentic AI solutions Linux Docker AWS LocalStack ESXi Ansible Kubernetes SIEMs Security+ Linux+