Software Engineer 5 – Training Platform, AI Platform

Netflix

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $466,000–$750,000 / yr
Posted: 7 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $68k to $823k, with a shaded band where most similar roles pay; similar roles are marked at $176k and this role at $608k.]

This role pays more than 99% of similar roles. Most pay $142,350–$209,750 (the shaded band above). At the midpoint, this role pays about $608k versus about $176k for comparable roles.

Based on 240 similar postings.

Employer

About Netflix

Netflix is the world's leading streaming entertainment service, offering a vast library of TV series, films, documentaries, and original content to subscribers in over 190 countries.

Industry: Streaming Entertainment & Media

Netflix currently has 117 open roles on FindRole.

Listed pay typically runs $388,000–$619,000 across 113 roles with salary data.

At a glance

TL;DR · Software Engineer 5 – Training Platform, AI Platform

As a senior engineer at Netflix, you will join a team focused on advancing machine learning infrastructure. You will design and build the platform that powers large-scale model training for use cases across the company, including personalized recommendations and content demand modeling. You will optimize distributed systems for performance and reliability and create intuitive APIs for both experts and non-experts. The position requires expertise in Kubernetes, Ray clusters, PyTorch, and cloud computing, particularly AWS. You will also diagnose and improve GPU utilization and memory efficiency, and reduce communication overhead, in large-scale training jobs so that the platform efficiently supports key product functions.

What you'll do

  • Design and build platform infrastructure for large-scale machine learning model training.
  • Optimize performance of distributed training jobs on GPU clusters.
  • Co-design systems to increase cost-effectiveness of ML model training at scale.
  • Develop intuitive APIs and interfaces for both expert and non-expert users.
  • Diagnose and improve memory efficiency and reduce communication overhead in training workflows.
  • Lead technical discussions and align stakeholders on platform direction and priorities.

What we're looking for

  • Deep expertise in distributed model training and large-scale system architecture
  • Experience diagnosing and optimizing performance for GPU utilization and memory efficiency
  • Strong background in cloud computing platforms, preferably AWS
  • Ability to work effectively across multiple layers of the tech stack on diverse projects
  • Track record of promoting observability and logging best practices in operations
  • Excellent communication skills for technical discussions and stakeholder alignment

More like this

Similar roles

Software Engineer 5 – Model Runtime, AI Platform

Netflix

Remote (USA - Remote, US) · 46 days ago · $466,000–$750,000
PyTorch, Distributed Training, FSDP, AWS, GPU, CUDA, NCCL, TensorRT, Quantization, KV-cache, Multimodal Models, Diffusion Models, LLM, SFT, RLHF, GRPO, DPO, Cloud Computing, CI/CD

Software Engineer 5 – Model Serving Systems, AI Platform

Netflix

Remote (USA - Remote, US) · 4 days ago · $466,000–$750,000
AWS, Triton Inference Server, TensorRT, Docker, Java, Python, Kubernetes, CI/CD, LLMs, Model Serving Infrastructure, High Availability, Performance Tuning, Deployment Management, Capacity Planning, Observability, Logging

Sr. Software Engineer - Applied AI

GEICO

Remote (Palo Alto, CA) · 57 days ago · $80,000–$215,000
Python, LangChain, Hugging Face, OpenAI, Kubernetes, CI/CD, Docker, Prometheus, Grafana, PostgreSQL, Redis, Apache Kafka, Spring AI, LangGraph, LangSmith, LlamaIndex, Anthropic APIs, Vector databases, Knowledge graphs, Java, Spring ecosystem

Software Engineer (AI/GenAI Platforms)

Allstate

Charlotte Railyard +3 · 77 days ago · $85,000–$145,075
Python, AWS, Java, LangChain, Hugging Face, OpenAI, Amazon SageMaker, MongoDB Atlas, Amazon DocumentDB, Apache Kafka, Datadog, AWS CloudWatch, CI/CD, LLMs, RAG, Vector Search & Embeddings, Multimodal AI, Prompt Engineering, Semantic Models