Software Engineer 5 – Model Serving Systems, AI Platform

Netflix

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $466,000–$750,000 / yr
Posted: 4 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $75k to $822k. Similar roles sit at about $190k and this role at about $608k; a shaded band marks where most similar roles pay.]

This role pays more than 99% of similar roles. Most pay $147,003–$232,750 — the shaded band above. At the midpoint, this role pays about $608k versus about $190k for comparable roles.

Based on 240 similar postings.

Employer

About Netflix

Netflix is the world's leading streaming entertainment service, offering a vast library of TV series, films, documentaries, and original content to subscribers in over 190 countries.

Industry: Streaming Entertainment & Media

Netflix currently has 117 open roles on FindRole.

Listed pay typically runs $388,000–$619,000 across 113 roles with salary data.

At a glance

TL;DR · Software Engineer 5 – Model Serving Systems, AI Platform

As part of Netflix's Model Serving Systems team, you will join engineers building scalable AI infrastructure to support the company's growing machine learning needs. You will develop and expand compute infrastructure for large language models (LLMs) and other foundation models, and keep the real-time model inference and serving platforms highly available and performant. You will work closely with product managers, ML engineers, and data scientists to drive AI/ML innovation across Netflix's consumer- and studio-facing applications. Proficiency in Java, experience with tools such as Triton Inference Server, TensorRT, and Docker, and familiarity with public cloud services such as AWS, Azure, or GCP are essential. The role also requires a strong background in building high-traffic distributed systems for online ML model inference, and the ability to streamline research-to-production workflows by reducing latency and cost.

What you'll do

  • Develop scalable model-serving infrastructure for large language models (LLMs) and other AI applications.
  • Enhance the real-time model inference and serving platform to support high availability and performance.
  • Reduce latency and costs in deploying generative models and LLMs, optimizing research-to-production workflows.
  • Implement foundational abstractions ensuring consistency between online and offline systems for ML models.
  • Manage deployment, capacity planning, and performance tuning of AI/ML applications on public cloud platforms.

What we're looking for

  • Experience building high-traffic distributed services for online ML model inference.
  • Proficiency in object-oriented programming (Java), with production hosting expertise.
  • Understanding of scalable model-serving solutions for generative models and LLMs.
  • Familiarity with deploying ML models using Triton Inference Server, TensorRT, and Docker.
  • Experience working with public cloud platforms like AWS, Azure, or GCP.
  • A proactive approach to promoting observability and logging best practices.

More like this

Similar roles

Software Engineer 5 – Model Runtime, AI Platform

Netflix

Remote (USA - Remote, US) 46 days ago $466,000–$750,000
PyTorch DistributedTraining FSDP AWS GPU CUDA NCCL TensorRT Quantization KV-cache MultimodalModels DiffusionModels LLM SFT RLHF GRPO DPO CloudComputing CI/CD
Remote

Careers

Qualcomm

San Diego, CA 46 days ago
Python C++ C TensorFlow PyTorch ONNX GPU NPU CPU Computer_Vision Audio Generative_AI Linux Windows CI/CD

Software Development Engineer 5

Adobe

San Jose 28 days ago $208,300–$301,600
Java Scala Python Docker Kubernetes AWS CI/CD SQL NoSQL Terraform Prometheus Grafana Adobe Experience Platform XDM NLU Machine Learning Knowledge Graphs