Software Engineer 5 – Model Serving Systems, AI Platform

Netflix

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $466,000–$750,000 / yr
Posted: 4 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $75k to $822k, with a shaded band where most similar roles pay. Similar roles: about $190k. This role: about $608k.]

This role pays more than 99% of similar roles. Most pay $147,003–$232,750 — the shaded band above. At the midpoint, this role pays about $608k versus about $190k for comparable roles.

Based on 240 similar postings.

Employer

About Netflix

Netflix is the world's leading streaming entertainment service, offering a vast library of TV series, films, documentaries, and original content to subscribers in over 190 countries.

Industry: Streaming Entertainment & Media

Netflix currently has 117 open roles on FindRole.

Listed pay typically runs $388,000–$619,000 across 113 roles with salary data.


At a glance

TL;DR · Software Engineer 5 – Model Serving Systems, AI Platform


As part of Netflix's Model Serving Systems team, you will join a group of engineers building scalable AI infrastructure to support the company's growing machine learning needs. You will develop and expand compute infrastructure for large language models (LLMs) and other foundation models, and keep the real-time model inference and serving platforms highly available and performant. You will work closely with cross-functional teams, including product managers, ML engineers, and data scientists, to drive AI/ML innovation across Netflix's consumer and studio-facing applications.

The role requires a strong background in building high-traffic distributed systems for online ML model inference and the ability to streamline research-to-production workflows by reducing latency and cost. Proficiency in Java, experience with tools such as Triton Inference Server, TensorRT, and Docker, and familiarity with a public cloud platform such as AWS, Azure, or GCP are essential.

Skills

AWS, Triton Inference Server, TensorRT, Docker, Java, Python, Kubernetes, CI/CD, LLMs, Model Serving Infrastructure, High Availability, Performance Tuning, Deployment Management, Capacity Planning, Observability, Logging

What you'll do

Develop scalable model-serving infrastructure for large language models (LLMs) and other AI applications.
Enhance the real-time model inference and serving platform to support high availability and performance.
Reduce latency and costs in deploying generative models and LLMs, optimizing research-to-production workflows.
Implement foundational abstractions ensuring consistency between online and offline systems for ML models.
Manage deployment, capacity planning, and performance tuning of AI/ML applications on public cloud platforms.

What we're looking for

Experience building high-traffic distributed services for online ML model inference.
Proficiency in object-oriented programming (Java) and expertise in hosting production services.
Understanding of scalable model-serving solutions for generative models and LLMs.
Familiarity with deploying ML models using Triton Inference Server, TensorRT, and Docker.
Experience working with public cloud platforms like AWS, Azure, or GCP.
A proactive approach to promoting observability and logging best practices.

Software Development Engineer 5

Adobe

San Jose · 28 days ago · $208,300–$301,600

Java, Scala, Python, Docker, Kubernetes, AWS, CI/CD, SQL, NoSQL, Terraform, Prometheus, Grafana, Adobe Experience Platform, XDM, NLU, Machine Learning, Knowledge Graphs
