Senior Site Reliability Engineer, CORE (Member Experience / Resilience Operations)

Netflix

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $388,000–$558,000 / yr
Posted: 63 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Similar roles: $173k · This role: $473k (chart scale $87k–$608k; the shaded band marks what most similar roles pay)

This role pays more than 99% of similar roles. Most pay $142,400–$203,200 — the shaded band above. At the midpoint, this role pays about $473k versus about $173k for comparable roles.

Based on 240 similar postings.

Employer

About Netflix

Netflix is the world's leading streaming entertainment service, offering a vast library of TV series, films, documentaries, and original content to subscribers in over 190 countries.

Industry: Streaming Entertainment & Media

Netflix currently has 117 open roles on FindRole.

Listed pay typically runs $388,000–$619,000 across 113 roles with salary data.

At a glance

TL;DR · Senior Site Reliability Engineer, CORE (Member Experience / Resilience Operations)

The Critical Operations and Reliability Engineering (CORE) team at Netflix seeks a Senior Site Reliability Engineer to enhance system reliability and observability for its global streaming service. This role involves designing fault-tolerant infrastructure, embedding reliability practices in the software development lifecycle, and defining key performance metrics like Service Level Objectives. The ideal candidate will automate deployment processes, manage on-call responsibilities, and lead incident response efforts while fostering a culture of continuous improvement across teams. Strong coding skills in Python, Go, or Java are essential, along with hands-on experience with cloud infrastructure such as AWS, Azure, or GCP. Candidates should have a deep understanding of distributed systems and the ability to balance reliability, velocity, and cost through data-driven decision-making.

What you'll do

  • Design and evolve resilient infrastructure for member-facing services, ensuring scalability and fault tolerance.
  • Embed reliability and observability into the software development lifecycle across multiple teams.
  • Define and measure Service Level Objectives (SLOs) to guide capacity planning and operational priorities.
  • Build automated processes for deployment, monitoring, and incident response to ensure reliable operations.
  • Lead incident response efforts, focusing on learning and systemic fixes to avoid repeat issues.
  • Identify and reduce sources of instability in distributed systems through production analysis.

What we're looking for

  • 5+ years of experience in SRE or Production Engineering roles for business-critical services.
  • Proficient in Python, Go, Java, or similar languages for automation and solution development.
  • Expertise in large-scale cloud environments on AWS, Azure, or GCP, with abstracted compute systems.
  • Deep understanding of distributed system failures, performance bottlenecks, and resilience design.
  • Proven ability to identify and mitigate reliability risks through metrics and architecture reviews.
  • Strong observability skills, using metrics, logs, and traces to debug complex systems.
  • Experience in incident management, response coordination, and durable improvement follow-through.

More like this

Similar roles

Senior Site Reliability Engineer

Adobe

San Jose · 65 days ago · $208,300–$301,600
AWS, Kubernetes, Terraform, Python, Go, CI/CD, Infrastructure as Code, Docker, PostgreSQL, Security hardening, AI-enabled platforms, Cross-team leadership, Developer experience optimization

Senior Site Reliability Engineer

Carta

San Francisco, California +2 · 69 days ago · $181,688–$213,750
AWS, Terraform, Python, Kubernetes, Docker, Postgres, Prometheus, Grafana, CI/CD, gRPC, Ansible, ELK Stack, Datadog, GraphQL
Hybrid