Site Reliability Engineer (Edge Services), Infrastructure Services

Apple Inc

Quick summary

Work type: On-site
Location: Elk Grove, CA
Salary: $132,100–$244,600 / yr
Posted: 18 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Pay comparison chart (scale $119k to $258k): similar roles about $182k, this role about $188k.

This role pays more than 57% of similar roles. Most similar roles pay $142,400–$222,000. At the midpoint, this role pays about $188k, versus about $182k for comparable roles.

Based on 238 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 638 open roles on FindRole.

Listed pay typically runs $171,600–$272,100 across 505 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer (Edge Services), Infrastructure Services

Join the Infrastructure Services team as a Site Reliability Engineer (SRE) focused on Edge Services. You will design and implement observability and alerting strategies for production systems, automate repetitive operations to reduce toil, and work with development teams to build reliability into CI/CD pipelines. The role uses Python or Go for scripting and automation, monitoring tools such as Prometheus and Grafana, and SLIs, SLOs, and error budgets to guide system performance work. Ideal candidates have experience in cloud environments (AWS, GCP, Azure) with Terraform, Ansible, or Pulumi, hands-on Kubernetes orchestration skills, and a proactive approach to service design that prioritizes long-term maintainability and graceful failure handling.

Skills

Python, Go, Prometheus, Grafana, Terraform, Kubernetes, AWS, CI/CD, SLIs, SLOs, Error Budgets, Release Management, Incident Management, Linux, HTTP/2, HTTP/3, QUIC, HTTPS, TLS, Data Structures and Algorithms, Generative AI

What you'll do

Design and implement advanced observability and alerting strategies focusing on high-cardinality data.
Build self-healing systems to reduce operational toil through automation of complex workflows.
Partner with development teams to integrate reliability practices into CI/CD pipelines.
Optimize traffic flow and debug protocol-level issues using deep networking expertise.
Manage modern monitoring suites like Prometheus, Grafana, and ClickHouse for actionable alerts.
Consult on service design to enhance long-term maintainability and resilience of systems.

What we're looking for

Deep understanding of Linux internals and expertise in HTTP/2, HTTP/3 (QUIC), and HTTPS/TLS protocols.
Proven ability to automate tasks using Python or Go for complex workflows.
Experience configuring modern monitoring tools like Prometheus, Grafana, and ClickHouse.
Knowledge of SLIs, SLOs, error budgets, release management, and incident management.
Practical application of Data Structures and Algorithms (DSA) in troubleshooting system bottlenecks.
Hands-on experience with Kubernetes for scaling and securing containerized workloads.
