Site Reliability Engineer (Edge Services), Infrastructure Services

Apple Inc

Quick summary

Work type: On-site
Location: Sunnyvale, CA
Salary: $147,400–$272,100 / yr
Posted: 18 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $115k to $289k, with a shaded band where most similar roles pay; similar roles at $169k, this role at $210k.

This role pays more than 78% of similar roles. Most pay $137,459–$201,068 — the shaded band above. At the midpoint, this role pays about $210k versus about $169k for comparable roles.

Based on 239 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store. Industry: Consumer Electronics & Software

Apple Inc currently has 969 open roles on FindRole.

Listed pay typically runs $163,300–$272,100 across 756 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer (Edge Services), Infrastructure Services

Join our Infrastructure Services team as a Site Reliability Engineer (SRE) focusing on Edge Services, where you will drive the evolution of our production ecosystems by designing and implementing advanced observability and alerting strategies. Your daily tasks include automating repetitive operations, optimizing traffic flow, and collaborating with development teams to integrate reliability into CI/CD pipelines. You will leverage Python or Go for automation, manage modern monitoring tools like Prometheus and Grafana, and apply your expertise in SLIs, SLOs, and error budgets to enhance system resilience. Ideal candidates possess deep Linux networking knowledge, experience with cloud environments using Terraform, and hands-on Kubernetes orchestration skills. Additionally, familiarity with Generative AI tools for observability and debugging is highly valued as you work towards a proactive stance on reliability and performance optimization in our large-scale distributed systems.

Skills

Python, Go, Prometheus, Grafana, Terraform, Kubernetes, AWS, CI/CD, SLIs, SLOs, error budgets, release management, incident management, Linux, HTTP/2, HTTP/3 (QUIC), HTTPS/TLS, data structures and algorithms (DSA), Ansible, Pulumi, generative AI

What you'll do

Design and implement a next-generation observability and alerting strategy focusing on high-cardinality data.
Build self-healing systems and reduce operational toil through aggressive automation techniques.
Partner with development teams to integrate reliability practices into the CI/CD pipeline.
Optimize traffic flow and debug protocol-level issues using deep networking expertise.
Configure modern monitoring tools like Prometheus, Grafana, and ClickHouse for high-quality alerting.
Consult on service design to enhance long-term maintainability and resilience of systems.

What we're looking for

Deep understanding of Linux internals and expertise in HTTP/2, HTTP/3 (QUIC), and HTTPS/TLS.
Proven ability to automate complex workflows using Python or Go.
Experience configuring modern monitoring tools like Prometheus, Grafana, and ClickHouse.
Knowledge of SLIs, SLOs, error budgets, release management, and incident management.
Strong grasp of data structures and algorithms for writing efficient code and troubleshooting.
Practical fluency in applying generative AI within SRE workflows for debugging and triage.
