Site Reliability Engineer (Edge Services), Infrastructure Services

Apple Inc

Quick summary

Work type: On-site
Location: Denver, CO
Salary: $132,100–$244,600 / yr
Posted: 18 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Chart: this role's midpoint ($188k) against similar roles ($182k), on a scale from $119k to $258k.

This role pays more than 57% of similar roles. Most similar roles pay $142,400–$222,000. At the midpoint, this role pays about $188k versus about $182k for comparable roles.

Based on 238 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for its consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 638 open roles on FindRole.

Listed pay typically runs $171,600–$272,100 across 505 roles with salary data.


At a glance


Join our Infrastructure Services team in Denver as a Site Reliability Engineer focused on Edge Services. You will help evolve the production ecosystem by designing and implementing advanced observability and alerting strategies. Day to day, you will automate complex workflows, reduce toil through aggressive system automation, and work with development teams to build reliability into CI/CD pipelines. You must be proficient in Python or Go for scripting and automation, have deep networking expertise, and have experience with tools such as Prometheus, Grafana, and Kubernetes. Ideal candidates also have practical knowledge of SLIs, SLOs, and incident management, plus hands-on experience managing cloud environments with Terraform or Ansible. The role aims to improve service resilience and scalability, and it places high value on a proactive approach to system design and the ability to use generative AI for observability and debugging.

Skills

Python, Go, Prometheus, Grafana, Terraform, Kubernetes, AWS, CI/CD, SLIs, SLOs, Error Budgets, Release Management, Incident Management, Linux, HTTP/2, HTTP/3 (QUIC), HTTPS/TLS, Data Structures and Algorithms (DSA)

What you'll do

Design and implement advanced observability and alerting strategies using high-cardinality data.
Build self-healing systems to reduce operational toil through aggressive automation techniques.
Partner with development teams to integrate reliability practices into CI/CD pipelines.
Optimize traffic flow and debug protocol-level issues in complex distributed systems.
Manage modern monitoring suites such as Prometheus, Grafana, and ClickHouse to produce actionable alerts.
Consult on service design to enhance long-term maintainability and system resilience.

What we're looking for

Deep understanding of Linux internals and networking protocols including HTTP/2, HTTP/3 (QUIC), and HTTPS/TLS.
Proven ability to automate complex workflows using Python or Go.
Experience configuring modern monitoring tools like Prometheus, Grafana, and ClickHouse with high-signal alerting.
Knowledge of SLIs, SLOs, error budgets, release management, and incident management to prioritize engineering efforts.
Strong grasp of data structures and algorithms to optimize code performance and troubleshoot system bottlenecks.
Practical application of generative AI tools in observability and debugging within production contexts.
