Site Reliability Engineer (Edge Services), Infrastructure Services

Apple Inc

Quick summary

Work type: On-site
Location: Sunnyvale, CA
Salary: $147,400–$272,100 / yr
Posted: 18 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $115k to $289k, marking similar roles at $169k and this role at $210k, with a band labeled "most similar roles pay here"]

This role pays more than 78% of similar roles. Most pay $137,459–$201,068 (the shaded band in the chart). At the midpoint, this role pays about $210k versus about $169k for comparable roles.

Based on 239 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 969 open roles on FindRole.

Listed pay typically runs $163,300–$272,100 across 756 roles with salary data.

At a glance

TL;DR · Site Reliability Engineer (Edge Services), Infrastructure Services

Join our Infrastructure Services team as a Site Reliability Engineer (SRE) focusing on Edge Services, where you will drive the evolution of our production ecosystems by designing and implementing advanced observability and alerting strategies. Your daily tasks include automating repetitive operations, optimizing traffic flow, and collaborating with development teams to integrate reliability into CI/CD pipelines. You will leverage Python or Go for automation, manage modern monitoring tools like Prometheus and Grafana, and apply your expertise in SLIs, SLOs, and error budgets to enhance system resilience. Ideal candidates possess deep Linux networking knowledge, experience with cloud environments using Terraform, and hands-on Kubernetes orchestration skills. Additionally, familiarity with Generative AI tools for observability and debugging is highly valued as you work towards a proactive stance on reliability and performance optimization in our large-scale distributed systems.

What you'll do

  • Design and implement a next-generation observability and alerting strategy focusing on high-cardinality data.
  • Build self-healing systems and reduce operational toil through aggressive automation techniques.
  • Partner with development teams to integrate reliability practices into the CI/CD pipeline.
  • Optimize traffic flow and debug protocol-level issues using deep networking expertise.
  • Configure modern monitoring tools like Prometheus, Grafana, and ClickHouse for high-quality alerting.
  • Consult on service design to enhance long-term maintainability and resilience of systems.

What we're looking for

  • Deep understanding of Linux internals and expertise in HTTP/2, HTTP/3 (QUIC), and HTTPS/TLS.
  • Proven ability to automate complex workflows in Python or Go.
  • Experience configuring modern monitoring tools like Prometheus, Grafana, and ClickHouse.
  • Knowledge of SLIs, SLOs, error budgets, release management, and incident management.
  • Strong grasp of data structures and algorithms for writing efficient code and troubleshooting.
  • Practical fluency in applying generative AI within SRE workflows for debugging and triage.

More like this

Similar roles

Sr Site Reliability Engineer, Customer Systems

Apple Inc

Austin, TX · 17 days ago
Kubernetes, Helm, Python, Shell Scripting, Ansible, Splunk, Grafana, Prometheus, Alertmanager, CI/CD, DNS, TCP, HTTP, AWS S3, Cassandra, MongoDB, Couchbase, Java, ArgoCD, GitOps, MTTR, SLO, GenAI