Evaluation Reliability SRE
At a glance
TL;DR (AI generated)
As a senior Site Reliability Engineer (SRE) on the Evaluation Reliability Engineering (ERE) team at Siri, you will play a critical role in ensuring the reliability of the evaluation infrastructure stack, including orchestration, capacity management, and service health. Your day-to-day responsibilities include leading incident investigations, authoring high-quality runbooks for complex failure scenarios, and building deep expertise in device orchestration and provisioning layers to diagnose upstream issues independently. You will also instrument infrastructure components lacking observability, balance proactive reliability work with incident response, and partner on defining SLOs and burn-rate alerting. Fluency with agentic coding tools like Claude Code or Copilot is essential for automating runbooks and log analysis. Ideal candidates have extensive experience in site reliability engineering, hands-on orchestration skills, and a track record of improving system reliability through measurable outcomes.
What you'll do
- Own reliability outcomes across evaluation infrastructure: orchestration, capacity, and service health.
- Lead incident investigations end-to-end and set operational standards for the team.
- Build expertise in device orchestration and provisioning layers to diagnose issues independently.
- Instrument infrastructure components lacking observability to detect failures proactively.
- Balance incident response with proactive reliability work, focusing on automation and eliminating recurring failures.
What we're looking for
- 5+ years of site reliability or infrastructure engineering experience with direct ownership of production systems.
- Hands-on experience with Kubernetes or equivalent orchestration tools for cluster health and resource management.
- Expertise in device or VM provisioning pipelines and virtualization-layer failure modes.
- Proven track record of improving system reliability through measurable outcomes such as uptime, mean time to recovery (MTTR), and incident frequency.
- Incident command discipline to lead multi-team incidents from declaration to resolution.
- Depth in distributed systems reliability, device management infrastructure, or evaluation and ML platform operations.
Employer
About Apple Inc
Apple Inc. is a multinational technology company known for its consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.
Industry: Consumer Electronics & Software
Apple Inc currently has 255 open roles on FindRole.
Listed pay typically runs $171,600–$272,100 across 182 roles with salary data.
Most-posted roles
- Software Development Engineer: 10
- Apple Business Systems Engineer Manager: 8
- iPad Touch Electrical Engineer: 3
- Machine Learning Engineer, Apple Store Online: 3
- Manager, Machine Learning, Apple Store Online: 3