Site Reliability Engineer — Human Engineering
$181,100 - $318,400/year
Role Details
The Human Engineering Software team builds tools used across Apple for user studies, research participant management, health data collection, and privacy-preserving analytics. Our infrastructure spans Django backends, Kubernetes clusters (self-hosted and AWS), PostgreSQL, Redis, Kafka, Elasticsearch, and a growing set of internal service integrations.

This is an engineering-forward SRE role. You'll spend as much time designing systems as operating them. You'll work closely with our full-stack engineers to improve how services communicate, how we observe production behavior, and how we ship changes safely. You'll have a seat at the architecture table: we want you proposing solutions, not just implementing them.

Platform & Reliability Engineering
- Own the reliability of our Kubernetes-hosted services across AWS and self-hosted clusters: deployments, scaling, capacity planning, certificate management, and secrets rotation.
- Design and implement SLO-driven observability: define meaningful SLIs and build dashboards that answer "is the system healthy?" not just "is the pod running?"
- Drive incident response and blameless postmortems.

Distributed Systems & Architecture
- Partner with the architecture team on system design: service-to-service authentication (OIDC, gateway auth), event-driven messaging (Kafka), and API gateway patterns.
- Design the infrastructure layer to make architecture proposals real in production.
- Evaluate and recommend new tools, patterns, and platforms.
- Write code when it's the right tool, whether that's a deployment operator, a health check service, or a data pipeline component. This isn't a YAML-only role.

Engineering Enablement
- Make the team efficient: own CI/CD pipelines and GitOps practices, own tests that verify our production tools are functioning correctly, build self-service automation, and evolve our observability and security posture.
- Communicate infrastructure decisions clearly across technical and non-technical stakeholders.

- BS in Computer Science, Engineering, or equivalent practical experience, with 5+ years of experience in distributed systems
- Deep experience with Kubernetes in production: cluster operations, networking, storage, troubleshooting
- Strong proficiency designing and operating services in AWS (EC2, EKS, RDS, S3, IAM, VPC)
- Hands-on infrastructure-as-code experience (Terraform, Helm, or equivalent)
- Proficiency in at least one backend language (Python, Go, or similar); you can write production services, not just scripts
- Experience with CI/CD pipeline design and GitOps workflows
- Strong understanding of networking fundamentals: DNS, load balancing, TLS, firewall rules, service discovery
- Excellent communication skills: you can explain a complex system to a room of engineers who didn't build it
- Experience building internal automation or self-service tooling (Slack bots, CLI tools, workflow orchestration) that reduced manual operational work
- BS in Computer Science, Engineering, or equivalent practical experience, with 7+ years of experience in distributed systems
- Experience with event-driven architectures (Kafka, RabbitMQ, or similar messaging systems)
- Experience with service mesh or API gateway patterns (Istio, Envoy, Kong, or similar)
- Familiarity with Django/Python web applications and their operational characteristics (Celery, Gunicorn, PostgreSQL)
- Experience with observability tooling beyond basic monitoring: distributed tracing, SLO frameworks, structured logging
- Background working with sensitive data (health data, PII) and associated compliance requirements
- Experience leading incident response and building on-call culture
- Contributions to internal or open-source infrastructure tooling
About Apple Inc
Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.
Industry: Consumer Electronics & Software