Site Reliability Engineer, Apple Data Platform

Apple Inc

Austin, Texas, USA

Posted 13 days ago

Role Details

Apple Services infrastructure is planetary scale. Our Data Platform Site Reliability Engineering team manages the infrastructure and applications on bare-metal and cloud computing platforms to deliver data processing, governance, and storage for many of Apple’s global products and organizations. Our platform teams work with exabytes of data, terabytes of memory, and hundreds of thousands of jobs running millions of executors to support predictable and performant data analytics. Our platform enables key features in Apple Music, TV, Maps, News, and other world-class products.

Ensuring that all of these technologies work together in harmony across geographically distributed data centers presents unique challenges. You’ll need to solve the problems that arise using empirical data, teamwork, and your own unique expertise. Data Platform Services SREs work directly with our partner engineering teams, collaborating tightly with software developers to deliver seamless experiences for our customers. We run a mix of open-source, vendor-licensed, and proprietary tools, which you will use and have opportunities to improve.

The cross-functional team collaborates to ensure we apply a consistent incident management process across all data platform services and provide user-journey-based SLOs derived from exhaustive observability metrics, high-availability architecture, and automation for deployments. We think critically and strive to balance long-term optimal solutions with the business priorities for each engineering challenge we face. Good ideas are heard and results are rewarded.

BS/MS in Computer Science or equivalent
5+ years of software development or production operations experience in a large-scale environment
Proficiency in authoring and releasing code in Go, Python, or Java using common configuration management and software delivery platforms
Experience operating production applications at scale, including well-designed performance testing, HA and disaster recovery concepts, capacity planning, and managing distributed systems on internal and public cloud infrastructure, principally Kubernetes
Understanding of the Linux operating system, containers and virtualization, standard networking protocols, and components
Strong sense of ownership and integrity demonstrated through clear communication and collaboration
Excellent troubleshooting and problem-solving skills using the scientific method
Proficiency with the architecture, deployment, performance tuning, and troubleshooting of open-source data analytics or governance technologies such as Flink, Hive, Hadoop/HDFS, Trino, and/or Druid
Proficiency in managing applications and infrastructure on AWS, GCP, and Ali Cloud
The successful candidate is frustrated with toil and has an acute drive to both automate manual operations and evolve them into automatic processes.

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software