Governance & Operations Lead, Infrastructure & Planning
Cupertino, California, USA
Posted 7 days ago
$172,100 - $305,600/year
Role Details
- Own the daily operations of the systems you architect. You will design and oversee a scalable hub-and-spoke support model, spanning cross-functional tier-1 on-call teams, tier-2 team leads, and a dedicated tier-3 engineering escalation group that you will build and manage.
- Own and evolve PACE's governance tooling and related systems, ensuring that compute resource requests, allocations, and utilization data are accurately captured to support rapid, at-scale analysis.
- Bridge coverage gaps as Apple's ML ecosystem expands to new hardware (GPUs, TPUs, and custom silicon) and workloads (inference, on-device), balancing power, performance, cost, and compatibility.
- Partner with the Data & Analytics Lead to maintain the analytical layer, building the dashboards, reports, and automated alerts that surface efficiency opportunities and track infrastructure savings.
- Identify system anomalies and operational bottlenecks that degrade utilization and drive up costs, building financial impact models that translate technical metrics into actionable insights for leadership.
- Partner with Apple's ML engineering teams, delivering data-driven analytics to optimize the foundation models, inference workloads, and platform tooling that rely on your data for success.
- Design robust governance processes and automated operations engineered specifically to meet Apple-scale ML demands.
- Partner to produce strategic analyses that inform executive decisions on ML compute investment, allocation, and strategy, directly influencing Apple's ML growth and feature development.
Qualifications
- BS in Computer Science, Data Science, Computer Engineering, or equivalent practical experience
- 5+ years in a governance/operations role, data engineering, analytics engineering, technical program management, or in a large-scale compute or cloud environment
- Organized, process-oriented, and comfortable owning operational systems other people depend on daily
- Strong cross-functional experience working with capable engineers, managers, EPMs, and leaders
- Proven experience designing and operating complex systems and processes from the ground up
- AI-fluent and capable of quickly adapting to AI workflows and empowerment
- Direct experience managing SRE and hierarchical technical support systems
- SQL and experience building analytical dashboards or data products (Tableau, Looker, Grafana, or similar)
- Experience designing data models or telemetry schemas for infrastructure, capacity, or utilization data
- Ability to translate raw technical metrics into clear business narratives for both engineers and executives
- Experience with Python for data analysis (pandas, notebooks) or lightweight pipeline development
- Familiarity with ML training infrastructure concepts: what GPU utilization, training throughput, and scheduling efficiency mean, even if you have not optimized them directly
- Prior experience in FinOps, capacity planning, cloud cost management, or IT governance
- Experience building or operating data analytics systems
- Background in automated alerting or anomaly detection for infrastructure metrics
About Apple Inc
Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.
Industry: Consumer Electronics & Software