Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model

Apple Inc

Quick summary

Work type
On-site
Location
Seattle, WA
Salary
$171,600–$302,200 / yr
Posted
27 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay for similar roles on a $146k to $319k scale, with the range most similar roles pay shaded; similar-role midpoint about $217k, this role about $237k]

This role pays more than 66% of similar roles. Most similar roles pay $183,487–$249,750 (the shaded band in the chart above). At the midpoint of its posted range, this role pays about $237k, versus about $217k for comparable roles.

Based on 240 similar postings.
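The "about $237k" midpoint can be reproduced from the posted range. The short sketch below assumes the listing's midpoint is the simple average of the range's low and high ends, and it uses the rounded "about $217k" figure for comparable roles, so the computed gap is approximate.

```python
# Posted salary range for this role, taken from the listing.
RANGE_LOW = 171_600
RANGE_HIGH = 302_200

# Approximate midpoint for comparable roles as quoted above ("about $217k").
# Assumption: rounded figure, so the gap below is only approximate.
SIMILAR_MIDPOINT = 217_000


def midpoint(low: int, high: int) -> float:
    """Return the simple average of a salary range's low and high ends."""
    return (low + high) / 2


role_mid = midpoint(RANGE_LOW, RANGE_HIGH)
gap = role_mid - SIMILAR_MIDPOINT

print(f"This role midpoint: ${role_mid:,.0f}")  # $236,900
print(f"Difference vs. similar roles: ${gap:,.0f} ({gap / SIMILAR_MIDPOINT:.1%})")  # $19,900 (9.2%)
```

The $236,900 result rounds to the $237k shown in the market check, roughly 9% above the quoted midpoint for comparable roles.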

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for its consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 638 open roles on FindRole.

Listed pay typically runs $171,600–$272,100 across 505 roles with salary data.


At a glance

TL;DR · Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model

As a Senior/Staff ML Infrastructure Engineer on the Foundation Model Compute Infrastructure team, you will design and develop scheduling and orchestration systems for TPU-based workloads across multi-region clusters. Your day-to-day responsibilities include building topology-aware schedulers to improve utilization and reliability, developing orchestration systems for distributed ML workloads on Kubernetes, and automating provisioning and resource management workflows. You will collaborate with foundation model teams to support advanced frameworks like Pathways and JAX, and mentor engineers while influencing architectural direction across the AI compute platform. The role requires strong programming skills in Python, Go, or C++, experience with Kubernetes and large-scale cluster management systems, and expertise in distributed systems, scalability, reliability, and performance engineering. Familiarity with TPU infrastructure and frameworks such as JAX, TensorFlow, and PyTorch is preferred.

What you'll do

  • Design and evolve scheduling systems for TPU-based workloads across multi-region clusters.
  • Build topology-aware schedulers to enhance utilization and reliability of TPU infrastructure.
  • Develop orchestration systems for distributed ML workloads on Kubernetes and accelerator hardware.
  • Automate provisioning, resource management, and recovery handling to improve cluster efficiency.
  • Mentor engineers and influence architectural direction in Apple’s AI compute platform.

What we're looking for

  • 7+ years of experience building large-scale distributed systems or cloud infrastructure
  • Strong programming skills in Python, Go, C++, or similar languages
  • Extensive experience with compute infrastructure and workload scheduling
  • Expertise in distributed systems, scalability, reliability, and performance engineering
  • Experience with Kubernetes, container orchestration, or large-scale cluster management systems
  • Bachelor’s degree in Computer Science, Engineering, or related field

More like this

Similar roles

Staff ML Infrastructure Engineer (Compute)

General Motors (GM)

Remote (GM Automation, Sunnyvale, US) · 9 days ago · $197,000–$326,000
Kubernetes Docker Go AWS GCP Azure CI/CD Prometheus Grafana Python PostgreSQL Terraform GitLab HPC GPU Telemetry