Staff Machine Learning Platform Engineer, AI Evaluation

Apple Inc

Seattle, Washington, USA

$201,300 - $302,200/year

Role Details

Seattle, Washington, United States

Machine Learning and AI

Summary

Posted: Apr 21, 2026

Role Number: 200659247-3337

Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking a staff machine learning platform engineer to lead the architectural design and development of the high-availability services and internal tools that power self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for builders who thrive in the ambiguity of new initiatives and are passionate about creating scalable infrastructure.

Description

You will join the engineering team responsible for democratizing AI evaluation across the organization. Your focus will be on the developer experience: architecting and implementing the APIs, SDKs, and platform services that turn complex evaluation metrics into simple, self-service calls. You will work hand-in-hand with researchers to operationalize sophisticated measurement techniques, ensuring they scale reliably within our high-availability infrastructure. In this role, you will drive the engineering standards for a new organization, upholding the code quality, automation, and testing rigor required to support the rapid evolution of Generative AI and Agentic systems.

Responsibilities

  • System Design & Implementation: Lead the architecture for the core evaluation engine and distributed services, designing, coding, and shipping high-quality Python services. Own the technical direction for platform abstractions that will scale across Apple Services Engineering.
  • Technical Leadership & Collaboration: Mentor engineers across the team, conduct code reviews, and drive technical decision-making. Be the person others look to when facing the hardest engineering problems, and foster a culture of technical excellence and rapid delivery through example and collaboration.
  • Operationalizing Science: Lead the technical partnership with scientists to translate novel metrics, judge prompts, and scoring algorithms into scalable, production-grade services. Define the frameworks and architectural patterns to evaluate not just simple responses, but also multi-turn agent trajectories and tool usage.
  • System Integration: Serve as the primary technical bridge between the research organization and the broader engineering ecosystem, driving how our tools integrate seamlessly with existing ML infrastructure and developer workflows.
  • Engineering Rigor: Champion the software development lifecycle (SDLC) for the team, defining the standards for comprehensive automated testing (CI/CD), and instrumenting monitoring to ensure high availability and reliability.

Minimum Qualifications

  • 8+ years of hands-on software engineering experience, with a track record of owning the technical direction of a platform or infrastructure domain.
  • Strong proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas). You write production-grade code and lead architectural discussions on day one.
  • Customer Obsession & Product Thinking: You have owned the technical roadmap for an internal platform, presented it to senior stakeholders, and shipped against it. You independently translate vague requirements from other teams into concrete engineering specifications and platform roadmaps.
  • Demonstrated experience leading technical partnerships with Data Scientists or Researchers: You have taken research code and shipped it as a production service, and built the abstractions, testing frameworks, and deployment pipelines that made the next handoff faster than the last.
  • Strong expertise in API Design & Platform Infrastructure: You have designed and owned APIs and SDKs that other developers rely on, with a focus on versioning, backward compatibility, and developer experience at scale.
  • Operational excellence background: You have architected and owned CI/CD pipelines, containerization (Docker/Kubernetes), and monitoring (Datadog/Prometheus) for production services, and have been accountable for their reliability.
  • Bachelor's degree in Computer Science or a related field; Master's preferred.

Preferred Qualifications

  • Deep familiarity with AI Evaluation Frameworks: You have built, extended, or contributed to modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith. You understand how to implement and scale model-based evaluation workflows across a large organization.
  • Evaluation Service Deployment: You have owned the deployment, scaling, and operational health of evaluation services in production, including high-throughput evaluation job orchestration (queueing, prioritization, concurrency, auto-scaling) and the definition of SLAs for evaluation pipeline latency and availability.
  • Observability & Reliability: Experience instrumenting production ML evaluation pipelines, including tracking evaluation job throughput, queue depth, judge model latency SLAs, scoring drift over time, and failure modes specific to non-deterministic LLM-based evaluation workflows.
  • Deep understanding of Generative AI & Agents: You understand the engineering challenges of relying on LLMs and Agents as software components—specifically managing token economics, handling rate limits, and evaluating non-deterministic, multi-step reasoning capabilities. You have built production systems that depend on these components and have solved these problems at scale.
  • Builder Experience: You have thrived in startup-like environments, navigating high ambiguity to deliver complex technical roadmaps from scratch.

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $201,300 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.

Apple accepts applications to this posting on an ongoing basis.

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software