Machine Learning Engineer, ML/GenAI Evaluation

Apple Inc

Quick summary

Work type: On-site
Location: Austin, TX
Posted: 3 days ago

Market check

Salary context

How this pay compares to similar roles

[Salary chart: similar roles marked at $222k on a scale from $160k to $277k]

This listing doesn't post a salary. Most similar roles pay $195,000–$249,750.

Based on 240 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store. Industry: Consumer Electronics & Software

Apple Inc currently has 1777 open roles on FindRole.

Listed pay typically runs $162,500–$272,100 across 1443 roles with salary data.

At a glance

TL;DR · Machine Learning Engineer, ML/GenAI Evaluation

As a Machine Learning Engineer specializing in Evaluation at Wallet, Payments, and Commerce, you will join a dedicated team to establish rigorous evaluation criteria and metrics frameworks for ML models that power Apple's financial features. Your daily responsibilities include designing adversarial test strategies, maintaining diverse test sets, and ensuring model robustness through distribution shift and fairness testing. You will develop user persona–stratified benchmarks and evaluate generative AI outputs, owning the final quality sign-off process before any feature launches. This role requires strong expertise in ML evaluation, Python programming, and fluency with tools like MLflow for experiment tracking. Ideal candidates have a background in machine learning or a related field, 7+ years of hands-on experience, and a track record of designing robust evaluation frameworks for production systems.

What you'll do

  • Define evaluation criteria and quality metrics for ML models powering Wallet features
  • Design adversarial test strategies to surface model failure modes before they reach users
  • Develop robustness testing methodologies including distribution shift and out-of-distribution generalization
  • Own end-to-end fairness evaluation, defining metrics and building bias test suites
  • Synthesize evaluation results into clear insights guiding model development priorities
  • Establish and maintain user persona–stratified benchmarks reflecting global user diversity
  • Own the final quality sign-off process for models before they ship to users

What we're looking for

  • M.S. in Machine Learning, Computer Science, Statistics, Applied Mathematics, or a related field preferred; 7+ years of hands-on ML experience required.
  • Deep expertise in model evaluation, offline metrics design, and behavioral testing for production systems.
  • Strong track record of designing evaluation frameworks that go beyond standard accuracy/F1 metrics to include fairness, precision-recall tradeoffs, and calibration.
  • Proven ability to construct adversarial test suites and edge-case corpora to identify failure modes before deployment.
  • Experience with Python, evaluation tooling, data pipelines, and experiment tracking (e.g., MLflow).
  • Excellent communication skills to translate metric results into product-quality narratives for diverse audiences.

More like this

Similar roles

Machine Learning Engineer, ML/GenAI Evaluation

Apple Inc

San Diego, CA · 3 days ago · $171,600–$302,200
Python, MLflow, W&B, Bayesian, Causal graphs, Confidence calibration, Uncertainty quantification, OCR pipelines, Financial data extraction, Fairness evaluation, Distribution shift, Temporal drift, Adversarial testing, Evaluation methodologies, Structured document understanding, Semi-structured document understanding, Machine Learning, Model evaluation

Machine Learning Engineer, ML/GenAI Evaluation

Apple Inc

New York City, NY · 3 days ago · $181,100–$318,400
Python, MLflow, W&B, Bayesian, Causal graphs, Confidence calibration, Uncertainty quantification, Fairness metrics, Evaluation methodologies, Adversarial testing, Distribution shift, Temporal drift, OCR pipelines, Financial data extraction, Machine Learning, Computer Science, Statistics, Applied Mathematics

Machine Learning Research Engineer

Anduril Industries

Washington, District of Columbia · 12 days ago · $220,000–$292,000
Python, PyTorch, Transformer architectures, Edge computing, Deep learning, CI/CD, MLOps

Machine Learning Engineer

Adobe

San Jose · 82 days ago · $183,300–$265,350
Python, PyTorch, LangChain, LangGraph, MCP, ADK, LLMs, VLLMs, CI/CD, Docker, AWS, PostgreSQL, Kubernetes

Machine Learning Engineer

Adobe

San Jose · 92 days ago · $161,700–$234,150
Python, TensorFlow, PyTorch, scikit-learn, SparkML, Kubernetes, AWS, CI/CD, SQL, Docker, PostgreSQL, MLOps