Machine Learning Engineer, ML/GenAI Evaluation

Apple Inc

Quick summary

Work type: On-site
Location: New York City, NY
Salary: $181,100–$318,400 / yr
Posted: 3 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: midpoint pay for this role (about $250k) vs. similar roles (about $222k); scale runs from $154k to $336k]

This role pays more than 79% of similar roles. Most similar roles pay $195,000–$249,750. At the midpoint, this role pays about $250k versus about $222k for comparable roles.

Based on 240 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and the App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 1,777 open roles on FindRole.

Listed pay typically runs $162,500–$272,100 across 1,443 roles with salary data.


At a glance

TL;DR · Machine Learning Engineer, ML/GenAI Evaluation

As a Machine Learning Engineer specializing in evaluation on the Wallet, Payments, and Commerce team, you will join a group dedicated to defining exceptional standards for model quality. You will establish the evaluation criteria and metrics frameworks that ensure ML models meet the highest bar for accuracy, robustness, and fairness before they reach hundreds of millions of users. Day-to-day responsibilities include designing adversarial test strategies, maintaining diverse test sets, and running rigorous evaluations that surface failure modes before they appear in real-world use. You will work in Python with evaluation tooling such as MLflow or W&B, and you should bring a strong background in model evaluation, offline metrics design, and behavioral testing. The role also requires expertise in fairness evaluation and distribution-shift testing, plus experience with structured document understanding. Your findings will directly influence product decisions at scale, ensuring that models are not only technically sound but also ethically responsible.

What you'll do

  • Define evaluation criteria and quality metrics for ML models powering Wallet features
  • Design adversarial test strategies to identify model failure modes before they reach users
  • Develop robustness testing methodologies including distribution shift, out-of-distribution generalization, and temporal drift
  • Own end-to-end fairness evaluation by defining metrics and building bias test suites across protected attributes
  • Synthesize evaluation results into clear insights guiding model development priorities and product decisions
  • Establish and maintain user persona–stratified benchmarks reflecting the diversity of Wallet's global user base

What we're looking for

  • M.S. in Machine Learning, Computer Science, Statistics, Applied Mathematics, or a related field preferred; 7+ years of hands-on ML experience required.
  • Deep expertise in model evaluation, offline metrics design, and behavioral testing for production systems.
  • Proven ability to construct adversarial test suites and edge-case corpora that surface model failure modes.
  • Strong programming skills in Python with fluency in evaluation tooling and data pipelines.
  • Experience designing evaluation frameworks that go beyond accuracy/F1, including precision-recall tradeoffs, calibration, and fairness.
  • Excellent communication skills for translating metric results into product-quality narratives.

Similar roles

Machine Learning Engineer, ML/GenAI Evaluation

Apple Inc

San Diego, CA · 3 days ago · $171,600–$302,200
Python MLflow W&B Bayesian Causal graphs Confidence calibration Uncertainty quantification OCR pipelines Financial data extraction Fairness evaluation Distribution shift Temporal drift Adversarial testing Evaluation methodologies Structured document understanding Semi-structured document understanding Machine Learning Model evaluation

Machine Learning Engineer, ML/GenAI Evaluation

Apple Inc

Austin, TX · 3 days ago
Python MLflow W&B Bayesian Causal Graphs Counterfactual Fairness Structural Causal Models Confidence Calibration Uncertainty Quantification AWS Kubernetes PostgreSQL CI/CD

Machine Learning Research Engineer

Anduril Industries

Washington, District of Columbia · 12 days ago · $220,000–$292,000
Python PyTorch Transformer architectures Edge computing Deep learning CI/CD MLOps

Machine Learning Engineer

Adobe

San Jose · 82 days ago · $183,300–$265,350
Python PyTorch LangChain LangGraph MCP ADK LLMs VLLMs CI/CD Docker AWS PostgreSQL Kubernetes

Machine Learning Engineer

Adobe

San Jose · 92 days ago · $161,700–$234,150
Python TensorFlow PyTorch scikit-learn SparkML Kubernetes AWS CI/CD SQL Docker PostgreSQL MLOps