Staff Applied Scientist, AI Quality & Meta Evaluation

Apple Inc

Quick summary

Work type: On-site
Location: Seattle, WA
Salary: $201,300–$302,200 / yr
Posted: 31 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Pay comparison chart (axis $167k–$317k): similar roles about $221k, this role about $252k

This role pays more than 76% of similar roles; most similar roles pay $191,537–$249,753. At the midpoint of its listed range, this role pays about $252k versus about $221k for comparable roles.

Based on 240 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for its consumer electronics, software, and online services, including the iPhone, Mac, iPad, and the App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 638 open roles on FindRole.

Listed pay typically runs $171,600–$272,100 across 505 roles with salary data.

At a glance

TL;DR · Staff Applied Scientist, AI Quality & Meta Evaluation

As a Staff Applied Scientist on the Human Centered AI team at Apple Services Engineering, you will lead the technical development of the Data Quality Validation framework, which ensures that large language model evaluations are reliable and trustworthy. You will design statistical frameworks to detect discrepancies between automated evaluators and human ground truth, develop algorithms for risk-stratified sampling, and establish rigorous standards for immutable ground truth sets. The role involves hands-on Python work building models that audit Production LLM Judge outputs for bias and drift, and making sure new judges are validated before deployment. This high-impact position requires expertise in uncertainty quantification, model calibration, and anomaly detection, as well as experience with Human-in-the-Loop pipelines and large-scale reasoning models.

Skills

Python, Bayesian Uncertainty Quantification, Entropy Modeling, Statistical Process Control, Inter-rater Reliability, HITL Active Learning, CI/CD, Out-of-distribution Detection, Large-scale Reasoning Models, LLM Evaluation Science

What you'll do

Design and develop the reasoning agent that audits Production LLM Judge outputs.
Develop statistical and ML approaches to detect divergence from ground truth.
Define algorithms for risk-stratified smart sampling in deeper review processes.
Establish hierarchical weighting models and confidence interval frameworks.
Set standards for building, versioning, and validating immutable ground truth sets.
Validate new LLM Judges through a standard process before they go to production.
Serve as the scientific authority on data quality evaluation methodology for partners.
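The listing does not say which methods the team uses. As one illustration of what measuring an automated judge against human ground truth can look like, here is a minimal Python sketch, assuming pass/fail labels collected for the same items from an LLM judge and from human reviewers. It computes Cohen's kappa (a common inter-rater reliability statistic that corrects raw agreement for chance) and a Wilson score interval around the raw agreement rate. The function names and sample labels are hypothetical, and these are standard textbook techniques, not Apple's framework.

```python
import math
from collections import Counter

# Hypothetical helpers for comparing an automated judge with human ground truth.
# Assumes both raters labeled the same items, in the same order.


def cohens_kappa(judge_labels, human_labels):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    if not judge_labels or len(judge_labels) != len(human_labels):
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(judge_labels)
    # Observed agreement: share of items where the judge matches the human label.
    p_o = sum(j == h for j, h in zip(judge_labels, human_labels)) / n
    # Expected agreement if each rater labeled independently at its own base rates.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    p_e = sum(judge_counts[c] * human_counts[c] for c in judge_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% when z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical sample: 8 items scored by an LLM judge and by human reviewers.
    judge = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
    human = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

    agree = sum(j == h for j, h in zip(judge, human))
    low, high = wilson_interval(agree, len(judge))
    print(f"raw agreement: {agree}/{len(judge)}")
    print(f"Cohen's kappa: {cohens_kappa(judge, human):.3f}")
    print(f"95% Wilson interval for agreement: {low:.3f} to {high:.3f}")
```

On the sample labels the judge matches the humans on 6 of 8 items (75%), but kappa is only about 0.47 because both raters mark most items "pass", so much of that agreement is expected by chance. The wide interval (roughly 0.41 to 0.93 on just 8 items) shows why sample size matters before trusting a judge.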

What we're looking for

Master's degree in Statistics, Data Science, Machine Learning, Computer Science, or related quantitative field
8+ years of experience in applied data science, ML research, or evaluation science
Expertise in uncertainty quantification and model calibration techniques
Experience building disagreement detection models in production systems
Strong command of statistical measurement frameworks including inter-rater reliability
Proficiency in Python for statistical modeling and ML experimentation
Ability to translate complex statistical findings into actionable guidance

Similar roles

Staff Machine Learning Platform Engineer, AI Evaluation

Apple Inc

Seattle, WA · 44 days ago · $201,300–$302,200

Python, FastAPI, Pydantic, Temporal.io, Go, Rust, Ray, Dask, CI/CD, Docker, Kubernetes, Prometheus, Grafana

Senior Staff AI Scientist

Intuit

Atlanta, GA · 50 days ago

Python, TensorFlow, PyTorch, Kubernetes, AWS, Docker, CI/CD, Git, PostgreSQL, MongoDB, Scikit-learn, Pandas, NumPy, Jupyter Notebook, RESTful APIs, Swagger, GraphQL

Senior Staff AI Research Scientist

Intuit

Mountain View, CA · 51 days ago · $226,000–$306,000

Python, PyTorch, TensorFlow, NeurIPS, ICML, ICLR, AAAI, KDD, ACL, Decision-focused AI, Probabilistic modeling, Causal inference, Simulation-based planning, Agentic and multi-agent systems, Neuro-symbolic AI, LLM-based reasoning, Deep learning, Optimization, Statistical machine learning

AI/ML Staff Researcher

General Motors (GM)

Mountain View, CA · Hybrid · 10 days ago

Python, TensorFlow, PyTorch, Keras, Scikit-learn, AWS, Azure, Google Cloud Platform, CI/CD, Docker, Kubernetes, Git, Jupyter Notebook, PostgreSQL, MongoDB, Responsible AI, Large Language Models, Generative AI, Physics-based AI, Scientific Machine Learning

Applied AI Scientist

Apple Inc

Cupertino, CA · 44 days ago · $181,100–$318,400

TensorFlow, PyTorch, AWS, GCP, Azure, SageMaker, Vertex AI, MLflow, SQL, Diffusion models, Computer vision, Multimodal models, Video generation, User behavior analysis, Feature analytics, MLOps, Data drift tracking, Version control, Testing, Code review

Applied AI Scientist

Apple Inc

Culver City, CA · 44 days ago · $171,600–$302,200

TensorFlow, PyTorch, AWS, GCP, Azure, SageMaker, Vertex AI, MLflow, SQL, CNN, RNN, Transformers, Diffusion models, Computer vision, Multimodal models, Video generation, Data drift, Model monitoring, Version control, Testing, Code review, CI/CD
