Machine Learning Safety: Evaluation Research Engineer

Apple Inc

Quick summary

Work type: On-site
Location: San Francisco, CA
Salary: $181,100–$318,400 / yr
Posted: 56 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $154k to $336k, with similar roles marked at $224k and this role at $250k.

This role pays more than 76% of similar roles, most of which pay $197,925–$249,750. At the midpoint, this role pays about $250k versus about $224k for comparable roles.

Based on 240 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 1,723 open roles on FindRole.

Listed pay typically runs $162,500–$272,100 across 1,398 roles with salary data.

At a glance

TL;DR · Machine Learning Safety: Evaluation Research Engineer

As a Senior Evaluation Research Engineer on the Machine Learning Safety team, you will shape responsible AI policies by developing safety evaluation methodologies for generative and agentic AI features. Day to day, you will create culturally grounded risk assessments, taxonomies, and exemplar datasets, collaborating with experts to ensure comprehensive coverage across languages and cultures. You will also develop automated judge models that score AI system outputs for policy compliance, build scalable analysis pipelines, and maintain documentation for evaluation guidelines. The role requires expertise in taxonomy design, classification systems, and annotation methodology, along with experience working in multilingual contexts. Ideal candidates have a background in linguistics, information science, or computational social science, with a focus on responsible AI and content moderation policies.

What you'll do

  • Design, refine, and maintain safety-relevant taxonomies for risk categories and content types.
  • Develop exemplar datasets that illustrate taxonomy categories and edge cases across cultures.
  • Shape the development of automated judge models to score AI system outputs for safety compliance.
  • Create scalable analysis pipelines for cross-market safety assessments and reporting automation.
  • Author canonical evaluation guidelines adaptable across languages and markets, ensuring clarity and completeness.

What we're looking for

  • 4+ years of applied research experience in evaluation design, AI ethics, Responsible AI, AI safety, computational social science, or a related field.
  • Strong understanding of taxonomy design, classification systems, and annotation methodology.
  • Experience developing evaluation guidelines and exemplar sets for human annotation tasks.
  • Ability to collaborate with subject matter experts across languages and cultural contexts.
  • Advanced degree (MS/PhD) in Linguistics, Information Science, Computational Social Science, or a related socio-technical field.
  • Familiarity with responsible AI, AI safety, content moderation policy frameworks, and experimental design methodologies.

More like this

Similar roles

Machine Learning Safety: Evaluation Research Engineer

Apple Inc

Seattle, WA · 56 days ago · $171,600–$302,200
Python, SQL, Terraform, Git, CI/CD, Docker, Kubernetes, AWS, Google Cloud Platform, Azure, PostgreSQL, MLOps, NLP, TensorFlow, PyTorch, Scikit-learn, Jupyter Notebook, GitHub, Confluence, Tableau, Prometheus, Grafana

Machine Learning Research Engineer

Booz Allen Hamilton

Springfield, VA · 57 days ago · $99,000–$225,000
PyTorch, Transformer-based models, Self-supervised learning, Multi-task learning, Docker, CI/CD, Python, Git, Jupyter Notebook, TensorBoard, Uncertainty estimation, Conformal prediction, OOD detection, Hyperspectral data, Masked autoencoders, Contrastive learning, Retrieval models, Multimodal alignment

Machine Learning Research Engineer

Booz Allen Hamilton

Springfield, VA · 10 days ago · $99,000–$225,000
PyTorch, Transformer-based models, Self-supervised learning, Multi-task learning, Docker, CI/CD, Python, PostgreSQL, Git, GitHub, Jupyter Notebook, TensorFlow, Kubernetes, AWS, Google Cloud Platform, Azure, Machine Learning, Hyperspectral data, Uncertainty estimation, Conformal prediction, OOD detection, Masked autoencoders, Contrastive learning, Retrieval models, Multimodal alignment

Machine Learning Research Engineer

Anduril Industries

Washington, District of Columbia · 8 days ago · $220,000–$292,000
Python, PyTorch, Transformer architectures, Edge computing, Deep learning, CI/CD, MLOps

Machine Learning Engineer

Adobe

San Jose · 78 days ago · $183,300–$265,350
Python, PyTorch, LangChain, LangGraph, MCP, ADK, LLMs, VLLMs, CI/CD, Docker, AWS, PostgreSQL, Kubernetes

Machine Learning Engineer

Adobe

San Jose · 88 days ago · $161,700–$234,150
Python, TensorFlow, PyTorch, scikit-learn, SparkML, Kubernetes, AWS, CI/CD, SQL, Docker, PostgreSQL, MLOps