Machine Learning Safety: Evaluation Research Engineer

Apple Inc

Quick summary

Work type: On-site
Location: San Francisco, CA
Salary: $181,100–$318,400 / yr
Posted: 56 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: similar roles at about $224k and this role at about $250k, on a scale from $154k to $336k, with a shaded band marking where most similar roles pay.

This role pays more than 76% of similar roles. Most similar roles pay $197,925–$249,750, which is the shaded band in the chart. At the midpoint of its range, this role pays about $250k, versus about $224k for comparable roles.

Based on 240 similar postings.
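
The midpoint comparison above can be reproduced from the posted figures. The sketch below is illustrative only: it assumes the midpoint is the simple average of the listed salary range (the listing does not say how it is calculated), and it uses the rounded $224k figure for similar roles, so the percentage is approximate. All variable and function names are hypothetical.

```python
# Figures copied from the listing. The benchmark is the rounded
# "about $224k" value, so the derived percentage is approximate.
ROLE_MIN = 181_100
ROLE_MAX = 318_400
SIMILAR_MIDPOINT = 224_000  # rounded value quoted in the listing


def midpoint(low: int, high: int) -> float:
    """Return the arithmetic midpoint of a pay range (assumed method)."""
    return (low + high) / 2


role_midpoint = midpoint(ROLE_MIN, ROLE_MAX)
premium = role_midpoint - SIMILAR_MIDPOINT
premium_pct = 100 * premium / SIMILAR_MIDPOINT

print(f"Role midpoint: ${role_midpoint:,.0f}")
print(f"Premium over similar roles: ${premium:,.0f} ({premium_pct:.1f}%)")
# prints:
# Role midpoint: $249,750
# Premium over similar roles: $25,750 (11.5%)
```

Under these assumptions the role's midpoint of $249,750 matches the "about $250k" shown in the listing.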

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 1723 open roles on FindRole.

Listed pay typically runs $162,500–$272,100 across 1398 roles with salary data.

At a glance

TL;DR · Machine Learning Safety: Evaluation Research Engineer

As a Senior Evaluation Research Engineer on the Machine Learning Safety team, you will shape responsible AI policies by developing safety evaluation methodologies for generative and agentic AI features. Day to day, you will create culturally grounded risk assessments, taxonomies, and exemplar datasets, working with experts to ensure comprehensive coverage across languages and cultures. You will also develop automated judge models that score AI system outputs for policy compliance, build scalable analysis pipelines, and maintain documentation for evaluation guidelines. The role requires expertise in taxonomy design, classification systems, and annotation methodology, as well as experience working in multilingual contexts. Ideal candidates have a background in linguistics, information science, or computational social science, with a focus on responsible AI and content moderation policies.

Skills

Python, SQL, Terraform, Git, Jupyter, CI/CD, Docker, Kubernetes, Prometheus, Grafana, AWS, Google Cloud Platform, Azure, PostgreSQL, MongoDB, GitHub, Confluence, Jira, Scrum, Agile, TensorFlow, PyTorch

What you'll do

Design, refine, and maintain safety-relevant taxonomies for risk categories and content types.
Develop exemplar datasets that illustrate taxonomy categories and edge cases across cultures.
Shape the development of automated judge models to score AI system outputs for safety compliance.
Create scalable analysis pipelines for cross-market safety assessments and reporting automation.
Author canonical evaluation guidelines adaptable across languages and markets, ensuring clarity and completeness.

What we're looking for

4+ years of applied research experience in evaluation design, AI ethics, Responsible AI, AI safety, computational social science, or a related field.
Strong understanding of taxonomy design, classification systems, and annotation methodology.
Experience developing evaluation guidelines and exemplar sets for human annotation tasks.
Ability to collaborate with subject matter experts across languages and cultural contexts.
Advanced degree (MS/PhD) in Linguistics, Information Science, Computational Social Science, or a related socio-technical field.
Familiarity with responsible AI, AI safety, content moderation policy frameworks, and experimental design methodologies.

Similar roles

Machine Learning Safety: Evaluation Research Engineer

Apple Inc

Seattle, WA · 56 days ago · $171,600–$302,200

Python, SQL, Terraform, Git, CI/CD, Docker, Kubernetes, AWS, Google Cloud Platform, Azure, PostgreSQL, MLOps, NLP, TensorFlow, PyTorch, Scikit-learn, Jupyter Notebook, GitHub, Confluence, Tableau, Prometheus, Grafana

Machine Learning Research Engineer

Booz Allen Hamilton

Springfield, VA · 57 days ago · $99,000–$225,000

PyTorch, Transformer-based models, Self-supervised learning, Multi-task learning, Docker, CI/CD, Python, Git, Jupyter Notebook, TensorBoard, Uncertainty estimation, Conformal prediction, OOD detection, Hyperspectral data, Masked autoencoders, Contrastive learning, Retrieval models, Multimodal alignment

Machine Learning Research Engineer

Booz Allen Hamilton

Springfield, VA · 10 days ago · $99,000–$225,000

PyTorch, Transformer-based models, Self-supervised learning, Multi-task learning, Docker, CI/CD, Python, PostgreSQL, Git, GitHub, Jupyter Notebook, TensorFlow, Kubernetes, AWS, Google Cloud Platform, Azure, Machine Learning, Hyperspectral data, Uncertainty estimation, Conformal prediction, OOD detection, Masked autoencoders, Contrastive learning, Retrieval models, Multimodal alignment
