Annotation Data Scientist, Evaluation Integrity (Siri)

Apple Inc

Quick summary

Work type: On-site
Location: Cambridge, MA
Salary: $154,600–$274,900 / yr
Posted: 17 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $118k to $292k, with similar roles at $181k and this role at $215k.

This role pays more than 69% of similar roles, most of which pay $135,000–$227,262. At the midpoint, this role pays about $215k versus about $181k for comparable roles.

Based on 240 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 638 open roles on FindRole.

Listed pay typically runs $171,600–$272,100 across 505 roles with salary data.

At a glance

TL;DR · Annotation Data Scientist, Evaluation Integrity (Siri)

Join the Evaluation Integrity team as an Annotation Data Scientist and shape how Siri is evaluated by designing human-in-the-loop (HITL) annotation tasks that scrutinize user agent personae, conversations, and automated evaluators against product specifications. You will own annotation initiatives end to end, from rubric design and tooling through data analysis, so that human judgments rigorously inform pre-ship decisions. You will use Python for data processing and analysis, manage multiple concurrent projects, collaborate with software engineers on custom tooling, and refine guidelines based on inter-annotator agreement. Ideal candidates have a quantitative background, 5+ years of experience with human-in-the-loop evaluation methodologies for ML systems, and expertise in statistical methods and large-scale dataset management. The role also requires strong communication skills to work across functions and deliver actionable insights to product teams.

Skills

Python, pandas, Jupyter, SQL, Spark, CI/CD, LLM-as-judge, Cohen's kappa, Fleiss' kappa, Krippendorff's alpha, bootstrapping, HITL, NLP, ML, UI design, data visualization, inter-annotator agreement

What you'll do

Design HITL annotation tasks to assess the quality of user agent personae and validity of conversations.
Author and maintain rubrics for human grading aligned with agentic evaluators and product guidelines.
Manage multiple annotation programs end-to-end from requirements gathering to stakeholder delivery.
Develop custom annotation tooling in partnership with software engineers to support evaluation tasks.
Apply data science techniques to analyze human-labeled data, measuring evaluator accuracy and reliability.
Translate annotator feedback into improvements for user agents and automated evaluators.

What we're looking for

Bachelor's or Master's degree in a quantitative field or equivalent experience.
5+ years of hands-on experience with human-in-the-loop evaluation methodologies for ML systems.
Expertise in Python for data processing, analysis, and prototyping using tools such as pandas and Jupyter.
Experience designing and implementing annotation schemas and rubrics for machine learning training or evaluation.
Ability to manage multiple concurrent dataset curation efforts, including coordinating with annotators and monitoring performance metrics.

Similar roles

Sr. Machine Learning Research Engineer, Siri Speech

Apple Inc

Cupertino, CA · 20 days ago · $181,100–$318,400

Python, TensorFlow, PyTorch, Keras, Scikit-learn, CUDA, C++, Java, Swift, Docker, Kubernetes, CI/CD, AWS, Azure, Google Cloud Platform, PostgreSQL, MongoDB, Redis, Git, Jupyter Notebook, Prometheus

Sr. Machine Learning Research Engineer, Siri Speech

Apple Inc

Cupertino, CA · 41 days ago · $181,100–$318,400

Swift, C++, Objective-C, PyTorch, JAX, Machine Learning, CI/CD

Sr. Machine Learning Engineer, Siri Speech

Apple Inc

Cupertino, CA · 28 days ago · $181,100–$318,400

Python, PyTorch, TensorFlow, JAX, AWS, GCP, Azure, Docker, Kubernetes, MLflow, Weights & Biases, Kubeflow, MLOps, CI/CD

Sr. Software Engineer - Data, Siri Speech

Apple Inc

Cambridge, MA · 50 days ago · $132,100–$244,600

Python, CI/CD, Apache Beam, Apache Spark, Dask, Ray, Kubernetes, AWS, PostgreSQL, MongoDB, Git, Jenkins, Prometheus, Grafana, Docker, Terraform, GitHub, Swagger/OpenAPI

Sr. Machine Learning Engineer, ASR Infrastructure and Tools, Siri Speech

Apple Inc

Cupertino, CA · 24 days ago · $181,100–$318,400

PySpark, JAX, Ray, Beam, Spark, Dask, Python, HPC, GPUs, TPUs, Speech recognition, Natural language processing, Dialogue management

Sr. Software Engineer - Data, Siri Speech

Apple Inc

Cupertino, CA · 24 days ago · $147,400–$272,100

Python, CI/CD, Apache Beam, Apache Spark, Dask, Ray, PostgreSQL, Kubernetes, AWS, Google Cloud Platform, Azure, Terraform, Git, Jenkins, Docker, Prometheus, Grafana
