Staff Research Engineer, Post-training & Evaluation

Reddit

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $230,000–$322,000 / yr
Posted: 3 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $144k to $341k, with a shaded band marking what most similar roles pay. Markers: similar roles $205k, this role $276k.]

This role pays more than 94% of similar roles. Most pay $174,200–$235,625 (the shaded band above). At the midpoint, this role pays about $276k versus about $205k for comparable roles.

Based on 240 similar postings.

Employer

About Reddit

Reddit is a social news aggregation and discussion platform where users share content, vote on posts, and engage in community conversations across thousands of interest-based forums called subreddits.

Reddit currently has 94 open roles on FindRole.

Listed pay typically runs $217,000–$303,900 across 65 roles with salary data.

At a glance

TL;DR · Staff Research Engineer, Post-training & Evaluation

As a Staff Research Engineer for Post-Training & Evaluation Science at Reddit, you will join the AI Engineering team to develop and refine foundational Large Language Models (LLMs) that understand Reddit's unique culture. Your primary responsibilities include defining the "Reddit Benchmark" evaluation standard, ensuring that model evaluations are reliable, designing post-training recipes, and partnering with Safety Engineering to translate policies into concrete metrics. You will work extensively with Python, Hugging Face Transformers, vLLM, and lm-eval-harness, and contribute to synthetic data generation strategy and to diagnosing post-training instability. The role requires deep expertise in evaluation reliability, experience building custom domain-specific evaluation harnesses, and a thorough understanding of LLM post-training, so it suits candidates with extensive ML experience or a relevant PhD.

What you'll do

  • Define the "Reddit Benchmark" evaluation standard for model quality.
  • Ensure evaluation reliability and statistical rigor in benchmarking models.
  • Design methodologies for automated model-as-a-judge evaluations.
  • Set post-training recipes to convert base models into high-performing endpoints.
  • Evaluate base and continued-pretraining (CPT) checkpoints to select optimal starting points.
  • Drive the strategy for generating synthetic data to improve model generalization.

What we're looking for

  • 6+ years of professional ML experience, or a PhD plus 4 years in a related field.
  • Deep expertise in evaluation reliability and statistical rigor for automated evaluations.
  • Strong experience building custom, domain-specific evaluation harnesses.
  • Experience evaluating both generation and representation/classification models.
  • Fluency in Python with strong data-pipeline and eval-harness engineering skills.

More like this

Similar roles

Applied Research Engineer

Salesforce

Remote (San Francisco, CA) +4 · 8 days ago · $148,500–$260,100
Python, AWS, Kubernetes, Linux, React, GCP, CI/CD, Docker, Prometheus, PostgreSQL, Git, Jenkins, Terraform, GraphQL, Redis, MongoDB, Security principles, UI design sensibilities

Applied Research Engineer

Salesforce

Remote (San Francisco, CA) +4 · 7 days ago · $197,300–$313,700
Python, AWS, Kubernetes, Linux, React, GCP, CI/CD, Docker, Prometheus, PostgreSQL, Git, Jenkins, Terraform, GraphQL, Redis, MongoDB, Security, UI design sensibilities

System Performance Engineer, Staff

Qualcomm

San Diego, CA · 47 days ago · $148,300–$222,500
Python, C/C++, ARM v8, ARM v9, Vulkan, OpenGL, DX12, CUDA, SMMU, GIC, Coresight-PMU, Linux, Windows, Android, Memory hierarchy, System interconnects, Power management stacks, Scheduler behavior, Performance analysis tools, CI/CD

Staff Applications Engineer

Broadcom

San Jose, CA · 124 days ago · $120,000–$192,000
Python, JavaScript, C, LLM APIs, Vertex AI, Dialogflow CX, BigQuery, CI/CD, Docker, Kubernetes, Terraform, PostgreSQL, Networking Protocols, ASIC Development, SDK Development

Staff Implementation Engineer

Arm Holdings

Austin, TX · Hybrid · 3 days ago · $198,100–$268,000
Python, C, Tcl, git, EDA tools, RC analysis, STA, PDN analysis, multi-die SoC design flows, IR-PDN-Thermal bottlenecks, large scale automation, version control systems, distributed processing, disk management, PVT analysis, Vdrop analysis, thermal aware methodology

Senior Staff Engineer

GEICO

Remote (Bethesda, MD) · 34 days ago · $115,000–$260,000
Go, Python, .NET, SQL, NoSQL, Kubernetes, AWS, GCP, Azure, Terraform, Puppet, Chef, Ansible, CI/CD, DevOps, Docker, Prometheus, Grafana, Git, Jenkins