Pre-Training Text Data | Microsoft Careers

Microsoft

Hybrid

Quick summary

Work type
Hybrid
Location
Salary
$119,800–$234,700 / yr
Posted
127 days ago
Closes
Jul 28, 2026

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Similar $195k
This role $177k
$105k most similar roles pay here $254k

This role pays less than 61% of similar roles. Most pay $169,137–$221,075 — the shaded band above. At the midpoint, this role pays about $177k versus about $195k for comparable roles.

Based on 240 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices. Industry: Software & Cloud Computing

Microsoft currently has 310 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 285 roles with salary data.

Most-posted roles

View all roles at Microsoft

At a glance

TL;DR · Pre-Training Text Data | Microsoft Careers

Join our Pretraining Text Data team as an engineer or researcher at the senior level, where you will contribute to developing next-generation large language models by curating high-quality datasets. Your day-to-day responsibilities include creating and evaluating text datasets, improving data quality through innovative collection strategies, and collaborating with cross-functional teams to ensure ethical standards are met. You will develop scalable pipelines for data ingestion and preprocessing, analyze real-world datasets, build tools for dataset auditing, and work closely with safety and ethics teams. Required skills include extensive experience in Python, Pandas, NumPy, and familiarity with frameworks like Spark or Apache Beam. This role is ideal for those passionate about pushing the boundaries of AI through rigorous data analysis and ethical considerations.

What you'll do

  • Develop novel data collection strategies for high-quality datasets.
  • Maintain scalable text data pipelines for ingestion, preprocessing, and annotation.
  • Analyze real-world text datasets to assess quality and identify improvements.
  • Build tools for dataset auditing, visualization, and versioning.
  • Ensure datasets meet ethical and responsible AI standards.

What we're looking for

  • Bachelor's degree in a relevant field and 4+ years of experience in coding with Python and common libraries.
  • Master's degree in a relevant field and 8+ years of experience in coding with Python and common libraries, or equivalent experience.
  • At least 2 years of experience in data analysis or engineering with large-scale unstructured datasets.
  • Proficiency in statistics and exploratory data analysis methods.
  • Familiarity with data processing frameworks like Spark, Ray, or Apache Beam.

More like this

Similar roles

| Microsoft Careers

Microsoft

US 53 days ago $142,800$274,800
Python Java Spark SQL Apache_Hadoop Kafka NoSQL Azure AWS GCP CI/CD Docker Kubernetes Terraform PostgreSQL MSSQL
Hybrid

Data Research Engineer | Microsoft Careers

Microsoft

US 179 days ago $119,800$234,700
Python Pandas NumPy Spark Ray Apache_Beam SQL CI/CD Git Jupyter_Notebook TensorFlow PyTorch PostgreSQL MongoDB Docker Kubernetes AWS Google_Cloud_Platform Azure GitHub
Hybrid

| Microsoft Careers

Microsoft

US 178 days ago $85,400$168,100
Python Pandas NumPy Spark Ray Apache_Beam Azure Docker Kubernetes CI/CD Git PostgreSQL TensorFlow PyTorch Hugging_Face GitHub_Pods Visual_Studio_Code LoRA DeBerTa Oscar Rho-1 Florence Phi_Models

| Microsoft Careers

Microsoft

US 92 days ago $119,800$234,700
CI/CD Python JavaScript Java C# Docker Kubernetes AWS Azure GitHub Jenkins PostgreSQL Git Linux
Hybrid

| Microsoft Careers

Microsoft

WA 30 days ago $100,600$199,000
Python SQL PostgreSQL AWS Azure GCP Visual_Studio_Code HTML CSS JavaScript ASP.NET REST jQuery Selenium Puppeteer Appium LLMs AI_ChatBots Prompt_Engineering Azure_DevOps GIT Azure_Open_AI Azure_Foundry
Hybrid