Member of Technical Staff - Pretraining Text Data | Microsoft Careers

Microsoft

Hybrid · Actively hiring

US · Posted 122 days ago · $119,800–$234,700 / year

At a glance

AI generated

TL;DR

Join our Pretraining Text Data team as a senior-level engineer or researcher. You will help build next-generation large language models by curating high-quality datasets. Day-to-day responsibilities include creating and evaluating text datasets, improving data quality through novel collection strategies, and working with cross-functional teams to ensure ethical standards are met. You will develop scalable pipelines for data ingestion and preprocessing, analyze real-world datasets, build tools for dataset auditing, and work closely with safety and ethics teams. Required skills include substantial Python experience with libraries such as Pandas and NumPy, plus familiarity with frameworks such as Spark, Ray, or Apache Beam. The role suits people who want to push the boundaries of AI through rigorous data analysis and careful attention to ethics.

Skills

Python Pandas NumPy Spark Ray Apache_Beam CI/CD Git Jupyter_Notebook PostgreSQL MongoDB AWS_S3 Docker Kubernetes Terraform Prometheus Grafana

What you'll do

  • Develop novel data collection strategies for high-quality datasets.
  • Maintain scalable text data pipelines for ingestion, preprocessing, and annotation.
  • Analyze real-world text datasets to assess quality and identify improvements.
  • Build tools for dataset auditing, visualization, and versioning.
  • Ensure datasets meet ethical and responsible AI standards.

What we're looking for

  • Bachelor's degree in a relevant field and 4+ years of experience in coding with Python and common libraries.
  • Master's degree in a relevant field and 8+ years of experience in coding with Python and common libraries, or equivalent experience.
  • At least 2 years of experience in data analysis or engineering with large-scale unstructured datasets.
  • Proficiency in statistics and exploratory data analysis methods.
  • Familiarity with data processing frameworks like Spark, Ray, or Apache Beam.

Market check

Salary context

This $119,800–$234,700 range sits above 44% of similar postings on FindRole.

Peer median band

$139,900–$237,600

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$170,375–$214,500

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 534 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 488 roles with salary data.

More like this

Similar roles

| Microsoft Careers

Microsoft

Mountain View, CA · 116 days ago · $119,800–$234,700
Python Scala Java Go Apache_Spark Delta_Lake Iceberg Hudi Kafka Azure_EventHubs Pulsar Kubernetes Terraform CI/CD Docker
Hybrid

Member of Technical Staff - Post-Training | Microsoft Careers

Microsoft

US · 173 days ago · $84,200–$165,200
Python Pandas NumPy Spark Ray Apache_Beam Azure Docker Kubernetes CI/CD Git PostgreSQL TensorFlow PyTorch Hugging_Face GitHub_Pods Visual_Studio_Code LoRA DeBERTa Oscar Rho-1 Florence Phi_Models