Principal Software Engineer, CoreAI | Microsoft Careers

Microsoft

Quick summary

Work type: On-site
Location: WA
Salary: $142,800–$274,800 / yr
Posted: 51 days ago
Closes: Oct 11, 2026
Nearby: 99+ roles within 25 mi

Market check

Salary context

Above market

How this pay compares to similar roles

(Chart: similar roles about $194k; this role about $209k; plotted range $124k–$291k.)

This role pays more than 74% of similar roles. Most similar roles pay $177,250–$211,200. At the midpoint of its posted range ($142,800–$274,800), this role pays about $209k, versus about $194k for comparable roles.

Based on 240 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 598 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 547 roles with salary data.

At a glance

As a Principal Software Engineer on the AI Core Infrastructure team within Microsoft’s CoreAI organization, you will lead the architectural design and strategic planning for monitoring, troubleshooting, and scaling AI training workloads at supercomputer scale. Your responsibilities include setting the roadmap for training infrastructure, developing the backend services that power AI workloads, and providing deep insights that help optimize large-scale systems. You’ll collaborate closely with internal research teams and use production telemetry data to inform future infrastructure design. You will also mentor engineering teams and champion customer-focused system design. The role calls for expertise in distributed observability technologies such as Prometheus and OpenTelemetry, along with hands-on experience with ML systems, NCCL, CUDA libraries, Docker, Kubernetes, and scalable architectures.

Skills

Python, Kubernetes, Docker, Prometheus, OpenTelemetry, Grafana, NCCL, CUDA, C, C++, C#, Java, JavaScript, Terraform, CI/CD

What you'll do

Set the roadmap and drive execution for training infrastructure at supercomputer scale.
Design and develop backend services to power large-scale AI workloads.
Provide deep insights to help customers troubleshoot and optimize their AI workloads.
Influence next-generation infrastructure design using production telemetry data.
Mentor engineering teams and champion customer-focused system design approaches.

What we're looking for

6+ years of experience coding in C, C++, C#, Java, JavaScript, Python, or an equivalent language.
Expertise in designing and scaling telemetry pipelines for high-throughput production systems.
Advanced hands-on experience with production ML systems and large-scale training infrastructure.
Strong focus on reliability, scalability, and performance in distributed systems.
Understanding of Docker, Kubernetes, scalable architectures, and automation for production systems.
Excellent analytical skills and the ability to design clear, scalable solutions from ambiguous requirements.
