Microsoft Careers

Microsoft

Quick summary

Work type: On-site
Location: Redmond, WA
Salary: $139,900–$274,800 / yr
Posted: 85 days ago
Closes: Sep 19, 2026
Nearby: 99+ roles within 25 mi

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Chart: similar roles about $193k; this role about $207k. The axis runs from $124k to $291k, with a shaded band marking where most similar roles pay.

This role pays more than 56% of similar roles. Most pay $177,250–$208,800 — the shaded band above. At the midpoint, this role pays about $207k versus about $193k for comparable roles.

Based on 239 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology company that produces software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 1,577 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 1,405 roles with salary data.


At a glance

TL;DR


Join Microsoft’s AI Core team as a senior systems engineer working on high-performance runtime systems for large-scale LLM inferencing, a role that calls for deep C++ expertise. You will design and implement microservices and runtime components that optimize AI inferencing systems for latency, throughput, cost, and reliability at scale. Responsibilities include debugging complex production issues, integrating model inference pipelines into scalable infrastructure, and driving innovations in real-time and batch inferencing efficiency. The role requires 6+ years of systems programming experience with C++, a proven track record of building and operating scalable cloud services, strong debugging skills, and hands-on experience with distributed systems, Kubernetes, and CUDA for large-scale LLM infrastructure. Preferred candidates also have experience optimizing AI model inference stacks and working on Azure OpenAI or similar platforms.

Skills

C++, Kubernetes, CUDA, Docker, Azure, Linux, Performance Profiling Tools, Debugging Tools, CI/CD, Multimodal Inferencing, LLM Inferencing Infrastructure, Service Reliability Engineering, OpenAI

What you'll do

Design and implement high-performance microservices and runtime components in C++.
Optimize AI inferencing systems for latency, throughput, cost, and reliability at scale.
Debug and resolve complex production issues related to performance, scaling, and service reliability.
Contribute to state-of-the-art multimodal inferencing systems supporting text, speech, and vision workloads.
Drive systems-level innovations for real-time and batch inferencing efficiency.

What we're looking for

6+ years of systems programming experience with strong C++ expertise.
Proven track record in building, deploying, and operating scalable cloud services.
Expertise in debugging complex issues using performance profiling tools.
Hands-on experience with distributed systems, Kubernetes, and containerized workloads.
Experience optimizing large-scale LLM inferencing infrastructure, including CUDA.
