Senior Software Engineer | Microsoft Careers

Microsoft

Actively hiring
US Posted 124 days ago $119,800–$234,700 / year

At a glance

AI generated

TL;DR

As a Senior Software Engineer on the AI Customer Experience (AICE) engineering team within Microsoft Azure’s High Performance Computing & AI Engineering division, you will play a critical role in managing and optimizing the flagship supercomputers used by top-tier AI customers. Your day-to-day responsibilities include designing and developing monitoring capabilities for large-scale infrastructure, diagnosing complex issues across hardware and software stacks, and creating data pipelines to process telemetry and logs for actionable alerts. You will contribute to improving key metrics like Mean Time to Interrupt and Nodes in Service, manage operations during critical incidents, and implement systemic solutions to enhance performance and reliability. The role requires expertise in languages such as C++, Java, or Python, along with experience in GPU-based systems and large-scale data pipelines using tools like Prometheus and Grafana. This position offers the opportunity to directly impact customer satisfaction by driving consistency and efficiency in monitoring and operations at scale.

Skills

Python C++ Java JavaScript Prometheus Grafana Azure Kubernetes Docker CI/CD PostgreSQL Redis Git Jenkins Ansible Terraform InfiniBand HPC AI

What you'll do

  • Develop and enhance monitoring capabilities for supercomputers to improve key performance metrics.
  • Implement systemic solutions to address complex issues affecting the functionality of supercomputers.
  • Create comprehensive observability and monitoring features by improving troubleshooting guides and telemetry.
  • Write incident postmortems and present insights that lead to changes reducing future incidents.
  • Independently seek new knowledge and adapt to emerging trends to enhance supercomputer performance.

What we're looking for

  • Extensive technical engineering experience (6+ years) coding in languages such as C++, Java, or Python.
  • Deep expertise in diagnosing and troubleshooting GPU-based systems (e.g., H100, A100).
  • Proficiency with large-scale data pipelines using tools such as Prometheus and Grafana.
  • Ability to independently improve troubleshooting guides and add comprehensive observability capabilities.
  • Experience managing operations of supercomputers by quickly responding to mitigate issues.
  • Strong background in implementing systemic solutions for complex issues impacting supercomputer performance.

Market check

Salary context

This $119,800–$234,700 range sits above 73% of similar postings on FindRole.

Peer median band

$119,800–$234,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$143,930–$188,106

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices. Industry: Software & Cloud Computing

Microsoft currently has 534 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 488 roles with salary data.

More like this

Similar roles

Senior Software Engineer | Microsoft Careers

Microsoft

US 124 days ago $119,800–$234,700
Python C++ C# Java JavaScript Azure Docker Kubernetes Terraform CI/CD Git Linux Windows PostgreSQL MySQL Redis HPC Machine_Learning Virtualization Distributed_Systems GPU_Accelerators Networking Performance_Analysis

Senior Software Engineer | Microsoft Careers

Microsoft

US 115 days ago $119,800–$234,700
Azure Kubernetes Docker CI/CD Python C++ Go Java SQL PostgreSQL Redis Terraform Ansible Git Jenkins Prometheus Grafana OpenAPI RESTful APIs Swagger

Senior Software Engineer | Microsoft Careers

Microsoft

Redmond, WA 34 days ago $119,800–$234,700
Microsoft Azure CI/CD Telemetry Debugging Networking Operating Systems Authentication Docker Kubernetes Python Go SQL PostgreSQL Redis MongoDB Git GitHub Jenkins Prometheus Grafana
Hybrid

| Microsoft Careers

Microsoft

Redmond, WA 52 days ago $119,800–$234,700
Azure Python Java Scala Spark Hadoop HDFS Kafka Flink Docker Kubernetes CI/CD PostgreSQL Redis Elasticsearch Prometheus Grafana Git Jenkins
Hybrid

Senior Software Engineer | Microsoft Careers

Microsoft

Washington 122 days ago $119,800–$234,700
C++ JavaScript Python Git CI/CD Docker Kubernetes Terraform AWS Azure PostgreSQL SQLite Chrome Chromium W3C REST GraphQL HTML5 CSS3 WebAssembly WebGL
Hybrid

Senior Software Engineer | Microsoft Careers

Microsoft

Redmond, WA 24 days ago $119,800–$234,700
Azure React Web API Python Go Rust Kubernetes Docker CI/CD Prometheus Grafana PostgreSQL Big Data LLMs Agentic Workflows