Software Engineering IC5 | Microsoft Careers

Microsoft

Actively hiring Posted this week
WA Posted 3 days ago $142,800$274,800 / year

At a glance

AI generated

TL;DR

As a Principal Engineer on the CoreAI Infrastructure team at Azure, you will design and build GPU and CPU accelerated compute platforms for AI workloads across bare metal, VMs, and containerized environments. Your day-to-day involves developing end-to-end observability systems, optimizing resource governance with Kubernetes-based tools like AKS and DRA, and enhancing virtualization stacks to support secure and confidential computing scenarios. You will also partner with networking teams to ensure high-performance interconnects for distributed workloads and drive technical direction through design reviews and mentorship. The role requires expertise in C++, Python, Java, JavaScript, and hands-on experience with Azure Kubernetes Service (AKS), virtualization platforms like Kubernetes, and observability tools such as Prometheus and OpenTelemetry. This position addresses the critical need to deliver secure, reliable, and efficient infrastructure for global-scale AI systems at Microsoft Azure.

Skills

Azure Kubernetes Service (AKS) Kubernetes Prometheus OpenTelemetry Grafana Python C C++ C# Java JavaScript Docker RDMA InfiniBand NCCL CUDA Terraform CI/CD

What you'll do

  • Design and build GPU/CPU accelerated infrastructure for diverse AI workloads.
  • Develop observability systems for efficient management of GPU/CPU resources.
  • Build advanced orchestration scenarios using Kubernetes ecosystem capabilities.
  • Optimize performance and utilization across large-scale GPU/CPU fleets.
  • Partner with networking teams to enable high-performance interconnects.
  • Drive end-to-end platform features from design through production deployment.
  • Influence technical direction by leading design reviews and providing leadership.

What we're looking for

  • 6+ years of technical engineering experience in cloud infrastructure design and operation.
  • Proven ability to work with Azure Kubernetes Service (AKS) for largescale production infrastructure.
  • Expertise in virtualization and container platforms like Kubernetes and Docker.
  • Strong problem-solving skills and hands-on experience debugging complex systems issues.
  • Demonstrated leadership in mentoring engineers and driving technical alignment across teams.
  • Advanced knowledge of distributed observability technologies such as Prometheus and Grafana.
  • Experience with large-scale training infrastructure, including CUDA libraries and tools.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices. Industry: Software & Cloud Computing

Microsoft currently has 534 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 488 roles with salary data.

Most-posted roles

View all roles at Microsoft