Software Engineering IC5 | Microsoft Careers
At a glance
AI-generated TL;DR
As a Principal Engineer on the CoreAI Infrastructure team at Azure, you will design and build GPU- and CPU-accelerated compute platforms for AI workloads across bare metal, VMs, and containerized environments. Your day-to-day involves developing end-to-end observability systems, optimizing resource governance with Kubernetes-based tools like AKS and DRA, and enhancing virtualization stacks to support secure and confidential computing scenarios. You will also partner with networking teams to ensure high-performance interconnects for distributed workloads and drive technical direction through design reviews and mentorship. The role requires expertise in C++, Python, Java, and JavaScript, plus hands-on experience with Azure Kubernetes Service (AKS), virtualization and container platforms like Kubernetes and Docker, and observability tools such as Prometheus and OpenTelemetry. This position addresses the critical need to deliver secure, reliable, and efficient infrastructure for global-scale AI systems at Microsoft Azure.
What you'll do
- Design and build GPU/CPU-accelerated infrastructure for diverse AI workloads.
- Develop observability systems for efficient management of GPU/CPU resources.
- Build advanced orchestration scenarios using Kubernetes ecosystem capabilities.
- Optimize performance and utilization across large-scale GPU/CPU fleets.
- Partner with networking teams to enable high-performance interconnects.
- Drive end-to-end platform features from design through production deployment.
- Influence technical direction by leading design reviews and mentoring engineers.
What we're looking for
- 6+ years of technical engineering experience in cloud infrastructure design and operation.
- Proven ability to work with Azure Kubernetes Service (AKS) for large-scale production infrastructure.
- Expertise in virtualization and container platforms like Kubernetes and Docker.
- Strong problem-solving skills and hands-on experience debugging complex systems issues.
- Demonstrated leadership in mentoring engineers and driving technical alignment across teams.
- Advanced knowledge of distributed observability technologies such as Prometheus and Grafana.
- Experience with large-scale training infrastructure, including CUDA libraries and tools.
Employer
About Microsoft
Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices.
Industry: Software & Cloud Computing
Microsoft currently has 534 open roles on FindRole.
Listed pay typically runs $119,800–$234,700 across 488 roles with salary data.
Most-posted roles
- | Microsoft Careers 121
- Principal Software Engineer | Microsoft Careers (19)
- Senior Software Engineer | Microsoft Careers (18)
- Software Engineer II | Microsoft Careers (10)
- Principal Applied Scientist | Microsoft Careers (5)