Senior AI Software Architect | Microsoft Careers

Microsoft

Actively hiring Verified listing
US Posted 138 days ago $119,800–$234,700 / year

At a glance

AI generated

TL;DR

As a Senior AI Software Architect on the SPARC team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, you will enable and optimize large-scale AI models on Maia accelerators. Daily work includes porting and optimizing models with frameworks such as PyTorch, ONNX, vLLM, and SGLang; applying quantization techniques such as BF16-to-FP8 conversion for efficient inference; and experimenting with parallelism strategies across different interconnects. You will also co-design solutions with hardware architects and kernel developers so that models run efficiently on Maia hardware, improve the inference stack through performance tuning at the PyTorch level, and assist in kernel performance analysis. The role calls for expertise in AI inference stacks, Triton kernels, and CUDA programming, plus a strong background in model optimization techniques. It suits someone with a growth mindset and a passion for innovation in cloud infrastructure.

Skills

PyTorch ONNX vLLM SGLang NVLink PCIe Trident Omniscien Triton CUDA BF16 FP8 KV cache quantization Checkpointing Resharding TP PP Parallelism strategies Distributed training concepts Sharding Allreduce Performance profiling

What you'll do

  • Port and optimize large-scale AI models to run efficiently on Maia hardware.
  • Apply quantization techniques like BF16 → FP8 for efficient inference and training.
  • Experiment with parallelism strategies (TP, PP) and analyze performance impacts across interconnects.
  • Collaborate on improving inference pipelines, including KV caching in SGLang/vLLM.
  • Assist in kernel performance analysis and work with Triton kernels for basic operations.

What we're looking for

  • 3+ years of hands-on experience with PyTorch and model optimization techniques.
  • Practical knowledge of quantization techniques like PTQ/QAT, especially for KV cache quantization.
  • Familiarity with parallelization strategies and distributed training concepts such as sharding and allreduce.
  • 2+ years of experience with AI inference stacks like SGLang/vLLM and performance profiling.
  • Excellent problem-solving and communication skills; ability to work in a collaborative team environment.
  • Experience with Triton kernels and CUDA programming; willingness to learn is essential.
  • 3+ years of prior work on efficient model checkpointing, resharding scripts, and large-scale model deployments for serving at scale.

Market check

Salary context

This $119,800–$234,700 range sits above 41% of similar postings on FindRole.

Peer median band

$148,250–$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$177,250–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 534 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 488 roles with salary data.

More like this

Similar roles

| Microsoft Careers

Microsoft

US 13 days ago $119,800–$234,700
Python FastAPI Azure ADLS Gen2 Synapse Azure Data Explorer Airflow Terraform Bicep ARM CI/CD Prometheus Grafana Kubernetes PostgreSQL

Senior Solution Architect - AI Business Solutions | Microsoft Careers

Microsoft

VA 2 days ago $124,500–$214,600
Dynamics_365 Power_Platform Azure AI Copilot Copilot_Studio CI/CD Microsoft_Cloud Low_code Cloud_Adoption_Framework Well_Architected_Framework Omnichannel_Customer_Service Data_Integration Governance_Models Enterprise_Architecture_Frameworks Security_and_Identity_Architectures Application_Lifecycle_Governance Kubernetes Terraform

Senior Software Architect, AI Systems and Networking

Nvidia

Remote (Santa Clara, CA) 13 days ago $224,000–$356,500
C C++ Rust RDMA GPUDirect NVLink InfiniBand RoCE GPU DPU NIC switch vLLM SGLang TensorRT-LLM NVMe-oF GPUDirect Storage S3 Reinforcement Learning ML inference frameworks

Senior Solution Architect, AI Infrastructure

Nvidia

Remote (US, DC, Remote, US) 21 days ago $184,000–$287,500
NVIDIA_GPUs NVIDIA_Networking InfiniBand Ethernet NCCL DCGM UFM Mission_Control Base_Command_Manager AI_solutions High_Performance_Computing Networking Python CI/CD Git AWS Azure Grafana Prometheus

Senior Software Engineer - AI Core Engineering

The Walt Disney Company

Remote (USA - CA - 1200 Grand Central Ave, US) 96 days ago $141,900–$190,300
Python LLM APIs AWS Bedrock Azure AI Foundry LangChain LangGraph APIs SDKs OpenAI Anthropic Claude Observability Tracing Latency and cost dashboards Drift detection Multi-agent orchestration Synthetic data Enterprise governance Security Compliance Audit Policy enforcement