Senior AI Software Architect

Microsoft

Quick summary

Work type: On-site
Location: (not listed)
Salary: $119,800–$234,700 / yr
Posted: 165 days ago
Closes: Jul 12, 2026

Market check

Salary context

Below market

How this pay compares to similar roles

[Chart: midpoint pay comparison. Similar roles: $202k. This role: $177k. Axis range: $103k to $274k, with a shaded band marking where most similar roles pay.]

This role pays less than 67% of similar roles. Most pay $167,249–$237,350 — the shaded band above. At the midpoint, this role pays about $177k versus about $202k for comparable roles.

Based on 240 similar postings.
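
The midpoint figure quoted above can be checked from the listed salary range. The following is a minimal sketch, assuming the midpoint is the simple average of the posted minimum and maximum (the site does not state its method) and using the rounded $202k figure for comparable roles, so the gap is approximate. The names `midpoint`, `ROLE_MIN`, `ROLE_MAX`, and `SIMILAR_MIDPOINT` are illustrative only.

```python
# Salary bounds as listed in this posting (USD per year).
ROLE_MIN, ROLE_MAX = 119_800, 234_700
# Rounded midpoint for comparable roles, as quoted in the salary context.
SIMILAR_MIDPOINT = 202_000


def midpoint(low: int, high: int) -> float:
    """Return the arithmetic mean of a salary range (assumed method)."""
    return (low + high) / 2


role_midpoint = midpoint(ROLE_MIN, ROLE_MAX)
gap = SIMILAR_MIDPOINT - role_midpoint
gap_pct = 100 * gap / SIMILAR_MIDPOINT

print(f"Role midpoint: ${role_midpoint:,.0f}")
print(f"Gap vs. similar roles: ${gap:,.0f} ({gap_pct:.1f}%)")
```

Under these assumptions the role midpoint comes to $177,250, which matches the "about $177k" quoted above, and sits roughly 12% below the rounded midpoint for comparable roles.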

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 622 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 559 roles with salary data.

At a glance

TL;DR · Senior AI Software Architect

As a Senior AI Software Architect on Microsoft’s AHSI team, you will enable and optimize large-scale AI models for Maia accelerators. Day to day, you will port and integrate models using frameworks such as PyTorch, ONNX, vLLM, and SGLang, and apply techniques such as KV cache quantization and parallelism strategies to improve performance. You will work closely with hardware architects and kernel developers to co-design solutions at scale, ensuring efficient model deployment and inference across interconnects such as NVLink and PCIe. The role requires expertise in PyTorch, model optimization, and distributed training concepts, along with a strong background in AI inference stacks and Triton kernels.

What you'll do

  • Port and optimize large-scale AI models to run efficiently on Maia hardware.
  • Apply quantization techniques like BF16 → FP8 for efficient inference and training.
  • Experiment with parallelism strategies (TP, PP) to analyze performance impacts across interconnects.
  • Collaborate on improving inference pipelines, including KV caching in SGLang/vLLM.
  • Assist in kernel performance analysis and work with Triton kernels for basic operations.

What we're looking for

  • Strong hands-on experience with PyTorch and model optimization techniques.
  • Practical knowledge of quantization techniques such as PTQ and QAT, including KV cache quantization.
  • Familiarity with parallelization strategies and distributed training concepts such as sharding and all-reduce.
  • Experience with AI inference stacks like SGLang/vLLM and performance profiling.
  • Excellent problem-solving and communication skills; ability to work in a collaborative team environment.
  • 3+ years of experience with Triton kernels and CUDA programming.
  • Prior work on efficient model checkpointing, resharding scripts, and large-scale model deployments for serving at scale.

More like this

Similar roles

Principal AI Software Architect

Microsoft

72 days ago $142,800–$274,800
PyTorch CUDA Triton C C++ AWS Azure Kubernetes Docker CI/CD Prometheus Grafana Git Python PostgreSQL

Senior AI Hardware Architect

Microsoft

15 days ago $119,800–$234,700
Python C/C++ GPU AI_accelerators PyTorch vLLM SGLang Distributed_training Quantization Sparsity Sharding_strategies KV-cache_management Flash_Attention Communication_computation_overlap Performance_profiling Benchmarking Silicon_correlation Architectural_simulation Workload_characterization Data_analysis Visualization Terraform AWS Kubernetes

Senior AI Architect

IBM

Remote 10 days ago
Python TensorFlow PyTorch Kubernetes Docker AWS CI/CD Git Scikit-learn PostgreSQL MLOps

Senior AI Architect

IBM

Remote 10 days ago
Python TensorFlow PyTorch Kubernetes Docker AWS Azure CI/CD Git PostgreSQL MongoDB

Senior AI Architect

IBM

Remote 10 days ago
Python TensorFlow PyTorch Kubernetes Docker AWS Azure CI/CD Git Scikit-learn PostgreSQL MongoDB