Software Co-Design AI HPC Systems

Microsoft

Quick summary

Work type: On-site
Location: not specified
Salary: $142,800–$274,800 / yr
Posted: 135 days ago
Closes: Aug 11, 2026

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: pay scale from $127k to $291k, marking similar roles at $198k and this role at $209k.]

This role pays more than 56% of similar roles, most of which pay $159,937–$235,750. At the midpoint, this role pays about $209k, versus about $198k for comparable roles.

Based on 240 similar postings.

Employer

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services including Windows, Office 365, Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing

Microsoft currently has 622 open roles on FindRole.

Listed pay typically runs $119,800–$234,700 across 571 roles with salary data.

At a glance

TL;DR · Software Co-Design AI HPC Systems

As a Senior Systems Architect in the AI Infrastructure team, you will lead the co-design of advanced AI systems across hardware and software domains, focusing on accelerators, interconnects, memory systems, storage, and distributed frameworks. Your daily tasks include analyzing real workloads to identify bottlenecks and translating insights into actionable requirements for system and hardware improvements. You will develop performance models to guide future hardware roadmaps, optimize parallelism strategies, and collaborate with compiler, kernel, and runtime teams to enhance the performance of current and next-generation accelerators. With a strong background in systems analysis and experience designing large-scale AI clusters, you will influence AI hardware design at both system and silicon levels, mentor senior engineers, and drive technical direction across cross-functional teams. Proficiency in languages such as C++, Python, and Java is essential for this role.

What you'll do

  • Analyze real workloads to identify bottlenecks in compute, communication, and data movement.
  • Develop actionable system requirements based on architectural decision analyses.
  • Optimize parallelism strategies and execution models for large-scale AI systems.
  • Create performance models to predict future system behavior under various conditions.
  • Partner with compiler, kernel, and runtime teams to unlock the full performance of current and next-gen accelerators.
  • Influence AI hardware design at system and silicon levels, including microarchitecture.

What we're looking for

  • Extensive coding experience in languages such as C++, Python, and Java.
  • Deep expertise in designing large-scale AI clusters for training and inference.
  • Proven track record of co-designing parallelism strategies and execution models.
  • Strong background in analyzing real workloads to identify system bottlenecks.
  • Experience developing performance models for future hardware generations.
  • Knowledge of accelerator interconnects and communication stacks (NCCL, MPI).
  • Leadership in cross-functional teams for prototyping high-impact ideas.

More like this

Similar roles

Senior Software Architect, Deep Learning and HPC Communications

Nvidia

Remote (Santa Clara, CA) +3 · 3 days ago · $224,000–$356,500
C/C++ MPI NCCL NVSHMEM UCX CUDA Linux InfiniBand RoCE PyTorch TensorFlow HPC Networking Simulation Quantitative_Modeling Parallel_Programming Deep_Learning_Pods High_Performance_Networks GPU_Clusters

Senior AI and ML HPC Cluster Engineer

Nvidia

Remote (Santa Clara, CA) +4 · 65 days ago · $152,000–$241,500
Slurm Kubernetes Docker Ansible Python Bash MPI NVIDIA GPUs CUDA NCCL PyTorch TensorFlow Lustre InfiniBand IPoIB RDMA CentOS RHEL Ubuntu Puppet Salt Singularity Podman Shifter Charliecloud

Senior HPC Performance Engineer, AI for Science at Scale

Nvidia

Santa Clara, CA · 131 days ago · $184,000–$287,500
CUDA Python C++ PyTorch JAX Warp HPC Distributed Learning Atomistic Modeling CI/CD Git Linux NVIDIA DGX Systems GPU Programming Parallel Computing Data Structures Algorithmic Improvements Scientific Machine Learning Digital Biology Computational Chemistry