Senior AI Software Architect

Microsoft

Redmond, WA, USA. Posted 3 days ago

Role Details

Model Enablement: Port and optimize large-scale AI models (e.g., foundation models, diffusion models, YOLO) to run efficiently on Maia hardware. Integrate models using frameworks such as PyTorch, ONNX, vLLM, and SGLang. Apply techniques such as KV cache quantization (e.g., BF16 → FP8), checkpointing, and re-sharding for efficient inference and training. Collaborate on improving inference pipelines, including KV caching in SGLang/vLLM and performance tuning at the PyTorch level. Work with Triton kernels for basic operations (e.g., FP8 dequantization) and assist in kernel performance analysis.

Qualifications:
Bachelor's Degree in Computer Science or related technical field AND 4+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
Bachelor's Degree in Computer Science or Engineering.
3+ years of strong hands-on experience with PyTorch and model optimization techniques.
Practical knowledge of quantization techniques such as PTQ/QAT, especially for KV cache quantization.
Familiarity with parallelization strategies and distributed training concepts (e.g., sharding, all-reduce).
2+ years of experience with AI inference stacks such as SGLang/vLLM and with performance profiling.
Excellent problem-solving and communication skills; ability to work in a collaborative team environment.
3+ years of experience with Triton kernels and CUDA programming (basic understanding is acceptable, but willingness to learn is essential).
Experience with AI accelerator hardware and embedded systems.
3+ years of prior work on efficient model checkpointing, resharding scripts, and large-scale model deployments for serving at scale.

About Microsoft

Microsoft Corporation is a global technology leader producing software, hardware, and cloud services, including Windows, Office 365, the Azure cloud platform, Xbox gaming, and Surface devices.

Industry: Software & Cloud Computing