Senior Software Development Engineer – LLM Inference Framework in Santa Clara, California | Advanced Micro Devices, Inc


Quick summary

Work type
On-site
Location
Santa Clara, CA
Salary
$204,000 / yr
Posted
10 days ago
Closes
Jun 1, 2027

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: this role ($204k) vs. similar roles ($186k), on a $140k–$241k scale]

This role pays more than 56% of similar roles. Most similar roles pay $150,000–$222,000. At the midpoint, this role pays about $204k, versus about $186k for comparable roles.

Based on 240 similar postings.

Employer

About AMD

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets.

Industry: Semiconductors

AMD currently has 71 open roles on FindRole.

Listed pay is typically about $178,400 across the 71 roles with salary data.


At a glance

TL;DR

As a senior member of the LLM inference framework team, you will architect and optimize production-grade single-node and distributed inference runtimes for large language models on AMD GPUs, focusing on tensor parallelism, pipeline parallelism, expert parallelism (MoE), and multi-node inference at scale. Your daily tasks include driving performance and scalability improvements across GPU clusters, implementing efficient multi-node inference pipelines using RCCL and RDMA, and collaborating with kernel and compiler teams to enhance end-to-end performance. Key skills required are hands-on experience with vLLM, SGLang, or similar stacks, expertise in Python and C/C++, and a strong background in AMD GPU architectures and kernel development. This role also involves upstreaming features into open-source frameworks and enabling customer deployments on AMD platforms, making it ideal for systems-minded ML engineers who enjoy working at the intersection of inference engines, distributed systems, and GPU runtime backends.

What you'll do

  • Architect and optimize distributed LLM inference runtimes for single-node and multi-node deployments.
  • Design hybrid execution strategies including tensor parallelism, pipeline parallelism, and expert parallelism.
  • Implement efficient multi-node inference pipelines using RDMA and collective-based execution techniques.
  • Drive performance improvements in throughput, latency, and memory efficiency across GPU clusters.
  • Optimize continuous batching and speculative decoding for high-performance LLM serving.
  • Work with AMD GPU libraries to ensure efficient use of FP8/FP4 GEMM and FlashAttention.
  • Upstream features and performance fixes into open-source inference frameworks like vLLM and SGLang.

What we're looking for

  • Extensive experience with vLLM, SGLang, or similar inference stacks.
  • Proven track record of contributing to upstream open-source projects in distributed inference scaling.
  • Strong background in integrating GPU performance optimizations into machine-learning frameworks such as PyTorch and TensorFlow.
  • Expertise in Python and C/C++, including debugging and performance tuning for large-scale systems.
  • Experience optimizing large-scale workloads on heterogeneous GPU clusters for efficiency and scalability.
  • Master’s or PhD in Computer Science, Engineering, or a related field.
