Fellow, AI Hardware Architecture and Software Optimization Engineer (Workload Optimization) in San Jose, California | Advanced Micro Devices, Inc

AMD

Actively hiring
US | Posted 76 days ago | $256,000–$256,000 / year

At a glance

AI generated

TL;DR

Join the AI Software group at AMD as a Fellow and lead the end-to-end software optimization strategy to achieve unparalleled performance for top-tier customers. You will define technical visions and roadmaps, engage with key clients to solve critical performance issues, and collaborate across teams to influence future silicon features based on evolving AI workload trends. With deep expertise in AI frameworks like PyTorch and ROCm, you’ll optimize distributed inference and training at scale using tools such as TorchProfiler and Nsight. This role requires a visionary leader with 15+ years of software development experience, including 5 years in high-level technical leadership, and a strong background in modern model architectures and optimization techniques.

Skills

PyTorch, JAX, vLLM, SGLang, ROCm, Distributed Systems, Multi-node/Multi-GPU, Performance Profiling, TorchProfiler, ROCm Profiler, Nsight, Transformer Models, Attention Mechanisms, Quantization, Speculative Decoding, FlashAttention, Cross-functional Collaboration, Deep Learning, Large Language Models, Computer Vision

What you'll do

  • Define and drive the end-to-end software optimization strategy for industry-leading performance.
  • Lead profiling, analysis, and tuning of large-scale AI models to ensure optimal performance on AMD hardware.
  • Engage with top-tier customers to understand unique workload requirements and deliver tailored optimizations.
  • Influence future silicon features by collaborating across hardware architecture and software teams.
  • Develop advanced tools and frameworks for performance estimation and automated reporting in the AI ecosystem.

What we're looking for

  • Over 15 years of software development experience with at least 5 years in high-level technical leadership.
  • Deep expertise in AI frameworks such as PyTorch and in the ROCm software stack.
  • Proven history of optimizing distributed inference and training across multi-node/multi-GPU environments.
  • Mastery of performance profiling tools and hardware-level performance modeling techniques.
  • Strong understanding of modern model architectures and optimization techniques, including quantization and speculative decoding.
  • Demonstrated ability to drive cross-functional initiatives in fast-paced, ambiguous environments.

Market check

Salary context

This $256,000–$256,000 range sits above 89% of similar postings on FindRole.

Peer median band

$164,540–$240,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$164,470–$236,600

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About AMD

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets. Industry: Semiconductors

AMD currently has 80 open roles on FindRole.

Listed pay typically runs $176,400–$176,400 across 80 roles with salary data.
