Fellow GPU Performance Optimization Engineer

Amd

Hybrid

Quick summary

Work type
Hybrid
Location
San Jose, CA
Posted
92 days ago
Closes
Mar 27, 2027

Market check

Salary context

How this pay compares to similar roles

Similar $206k
$150k most similar roles pay here $269k

This listing doesn't post a salary. Most similar roles pay $177,250–$235,750.

Based on 240 similar postings.

Employer

About Amd

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets. Industry: Semiconductors

Amd currently has 56 open roles on FindRole.

Most-posted roles

View all roles at Amd

At a glance

TL;DR · Fellow GPU Performance Optimization Engineer

As a Fellow GPU Performance Optimization Engineer at our Models and Applications team, you will lead the optimization of large-scale AI training workloads on AMD GPUs, focusing on single-node and multi-node environments. Your daily tasks include identifying and resolving system bottlenecks across compute, memory, and communication channels to enhance scalability and efficiency through advanced profiling and benchmarking techniques. You will collaborate with hardware, compiler, and framework teams to influence the design of next-generation GPU architecture and software stacks, contributing to open-source projects aimed at improving performance on AMD platforms. Ideal candidates possess deep expertise in GPU architecture, distributed systems, and ML workloads, along with proficiency in Python, C++, CUDA, or HIP, and experience with frameworks like PyTorch and TensorFlow. This role demands a strong understanding of communication libraries such as NCCL/RCCL and the ability to drive impactful optimizations across various layers of the software stack.

What you'll do

  • Lead optimization of large-scale AI training on AMD GPUs for single-node and multi-node environments.
  • Identify and resolve system bottlenecks in compute, memory, and communication across GPU platforms.
  • Optimize distributed training strategies for scalability and efficiency on AMD hardware.
  • Drive cross-stack optimizations from kernels to ML frameworks for performance improvements.
  • Develop advanced profiling methodologies to measure and enhance GPU performance.
  • Influence next-generation GPU architecture and software stack design with hardware teams.

What we're looking for

  • Deep expertise in GPU architecture and performance optimization.
  • Proven experience optimizing large-scale distributed training workloads.
  • Strong understanding of communication libraries and patterns.
  • Expertise in ML frameworks with a focus on performance tuning.
  • Proficiency in Python and systems languages like C++/CUDA/HIP.
  • Experience with compiler stacks and graph-level optimization preferred.
  • Demonstrated technical leadership and ability to influence cross-functional teams.

More like this

Similar roles

Staff GPU Performance Engineer

Samsung Electronics

Remote (San Jose, CA) 13 days ago $167,800$251,800
Python C++ Vulkan 3D graphics pipelines GPU architecture performance analysis tools ML workload analysis scripting languages automation frameworks CI/CD
Remote