Fellow GPU Performance Optimization Engineer

AMD

Hybrid

Quick summary

Work type: Hybrid
Location: San Jose, CA
Posted: 92 days ago
Closes: Mar 27, 2027
Nearby: 99+ roles within 25 mi

Market check

Salary context

How this pay compares to similar roles

Pay scale for similar roles: $150k to $269k, with similar roles marked at $206k.

This listing doesn't post a salary. Most similar roles pay $177,250–$235,750.

Based on 240 similar postings.

Employer

About AMD

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets.

Industry: Semiconductors

AMD currently has 56 open roles on FindRole.

At a glance

TL;DR · Fellow GPU Performance Optimization Engineer

As a Fellow GPU Performance Optimization Engineer on the Models and Applications team, you will lead the optimization of large-scale AI training workloads on AMD GPUs in both single-node and multi-node environments. Day to day, you will identify and resolve system bottlenecks across compute, memory, and communication, using advanced profiling and benchmarking to improve scalability and efficiency. You will collaborate with hardware, compiler, and framework teams to influence the design of next-generation GPU architectures and software stacks, and contribute to open-source projects that improve performance on AMD platforms. Ideal candidates have deep expertise in GPU architecture, distributed systems, and ML workloads; proficiency in Python and a systems language such as C++, CUDA, or HIP; and experience with frameworks like PyTorch and TensorFlow. The role also requires a strong understanding of communication libraries such as NCCL and RCCL and the ability to drive optimizations across the layers of the software stack.

Skills

AMD GPU, ROCm, Nsight, Python, PyTorch, JAX, TensorFlow, Megatron-LM, Torchtitan, MaxText, NCCL, RCCL, C++, CUDA, HIP, Distributed Training, Performance Profiling, Bottleneck Analysis, Compiler Optimization, Graph-Level Optimization

What you'll do

Lead optimization of large-scale AI training on AMD GPUs for single-node and multi-node environments.
Identify and resolve system bottlenecks in compute, memory, and communication across GPU platforms.
Optimize distributed training strategies for scalability and efficiency on AMD hardware.
Drive cross-stack optimizations from kernels to ML frameworks for performance improvements.
Develop advanced profiling methodologies to measure and enhance GPU performance.
Influence next-generation GPU architecture and software stack design with hardware teams.

What we're looking for

Deep expertise in GPU architecture and performance optimization.
Proven experience optimizing large-scale distributed training workloads.
Strong understanding of communication libraries and patterns.
Expertise in ML frameworks with a focus on performance tuning.
Proficiency in Python and systems languages like C++/CUDA/HIP.
Experience with compiler stacks and graph-level optimization preferred.
Demonstrated technical leadership and ability to influence cross-functional teams.

Similar roles

Staff GPU Performance Engineer

Samsung Electronics

Remote (San Jose, CA) 13 days ago $167,800–$251,800

Python, C++, Vulkan, 3D graphics pipelines, GPU architecture, performance analysis tools, ML workload analysis, scripting languages, automation frameworks, CI/CD

Remote

Principal / Senior GPU Software Performance Engineer, Post-Training

AMD

CA 94 days ago

Python, PyTorch, C++, ROCm, HIP, AMD Instinct GPUs, Distributed training, Multi-GPU/multi-node, SFT, LoRA, RL-based training, torch.distributed, FSDP, ZeRO, Distributed systems, Collective communication libraries, CI/CD

Hybrid

GPU Performance Engineer, Platform Architecture

Apple Inc

Cambridge, MA 63 days ago $132,100–$199,000

GPUs, CPUs, Troubleshooting

GPU Performance Engineer, Platform Architecture

Apple Inc

Cambridge, MA 63 days ago $162,500–$286,400

GPU, CPU, Troubleshooting

GPU Performance Engineer, Platform Architecture

Apple Inc

Austin, TX 63 days ago

CUDA, C++, Python, Linux, Git, Jenkins, Docker, Kubernetes, AWS, Google Cloud Platform, CI/CD, OpenGL, OpenCL, TensorFlow, PyTorch, PostgreSQL, MongoDB, RESTful APIs, Swagger

GPU Performance Engineer, Platform Architecture

Apple Inc

San Diego, CA 63 days ago $139,500–$210,100

GPU, CPU, Troubleshooting
