Principal / Senior GPU Software Performance Engineer, Post-Training

AMD

Quick summary

Work type: Hybrid
Location: CA
Posted: 57 days ago
Closes: May 1, 2027

Market check

Salary context

How this pay compares to similar roles

[Salary chart: scale from $155k to $270k, with similar roles marked at $211k]

This listing doesn't post a salary. Most similar roles pay $185,943–$235,750.

Based on 240 similar postings.

Employer

About AMD

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets.

Industry: Semiconductors

AMD currently has 56 open roles on FindRole.

At a glance

TL;DR · Principal / Senior GPU Software Performance Engineer, Post-Training

As a Principal/Senior GPU Software Performance Engineer at AMD, you will drive the performance of post-training workloads on Instinct GPUs by optimizing fine-tuning and reinforcement learning (RL) training pipelines. Day to day, you will improve throughput, memory efficiency, and stability across various components, and contribute efficient kernels and targeted optimizations for multi-GPU and multi-node setups. You will use AI-assisted workflows to speed up profiling analysis and regression triage, and build scalable tooling that improves reproducibility and performance reporting. Ideal candidates have extensive experience in GPU performance engineering for deep learning workloads, with a strong background in PyTorch, Python, C++, and distributed systems. The role requires proficiency in ROCm/HIP and Triton, along with hands-on expertise in supervised fine-tuning (SFT), LoRA, and RL-based training at scale.

Skills

Python, C++, PyTorch, ROCm, HIP, Triton, torch.distributed, FSDP, ZeRO, CUDA, Docker, Kubernetes, Git, GitHub, Jenkins, Slack, Zoom, Markdown, Confluence, Bash, SQL, PostgreSQL, Prometheus, Grafana, CI/CD

What you'll do

Lead performance optimization for fine-tuning and RL training on AMD GPUs.
Enhance throughput and memory efficiency in multi-GPU and multi-node setups.
Develop efficient kernels and targeted optimizations for maximum impact.
Profile, diagnose, and resolve bottlenecks using standard tooling and AI workflows.
Build scalable automation tools to improve reproducibility and performance reporting.
Collaborate with cross-functional teams to implement durable performance improvements.

What we're looking for

Proven experience in GPU performance engineering for deep learning workloads.
Hands-on experience with SFT, LoRA, and RL-based training at scale.
Strong PyTorch expertise including distributed training approaches.
Proficiency in Python and C++, including the ability to write kernels.
Experience with distributed systems and collective communication libraries.
Track record of turning profiles into fixes and documenting results.

Similar roles

Principal / Senior GPU Software Performance Engineer, Post-Training

AMD

CA · 94 days ago

Python, PyTorch, C++, ROCm, HIP, AMD Instinct GPUs, Distributed training, Multi-GPU/multi-node, SFT, LoRA, RL-based training, torch.distributed, FSDP, ZeRO, Distributed systems, Collective communication libraries, CI/CD

Hybrid

Staff GPU Performance Engineer

Samsung Electronics

Remote (San Jose, CA) · 13 days ago · $167,800–$251,800

Python, C++, Vulkan, 3D graphics pipelines, GPU architecture, performance analysis tools, ML workload analysis, scripting languages, automation frameworks, CI/CD

Remote

Fellow GPU Performance Optimization Engineer

AMD

San Jose, CA · 92 days ago

AMD GPU, ROCm, Nsight, Python, PyTorch, JAX, TensorFlow, Megatron-LM, Torchtitan, MaxText, NCCL, RCCL, C++, CUDA, HIP, Distributed Training, Performance Profiling, Bottleneck Analysis, Compiler Optimization, Graph-Level Optimization

Hybrid

Senior Systems Software Engineer, GPU Performance at Scale

NVIDIA

Remote (Santa Clara, CA) · 6 days ago · $184,000–$287,500

CUDA, Slurm, Python, C, C++, Bash, Docker, Linux, Container Technology, Virtualization, HPC Environments, Cloud Platform Solutions, CI/CD

Remote

Senior Deep Learning Tools Engineer, CUDA Tile

NVIDIA

Remote (Santa Clara, CA) +2 · 53 days ago · $152,000–$241,500

Python, C++, CI/CD, PyTorch, TensorFlow, JAX, TensorRT, LLVM, MLIR, CUDA, Docker, Kubernetes, Prometheus, Grafana, PostgreSQL, Git, GitHub, Slack

Remote

Senior Staff Engineer, Post-Silicon GPU Power & Performance

Samsung Electronics

Remote · 80 days ago · $180,200–$270,400

Python, C/C++, Linux, Android, SQL, GPU, CPU, SoC, Performance analysis tools, Post-silicon validation, System-level architecture, Kernel-level debugging, Shell scripting, Databases, Emulation platforms, Silicon validation, Complex software workloads, CI/CD

Remote
