Principal / Senior GPU SW Performance Engineer — Post‑Training in San Jose, California | Advanced Micro Devices, Inc

Amd

Hybrid Actively hiring Verified listing
San Jose, CA · Other U.S.-based locations Posted 28 days ago $204,000$204,000 / year

At a glance

AI generated

TL;DR

As a Principal/Senior GPU Software Performance Engineer at AMD, you will drive the performance of post-training workloads on Instinct GPUs by optimizing fine-tuning and reinforcement learning training pipelines. Your day-to-day responsibilities include enhancing throughput, memory efficiency, and stability across various components, as well as contributing efficient kernels and targeted optimizations for multi-GPU and multi-node setups. You will leverage AI-assisted workflows to accelerate profiling analysis and regression triage while developing scalable tooling to improve reproducibility and performance reporting. Ideal candidates have extensive experience in GPU performance engineering for deep learning workloads with a strong background in PyTorch, Python, C++, and distributed systems. This role requires proficiency in ROCm/HIP and Triton, along with hands-on expertise in SFT, LoRA, and RL-based training at scale.

Skills

Python C++ PyTorch ROCm HIP Triton torch.distributed FSDP ZeRO CUDA Docker Kubernetes Git GitHub Jenkins Slack Zoom Markdown Confluence Bash SQL PostgreSQL Prometheus Grafana CI/CD

What you'll do

  • Lead performance optimization for fine-tuning and RL training on AMD GPUs.
  • Enhance throughput and memory efficiency in multi-GPU and multi-node setups.
  • Develop efficient kernels and targeted optimizations for maximum impact.
  • Profile, diagnose, and resolve bottlenecks using standard tooling and AI workflows.
  • Build scalable automation tools to improve reproducibility and performance reporting.
  • Collaborate with cross-functional teams to implement durable performance improvements.

What we're looking for

  • Proven GPU performance engineering for deep learning workloads.
  • Hands-on experience with SFT, LoRA, and RL-based training at scale.
  • Strong PyTorch expertise including distributed training approaches.
  • Proficient in Python and C++; capable of writing kernels.
  • Experience with distributed systems and collective communication libraries.
  • Track record of turning profiles into fixes and documenting results.

Market check

Salary context

This $204,000–$204,000 range sits above 51% of similar postings on FindRole.

Peer median band

$168,000$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$168,500$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Amd

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets. Industry: Semiconductors

Amd currently has 80 open roles on FindRole.

Listed pay typically runs $176,400–$176,400 across 80 roles with salary data.

Most-posted roles

View all roles at Amd

More like this

Similar roles