Principal / Senior GPU SW Performance Engineer — Post‑Training in San Jose, California | Advanced Micro Devices, Inc

Amd

Hybrid Actively hiring

San Jose, CA · United States Posted 65 days ago $240,000–$240,000 / year

View original post Log in to save

At a glance

AI generated

TL;DR

As a senior software engineer on the AMD Instinct GPU team, you will drive performance for post-training workloads by optimizing finetuning and reinforcement learning (RL) training solutions across data loaders, kernels, distributed training, and compilers. Your daily tasks include enhancing throughput, memory efficiency, and stability in multi-GPU/multi-node environments while contributing efficient kernels and targeted graph-level optimizations. You will profile, diagnose, and resolve bottlenecks using standard tooling to prevent regressions in continuous integration (CI) systems, ensuring reproducible pipelines and documentation are adopted by internal teams and external developers. Ideal candidates have experience with GPU performance engineering for deep learning on ROCm/HIP or similar platforms, hands-on expertise with SFT, LoRA, and RL-based training at scale, strong PyTorch skills, proficiency in Python and C++, and a track record of turning profiles into fixes and documenting results.

Skills

Python PyTorch C++ ROCm HIP AMD Instinct GPUs Distributed training Multi-GPU/multi-node SFT LoRA RL-based training torch.distributed FSDP ZeRO Distributed systems Collective communication libraries CI/CD

What you'll do

Lead performance optimization for finetuning and RL training on AMD GPUs.
Enhance throughput, memory efficiency, and stability in multi-GPU/multi-node setups.
Develop efficient kernels and graph-level optimizations for deep learning frameworks.
Profile and resolve bottlenecks using standard tooling to prevent CI regressions.
Ship reproducible pipelines and documentation for internal and external use.

What we're looking for

Proven GPU performance engineering for deep learning (ROCm/HIP, Triton, etc.)
Hands-on experience with SFT, LoRA, and RL-based training at scale
Strong PyTorch expertise including torch.distributed, FSDP/ZeRO or equivalent
Proficient in Python and C++; capable of reading/writing kernels
Experience optimizing multi-GPU/multi-node training and communication patterns
Track record of profiling, diagnosing, and resolving performance bottlenecks

Market check

Salary context

This $240,000–$240,000 range sits above 85% of similar postings on FindRole.

Peer median band

$168,000–$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$168,500–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Amd

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets. Industry: Semiconductors

Amd currently has 80 open roles on FindRole.

Listed pay typically runs $176,400–$176,400 across 80 roles with salary data.

Most-posted roles

View all roles at Amd

Similar roles

Principal / Senior GPU SW Performance Engineer — Post‑Training in San Jose, California | Advanced Micro Devices, Inc

Amd

US 28 days ago $204,000–$204,000

Python C++ PyTorch ROCm HIP Triton torch.distributed FSDP ZeRO CUDA Docker Kubernetes Git GitHub Jenkins Slack Zoom Markdown Confluence Bash SQL PostgreSQL Prometheus Grafana CI/CD

Fellow GPU Performance Optimization Engineer in San Jose, California | Advanced Micro Devices, Inc

Amd

US 62 days ago $268,000–$268,000

AMD GPU ROCm Nsight Python PyTorch JAX TensorFlow Megatron-LM Torchtitan MaxText NCCL RCCL C++ CUDA HIP Distributed Training Performance Profiling Bottleneck Analysis Compiler Optimization Graph-Level Optimization

Senior Embedded Systems & GPU Platform Engineer in Santa Clara, California | Advanced Micro Devices, Inc

Amd

US 177 days ago $187,760–$187,760

C/C++ Python Bash Linux GPU architectures OpenCL CUDA ROCm BIOS server architecture system-level debug firmware development embedded software development hardware bring-up performance optimization microcontroller fundamentals OS kernel internals

Staff Software Development Engineer- GPU, LLM, AI in Santa Clara, California | Advanced Micro Devices, Inc

Amd

US 74 days ago $145,600–$145,600

C++ CUDA HIP ROCm PyTorch TensorFlow JAX LLMs Transformer architectures Attention mechanisms Mixture-of-Experts Quantization Speculative decoding Agentic AI Reinforcement Learning Supervised Fine-Tuning AMD ROCm Profiler NVIDIA Nsight CUBLAS cuDNN CUTLASS Thrust NCCL rocBLAS hipDNN

Principal Software Developer - GPU AI/HPC kernels in Austin, Texas | Advanced Micro Devices, Inc

Amd

US 91 days ago $212,000–$212,000

C++ Python HIP CUDA OpenCL Terraform Git CI/CD Docker PostgreSQL Prometheus Grafana Kubernetes AWS Azure Google Cloud Platform AI-assisted software development tools

SENIOR SILICON DESIGN ENGINEER: Hardware-Software Co-Design, GPU Compute and AI Compilers & Runtimes in Bellevue, Washington | Advanced Micro Devices, Inc

Amd

US 53 days ago $141,760–$141,760

C C++ Python CUDA HIP Linux DRM/GEM ROCm IREE PyTorch Triton LLVM MLIR GPU HPC AI Compiler Performance Modeling System-Level Programming Kernel-Mode Drivers Memory Management Framework Internals Quantization Techniques Attention Mechanisms Mixture-of-Experts