Principal AI Performance Engineer in San Jose, California | Advanced Micro Devices, Inc

AMD

Quick summary

Work type: Hybrid
Location: San Jose, CA
Salary: $240,000 / yr
Posted: 92 days ago
Closes: Mar 11, 2027

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: similar roles about $212k; this role $240k; axis runs from $168k to $255k, with a shaded band marking where most similar roles pay.

This role pays more than 66% of similar roles. Most pay $176,637–$246,425 (the shaded band in the chart). At the midpoint, this role pays about $240k versus about $212k for comparable roles.

Based on 240 similar postings.

Employer

About AMD

AMD (Advanced Micro Devices) is a semiconductor company that develops high-performance processors, graphics cards, and adaptive computing solutions for gaming, data centers, and embedded markets.

Industry: Semiconductors

AMD currently has 71 open roles on FindRole.

Listed pay is typically about $178,400 across the 71 roles with salary data.

At a glance

AMD seeks a Principal AI Performance Engineer to lead a small technical team optimizing AI inference performance on AMD GPUs for strategic customer engagements. The role covers end-to-end stack optimization of leading models and configurations, from profiling and diagnosing kernel-level bottlenecks to presenting optimizations to senior stakeholders. The ideal candidate has deep expertise in GPU computing and AI serving frameworks such as vLLM and SGLang, and is proficient in Python and C++. They must excel at customer-facing technical leadership, use AI agents daily to speed up their workflows, and develop reusable optimization methodologies. The position calls for a performance-obsessed engineer who can tackle complex problems across multi-node distributed systems and deliver measurable gains to AMD’s competitive position in the AI market.

What you'll do

  • Drive end-to-end performance optimization on AMD GPUs for leading AI models.
  • Profile and resolve complex cross-stack bottlenecks in GPU kernels and frameworks.
  • Diagnose kernel-level issues using profiling tools to enhance model performance.
  • Lead customer engagements by presenting technical findings and optimizations.
  • Develop custom kernels within serving frameworks to improve dispatch efficiency.
  • Optimize multi-node distributed inference for better communication-compute overlap.
  • Define and refine performance optimization methodologies for the broader team.

What we're looking for

  • 7+ years of software development experience in GPU computing, AI systems, or high-performance computing.
  • Deep hands-on experience with AI serving frameworks and their internals, including vLLM, SGLang, and TensorRT-LLM.
  • Strong background in end-to-end workload profiling and bottleneck diagnosis from user request to GPU kernel.
  • Expertise in GPU kernel performance characteristics such as occupancy, memory coalescing, cache utilization, and instruction-level bottlenecks.
  • Experience with custom kernel development or integration using HIP, CUDA, Triton, CK, or similar technologies.
  • Understanding of multi-GPU and multi-node distributed systems, including scale-up and scale-out topologies, RDMA, and communication-compute overlap.
  • Fluency in AI-assisted development, using AI agents and tools daily to accelerate workflows.
