ML Framework (MetalLM) Engineer, Graphics, Game and ML

Apple Inc

Quick summary

Work type: On-site
Location: Cupertino, CA
Salary: $147,400–$272,100 / yr
Posted: 56 days ago

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: pay scale from $132k to $287k, with the range where most similar roles pay shaded. Similar roles: about $220k. This role: about $210k.]

This role pays more than 54% of similar roles. Most similar roles pay $190,150–$249,750 (the shaded band above). At the midpoint of its listed range, this role pays about $210k, versus about $220k for comparable roles.

Based on 239 similar postings.

Employer

About Apple Inc

Apple Inc. is a multinational technology company known for designing and manufacturing consumer electronics, software, and online services, including the iPhone, Mac, iPad, and App Store.

Industry: Consumer Electronics & Software

Apple Inc currently has 1,723 open roles on FindRole.

Listed pay typically runs $162,500–$272,100 across 1,398 roles with salary data.

At a glance

TL;DR · ML Framework (MetalLM) Engineer, Graphics, Game and ML

As an ML Framework Engineer on Apple’s Server ML Frameworks team in GPU, Graphics, and Machine Learning, you will optimize ML inference frameworks for high-throughput GPU execution across diverse server hardware. Day to day, you will develop kernel and compiler optimizations, apply model optimization techniques, and work with cross-functional teams so that software performance matches hardware capabilities. The work centers on distributed compute strategies (data, tensor, pipeline, and expert parallelism) and on analyzing performance metrics such as latency and memory footprint.

The ideal candidate has a strong background in GPU programming with Metal or CUDA, experience with distributed training or inference techniques, and familiarity with graph compilers such as CuTE or LLVM. The role also offers a chance to influence next-generation GPU architecture design for Apple’s custom-built server hardware, which supports secure private cloud computing.

What you'll do

  • Optimize ML inference framework code for efficient and scalable distributed compute strategies.
  • Develop kernel- and compiler-level optimizations to ensure peak performance across server hardware families.
  • Apply advanced techniques like quantization, compression, and speculation to maximize throughput and minimize latency.
  • Analyze and improve performance metrics including end-to-end latency, time to first token (TTFT), TBOT, memory footprint, and compute efficiency.
  • Collaborate with hardware, compiler, and systems teams to align software performance with hardware capabilities.

What we're looking for

  • 3+ years of experience in C/C++/ObjC programming and problem-solving.
  • Proficiency in GPU kernel development and optimization with Metal, CUDA, or similar.
  • Experience with distributed training or inference techniques.
  • Strong background in system-level programming and computer architecture.
  • Familiarity with graph compilers like CuTE, CuTile, Triton, OpenXLA, or LLVM.
  • Understanding of LLM and diffusion-based model architectures.
