Staff Compiler Engineer - PyTorch + Kernel DSLPLATE

Samsung Semiconductor

Quick summary

Work type: On-site
Location: San Jose, CA
Salary: $163,000–$253,000 / yr
Posted: today

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: pay midpoints on a $150k–$264k scale. Similar roles: $210k. This role: $208k. A shaded band marks where most similar roles pay.]

This role pays less than 51% of similar roles. Most pay $183,726–$237,237 — the shaded band above. At the midpoint, this role pays about $208k versus about $210k for comparable roles.

Based on 240 similar postings.

Employer

About Samsung Semiconductor

Samsung Semiconductor is the global semiconductor business unit of Samsung Electronics, designing and manufacturing memory chips, logic semiconductors, and foundry solutions for a broad range of applications.

Samsung Semiconductor currently has 54 open roles on FindRole.

Listed pay typically runs $163,000–$253,000 across 54 roles with salary data.

At a glance

TL;DR · Staff Compiler Engineer - PyTorch + Kernel DSLPLATE

Join our team as a Staff Compiler Engineer specializing in PyTorch and Kernel DSL development, where you will adapt torch.compile to fit our backend by lowering Inductor's IR to our hardware and defining fusion strategies. You’ll build or extend kernel DSLs for our unique hardware, design placement and scheduling passes, implement parallelism-aware lowering, and engage with upstream review processes for open-source projects like PyTorch and Triton. Ideal candidates have 3-5 years of experience in technologies such as MLIR, XLA, TVM, Inductor, or similar, along with a background in HPC, distributed systems, and non-flat memory hierarchies. Experience with kernel autotuning and open-source contributions is highly valued.

What you'll do

  • Adapt torch.compile to our backend by lowering Inductor's IR to our hardware and defining fusion strategies.
  • Build or extend kernel DSLs for custom hardware, deciding what changes the frontend and backend need.
  • Design placement and scheduling passes for distributed memory model optimization.
  • Implement parallelism-aware lowering for tensor, pipeline, expert, and sequence parallelism.
  • Contribute upstream to open-source projects like PyTorch, Triton, Helion, and MLIR.

What we're looking for

  • 10+ years of industry experience in relevant fields or equivalent education and experience.
  • Experience designing a kernel DSL or making significant changes to an existing one.
  • Proficiency in MLIR, including writing dialects, passes, and backend integration.
  • Expertise in building PyTorch backends for non-CUDA accelerators such as XPU, ROCm, or TPU.
  • Knowledge of kernel autotuning, performance modeling, and cost-based compilation techniques.
  • Background in HPC, distributed systems, or NUMA-aware programming to understand non-flat memory hierarchies.
  • Open-source contributions to PyTorch, Triton, Helion, LLVM/MLIR, or similar projects.

More like this

Similar roles

Careers

Qualcomm

Santa Clara, CA · 59 days ago
MLIR LLVM PyTorch 2.0 TVM Triton SYCL C++ Python CUDA OpenCL Polyhedral Compiler Optimization Loop Transformation Vectorization GPU Programming CI/CD Git Linux Docker Kubernetes

Senior Deep Learning Compiler Verification Engineer

Nvidia

Remote (Santa Clara, CA) · 36 days ago · $140,000–$224,250
Python C++ PyTorch JAX TensorRT LLVM MLIR TVM XLA Type Systems Program Semantics Proof-Based Verification Quantization Operator Fusion Mixed-Precision Graph-Level Optimization

Senior Deep Learning Compiler Engineer - XLA

Nvidia

Remote (Santa Clara, CA) · 99 days ago · $152,000–$241,500
C/C++ CUDA JAX PyTorch TensorFlow XLA MLIR LLVM OpenAI_Triton GPU distributed_programming performance_analysis compiler_optimizations clean_software_engineering_practices high_performance_computing

Senior Machine Learning Applications and Compiler Engineer, LPX

Nvidia

Remote (Santa Clara, CA) · 77 days ago · $152,000–$241,500
C/C++ Rust LLVM MLIR TensorFlow PyTorch ONNX GPU Profiling tools Tracing tools Benchmarking tools CI/CD Parallel computing Heterogeneous computing Spatial architectures Dataflow architectures Large-scale AI systems

Machine Learning Compiler Engineer

Qualcomm

New York, NY · 30 days ago · $200,800–$301,200
MLIR LLVM PyTorch 2.0 TVM Triton SYCL Python C++ CUDA OpenCL Polyhedral Compiler Optimization Loop Transformation Vectorization GPU Programming High Performance Computing CI/CD Git Linux Docker