Staff ML Infrastructure Engineer (Compute)

General Motors (GM)

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $197,000–$326,000 / yr
Posted: 11 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

This role pays more than 87% of similar roles, most of which pay $165,000–$232,000. At the midpoint of its range, this role pays about $262k, versus about $198k for comparable roles.

Based on 240 similar postings.

Employer

About General Motors (GM)

General Motors (GM) is a leading American multinational automotive corporation founded in 1908 and headquartered in Detroit, Michigan.

General Motors (GM) currently has 126 open roles on FindRole.

Listed pay typically runs $170,000–$258,500 across 75 roles with salary data.

At a glance

TL;DR · Staff ML Infrastructure Engineer (Compute)

As a Staff ML Infrastructure Engineer on GM's AI Validation Platform team, you will build and scale robust compute platforms for simulation, data labeling, and data generation workflows. You will drive efficiency and high utilization of cutting-edge GPUs and improve platform reliability through technical leadership. Working with simulation engineers and ML researchers, you will learn their critical workflows and deliver incremental value by owning the technical roadmap and leading decisions on Compute architecture, caching, capacity provisioning, and auto-scaling mechanisms. You will also develop monitoring and observability tools for performance optimization and proactively research new technologies to integrate into the platform.

Ideal candidates have 8+ years of experience in high-performance backend services; expertise in Docker, Kubernetes, Go, and cloud platforms such as GCP or AWS; strong problem-solving skills; and a strong bias for action. Experience with hardware-in-the-loop validation systems and HPC is preferred.

Skills

Kubernetes, Docker, Go, AWS, GCP, Azure, CI/CD, Prometheus, Grafana, Python, PostgreSQL, Terraform, GitLab, HPC, GPU, Telemetry

What you'll do

Own the technical roadmap for Compute architecture, caching, capacity provisioning, and auto-scaling mechanisms.
Drive the development of monitoring, observability, and metrics tooling to improve reliability and guide performance optimization.
Collaborate with simulation engineers and ML researchers to understand critical workflows and translate them into platform requirements.
Proactively research new frameworks, hardware accelerators, and distributed computing techniques, and integrate them into the platform.
Lead large-scale technical initiatives across GM’s ML infrastructure.

What we're looking for

8+ years of industry experience in high-performance backend services.
Expertise in container technologies like Docker and Kubernetes.
Proficiency in Go or a similar programming language.
Experience working with cloud platforms such as GCP, Azure, or AWS.
Strong problem-solving skills and ability to drive cross-functional initiatives.
Hands-on experience with cloud VM services and hardware-in-the-loop validation systems.
Familiarity with high-performance computing (HPC) and GPU optimizations.

Sr./Staff ML Infrastructure Engineer, Compute (TPU Scheduling) - Foundation Model

Apple Inc

Seattle, WA · 29 days ago · $171,600–$302,200

Python, Kubernetes, Go, TPU, GPU, JAX, PyTorch, TensorFlow, Ray, Pathways, Docker, CI/CD
