Staff ML Infrastructure Engineer (Compute)

General Motors (GM)

Remote

Quick summary

Work type: Remote
Location: Remote
Salary: $197,000–$326,000 / yr
Posted: 11 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Similar roles (midpoint): $198k
This role (midpoint): $262k
Chart range: $127k–$347k

This role pays more than 87% of similar roles, most of which pay $165,000–$232,000. At the midpoint, this role pays about $262k versus about $198k for comparable roles.

Based on 240 similar postings.

Employer

About General Motors (GM)

General Motors (GM) is a leading American multinational automotive corporation founded in 1908 and headquartered in Detroit, Michigan.

General Motors (GM) currently has 126 open roles on FindRole.

Listed pay typically runs $170,000–$258,500 across 75 roles with salary data.


At a glance

TL;DR · Staff ML Infrastructure Engineer (Compute)

As a Staff ML Infrastructure Engineer on the AI Validation Platform team at GM, you will build and scale compute platforms for simulation, data labeling, and data generation workflows. You will drive efficient, high utilization of cutting-edge GPUs and improve platform reliability through technical leadership. Working with Simulation engineers and ML researchers, you will learn their critical workflows and deliver value incrementally, owning the technical roadmap and leading decisions on Compute architecture, caching, capacity provisioning, and auto-scaling mechanisms. You will also develop monitoring and observability tools for performance optimization and research new technologies to integrate into the platform. Ideal candidates have 8+ years of experience in high-performance backend services; expertise in Docker, Kubernetes, Go, and cloud platforms such as GCP or AWS; strong problem-solving skills; and a bias for action. Experience with hardware-in-the-loop validation systems and HPC is preferred.

What you'll do

  • Own the technical roadmap for Compute architecture, caching, capacity provisioning, and auto-scaling mechanisms.
  • Drive development of monitoring, observability, and metrics to ensure reliability and performance optimization.
  • Collaborate with engineers to understand critical workflows and translate them into platform requirements.
  • Proactively research and integrate frameworks, hardware accelerators, and distributed computing techniques.
  • Lead large-scale technical initiatives across GM’s ML infrastructure.

What we're looking for

  • 8+ years of industry experience in high-performance backend services.
  • Expertise in container technologies like Docker and Kubernetes.
  • Proficiency in Go or a similar programming language.
  • Experience working with cloud platforms such as GCP, Azure, or AWS.
  • Strong problem-solving skills and ability to drive cross-functional initiatives.
  • Hands-on experience with Cloud VM services and hardware-in-the-loop validation systems.
  • Familiarity with high-performance computing (HPC) and GPU optimizations.

More like this

Similar roles

Senior ML Infrastructure Engineer (Compute)

General Motors (GM)

Remote (GM Automation - Sunnyvale, US) · 11 days ago
Go, AWS, GCP, Azure, Docker, Kubernetes, CI/CD, Prometheus, Grafana, PostgreSQL, Redis, HPC, GPU, Telemetry, Python, C++, REST, GraphQL, GitLab
Remote

Staff Machine Learning Engineer, Compute

General Motors (GM)

Remote (GM Automation - Sunnyvale, US) · 98 days ago · $198,900–$304,800
Python, Kubernetes, GCP, Azure, AWS, Go, C++, Docker, CI/CD, PyTorch, Ray, Prometheus, Grafana, PostgreSQL, Redis, GitLab, GitHub, Mesos, YARN
Remote

Senior ML Engineer, ML compute

General Motors (GM)

Mountain View, California · 109 days ago · $155,420–$395,900
Python, Kubernetes, Go, C++, GCP, Azure, AWS, PyTorch, TorchX, Ray, Docker, CI/CD
Hybrid

Senior ML Infrastructure Engineer, Inference Platform

General Motors (GM)

Sunnyvale, CA · 11 days ago · $155,420–$205,900
Python, Triton, Ray Serve, vLLM, C++, Kubernetes, Docker, CI/CD, Prometheus, Grafana, PostgreSQL, Redis, AWS, Azure, Google Cloud Platform, Git, Jenkins, GitHub, Slack, Confluence, Jira
Hybrid