Senior HPC Performance Engineer - AI for Science at Scale
Nvidia
At a glance
AI generatedAs an AI & HPC Observability Engineer at NVIDIA’s Managed AI Superclusters (MARS) team, you will design and scale observability platforms to handle high-volume metrics, logs, and traces across distributed environments. Your day-to-day responsibilities include building high-performance backend services for telemetry ingestion, processing, and routing, as well as developing OpenTelemetry collectors and instrumentation libraries. You will also optimize metrics pipelines using large-scale time-series storage systems and collaborate with platform engineering teams to ensure operational excellence. The role requires expertise in Python, Go, or Java, along with hands-on experience with modern observability architectures like PromQL and Kafka, and familiarity with Kubernetes and cloud-native infrastructure. Ideal candidates have a background in data engineering and real-time performance tuning for AI/ML pipelines, GPU workload monitoring, and intelligent alerting systems.
Skills
What you'll do
What we're looking for
Market check
This $152,000–$241,500 range sits above 41% of similar postings on FindRole.
Peer median band
$170,300–$262,400
Median floor and ceiling across peers.
Typical midpoint (25–75%)
$181,758–$246,150
Middle half of comparable postings.
Based on 240 comparable postings.
* 240 is the maximum number of comparable postings sampled.
Employer
Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads. Industry: Semiconductors & AI Computing
Nvidia currently has 801 open roles on FindRole.
Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.
Most-posted roles
More like this
Nvidia
Nvidia
Nvidia
Nvidia
Nvidia
Nvidia