Systems Quality and Reliability Lead - LPU

Nvidia

Actively hiring
Santa Clara, US Posted 30 days ago $168,000–$264,500 / year

At a glance

AI generated

TL;DR

We are seeking a Lead Systems Quality and Reliability Engineer to join our LPU team, where you will own and build the root-cause analysis process for Nvidia AI/ML products. Your day-to-day responsibilities include conducting debug analyses, creating FA reports, identifying quality trends, and overseeing hardware performance metrics. You will also manage operational performance at contract manufacturers and set up new products in Failure Analysis operations. Ideal candidates have a BS/MS in EE or Physics with 8+ years of hands-on experience in systems test and validation engineering. Required skills include proficiency with lab equipment, reliability tests, FA techniques like FIB and SEM, fault isolation methods such as OBIRCH, and programming languages like Python and C++ on UNIX/Linux. This role demands expertise in high-speed interfaces and PCB card/system level testing to ensure top-tier quality standards for our cutting-edge AI products.

Skills

Python, Perl, C++, UNIX, Linux, FIB, SEM, TDR, VNA, CSAM, OBIRCH, DLS, LADA, LVP, LVI, SerDes, PCIe, DDR, HTOL, Burn-in, PCB

What you'll do

  • Conduct and lead root cause analysis for field RMAs of Nvidia AI/ML products.
  • Scale failure analysis capabilities within the organization.
  • Create detailed FA reports following standard 8D or similar processes.
  • Analyze RMA data to identify trends and drive quality improvement plans.
  • Oversee hardware quality performance by monitoring key metrics like MTBF.
  • Manage operational performance of FA at contract manufacturers (CMs).
  • Set up new products for failure analysis operations in the organization.

What we're looking for

  • 8+ years hands-on experience in systems test and validation engineering.
  • BS/MS in Electrical Engineering, Physics or related field required.
  • Proven management and leadership experience in quality and reliability roles.
  • Expertise with lab equipment including oscilloscopes and logic analyzers.
  • Experience enabling reliability tests like HTOL and burn-in processes.
  • Knowledge of FA techniques such as FIB, SEM, TDR, VNA, CSAM, OBIRCH.
  • Proficiency in high-speed interfaces (SerDes, PCIe, DDR) and scripting languages.

Market check

Salary context

This $168,000–$264,500 range sits above 81% of similar postings on FindRole.

Peer median band

$136,000–$213,325

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$148,275–$210,675

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads. Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

System Reliability & Support Lead

PNC

Dallas Innovation Center - Luna Rd (Tx270), US 21 days ago
ServiceNow Dynatrace Oracle SQL CI/CD Kubernetes Docker Prometheus Grafana Python PostgreSQL

QA Lead — Automation & Quality Engineering - VP

Citi

Remote (3800 Citigroup Center Drive Building B Tampa, US) 39 days ago $113,840–$170,760
Selenium Java .NET UI testing API testing Performance testing CI/CD CRM Postman REST-assured Playwright Cypress

Lead Quality Release Engineer

Salesforce

Remote (Colorado - Denver, US) 28 days ago $148,500–$223,900
Salesforce Hyperforce Falcon CI/CD Git DevOps AWS Kubernetes Terraform Python PostgreSQL JSON/WebAPIs Docker Prometheus Grafana

Lead Software Engineer, Customer Experience & Reliability

Morningstar Inc

Chicago, Illinois, US 9 days ago $114,100–$167,350
C# .NET .NET Core JavaScript AWS Docker Kubernetes CI/CD Splunk New Relic Prometheus Grafana Terraform Git Jira Confluence Mentorship Customer Experience Reliability Engineering

System Reliability and Support Specialist Sr.

PNC

Two PNC Plaza (Pa374), US 44 days ago
Informatica Dynatrace ServiceNow Jira BigPanda uDeploy Jenkins BitBucket SQL SQL Developer OCP Exadata Mong Oracle .NET C# CI/CD OpenShift Kubernetes

Senior Staff/Principal Reliability Engineer

Qualcomm

San Diego, CA, US 24 days ago $180,200–$270,200
Python C++ MATLAB R Weibull analysis JMP Minitab Kubernetes AWS Git CI/CD PostgreSQL SQLite Linux Windows Advanced Packaging Reliability TSV-based 2.5D/3D Integration Chiplet Architectures Backside Power Delivery Schemes