Senior Resiliency and Safety Architect

Nvidia

Actively hiring
Santa Clara, US Posted 22 days ago $184,000–$287,500 / year

At a glance

AI generated

TL;DR

NVIDIA seeks a Senior Resiliency and Safety Architect to join its innovative Accelerated and Resilient Compute Systems team, focusing on enhancing GPU and Tegra SoC hardware and software resiliency. In this role, you will collaborate with cross-functional teams to design robust safety features, optimize system performance, and ensure compliance with industry standards such as ISO 26262 and ASPICE. Daily tasks include modeling RAS metrics, developing diagnostics software, and conducting simulations to analyze architectural vulnerabilities. Ideal candidates hold a Master’s or PhD in Computer Science or related fields, with extensive experience in computer architecture and proficiency in C/C++ and Python. Familiarity with CUDA, Verilog RTL coding, and machine learning concepts is beneficial, as is a strong background in resiliency and functional safety.

Skills

C C++ Python CUDA Verilog ISO 26262 ASPICE MISRA Cert-C FMEA DFA FTA GPU SoC Machine Learning Deep Learning CI/CD

What you'll do

  • Collaborate with teams to architect new safety and resiliency features for GPUs and SoCs.
  • Optimize hardware and software to enhance system robustness, performance, and security.
  • Model and analyze RAS (reliability, availability, and serviceability) metrics such as Failures in Time (FIT) and availability.
  • Develop diagnostics software components for Resiliency and Safety on NVIDIA GPUs.
  • Participate in testing new and existing resiliency and safety features.
  • Ensure compliance with ISO 26262 and ASPICE standards for product development.

What we're looking for

  • 5+ years of relevant experience in computer system architecture or related fields.
  • Master’s or PhD in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • Proficiency in C/C++ and scripting with Python or similar languages.
  • Experience with resiliency and functional safety standards (ISO 26262, ASPICE).
  • Strong debugging skills and ability to analyze RAS metrics like Failures in Time and Availability.
  • Familiarity with GPU and SoC architectures, Verilog RTL coding, and machine learning concepts.
  • Excellent interpersonal skills for collaboration across on-site and remote teams.

Market check

Salary context

This $184,000–$287,500 range sits above 86% of similar postings on FindRole.

Peer median band

$149,750–$223,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$142,450–$224,300

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 801 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 797 roles with salary data.

More like this

Similar roles

Senior System Architect, Infrastructure Reliability

Nvidia

Santa Clara, CA, US 93 days ago $184,000–$287,500
Python C++ Kubernetes Slurm NVIDIA DCGM NVML Linux kernel CUDA Prometheus Grafana CI/CD Machine Learning Docker CRIU Tracing tools PostgreSQL Redis MESOS Hadoop

Senior Security Architect

Nvidia

Remote (Santa Clara, CA, US) 11 days ago $184,000–$287,500
Linux C/C++ Risk Management Threat Modeling Vulnerability Management Access Control Incident Response Disaster Recovery Compliance Data Protection OAuth 2.1 OIDC Kerberos FIDO2 WebAuthn Microsoft Active Directory Entra ID FreeIPA RHEL IdM SSSD PKI SELinux AppArmor eBPF Rust Slurm Lustre NFS Docker Enroot Kubernetes InfiniBand Zero Trust ZTNA VRFs CVSS 4.0 SBOM

Senior Core Infrastructure Engineer

Highnote

US 83 days ago $170,000–$230,000
GCP AWS Kubernetes Istio Python Java CI/CD Prometheus Grafana Spanner BigQuery Dataflow Pub/Sub

Senior Site Reliability Engineer

Adobe

San Jose, US 51 days ago $208,300–$301,600
AWS Kubernetes Terraform Python Go CI/CD Infrastructure as Code Docker PostgreSQL Security hardening AI-enabled platforms Cross-team leadership Developer experience optimization

Senior Site Reliability Engineer

CoStar Group

US 11 days ago
AWS Kubernetes Docker Terraform CloudFormation Python Java C# NodeJS Bash PCI compliance REST API Microservices CDN PostgreSQL MySQL Azure Google Cloud CI/CD