Senior Software Development Engineer in Test - Datacenter Server OS

Nvidia

Actively hiring
Santa Clara, CA, US Posted 48 days ago $140,000–$224,250 / year

At a glance

AI generated

TL;DR

NVIDIA’s Platform SWQA team seeks an experienced Software Quality Assurance Engineer to develop and execute comprehensive test plans for the company's HGX/DGX/MGX platforms across various servers, operating systems, firmware, and software stacks. The role involves installing and testing diverse system configurations, driving root cause analysis for reliability issues, and building automation frameworks using Python, Shell scripting, Ansible, Jenkins, C/C++, Java, and JavaScript. The ideal candidate has a strong background in Linux troubleshooting, server-level automation, CI/CD processes, and DevOps practices, along with hands-on experience in AI tools, NLP, and model testing. Knowledge of NVIDIA GPU hardware, virtualization technologies such as KVM and Docker (orchestrated with Kubernetes), and parallel programming with CUDA or OpenCL is highly desirable. The role demands high production-quality standards.

Skills

Python, Jenkins, Ansible, C, Java, JavaScript, Linux, Ubuntu, RedHat, CentOS, SUSE, Fedora, TensorFlow, PyTorch, CI/CD, GitHub, GitLab, Gerrit, PXE, SLURM, Kubernetes, Docker, NVIDIA GPU, CUDA, OpenCL

What you'll do

  • Develop and execute test plans for NVIDIA HGX/DGX/MGX platforms across servers and operating systems.
  • Install and test various operating systems, server firmware, and software stacks.
  • Conduct root cause analysis to identify and mitigate reliability and validation issues.
  • Build and develop automation frameworks and tests at the server and OS levels.
  • Review partner and supplier test results and prescribe additional reliability testing.
  • Manage the bug lifecycle and collaborate across groups to drive solutions.

What we're looking for

  • 5+ years of experience in OS- and server-level automation using Python, Shell, Ansible, and Jenkins.
  • Strong troubleshooting and debugging skills for Linux environments including Ubuntu, RedHat, CentOS.
  • Experience developing AI tools and working with frameworks such as TensorFlow and PyTorch, and conducting NLP benchmarking.
  • Proficient in CI/CD processes and DevOps practices with GitHub/GitLab/Gerrit.
  • Expertise in reliability testing and root cause analysis for server- and OS-level issues.
  • Knowledge of firmware, BMC/OpenBMC, network protocols, and enterprise storage devices.

Market check

Salary context

This $140,000–$224,250 range sits above 68% of similar postings on FindRole.

Peer median band

$117,000–$211,000

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$139,087–$214,462

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Senior Software Engineer - Datacenter Systems

Nvidia

Remote (Santa Clara, CA, US) 8 days ago $184,000–$287,500
Python, Rust, C++, Shell, Kubernetes, Jenkins, GitLab, Ansible, GitOps, Prometheus, Grafana, CI/CD, Linux, Slurm, NVIDIA DGX systems, Docker, Terraform, AWS, Azure, Google Cloud Platform

Senior Software Engineer - Server Manageability

Nvidia

Remote (CA, US) 23 days ago $152,000–$241,500
C, C++, Python, bash, DMTF Standards, MCTP, Redfish, SPDM, PLDM, I2C, SPI, PCIe, JTAG, OpenBMC, IPMI, Open Compute, embedded Linux, static analysis, unit testing, code coverage, CI/CD

Senior Software Engineer, Developer Tools for Cloud

Nvidia

Remote (Redmond, WA, US) 8 days ago $152,000–$241,500
Python, JavaScript, C++, Kubernetes, GraphQL, Go, Rust, Datadog, ClickHouse, Grafana, CUDA, HPC, Networking, Performance Optimization, Microservices, Web APIs, Distributed Environments, Algorithms, Computer Architecture

Senior Software Program Manager - Datacenter Compute Server

Nvidia

Santa Clara, CA, US 14 days ago $200,000–$322,000
Linux, Python, Agile, PCIe, Kubernetes, Terraform, Git, Jira, CI/CD, PostgreSQL, NVIDIA GPUs, HPC, AI, Datacenter servers, Firmware development, Operating systems principles, Linux OS, Productivity tools, Process automation, Configuration management tools, Agile tools

Senior Datacenter Product Development Engineer

Nvidia

Santa Clara, CA, US 9 days ago $168,000–$258,750
Python, Shell, PCIe, InfiniBand, Ethernet, I3C, I2C, SPI, USB, HPC, FPGA, CPLD, FW, secure-boot, encrypted images, root-cause analysis, test specifications, HW testing, automation, log parsing, Operations Research, Industrial Engineering, statistics

Senior Software Development Engineer (DevOps)

CVS Health

Remote (Richardson-909 E Collins Blvd, US) 14 days ago $92,700–$203,940
GCP, Azure, GitHub Actions, Kubernetes, Helm, CI/CD, Java, Python, Node.js, Git, Docker, Terraform, Microservices, Agile, Observability, Telemetry