Senior Solutions Architect, AI Cluster Performance and Telemetry

Nvidia

Quick summary

Work type: On-site
Location: Santa Clara, CA · Austin, TX
Salary: $184,000–$287,500 / yr
Posted: 2 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: pay scale from $167k to $300k, with a shaded band marking where most similar roles pay. Midpoints: similar roles $214k, this role $236k.

This role pays more than 73% of similar roles. Most pay $180,906–$246,150, the shaded band above. At the midpoint, this role pays about $236k versus about $214k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 985 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 971 roles with salary data.

At a glance

TL;DR · Senior Solutions Architect, AI Cluster Performance and Telemetry

As a Senior Solutions Architect specializing in Data Center Systems & Performance, you will join our elite team and serve as a technical expert bridging engineering, field teams, and customers with demanding requirements. Your primary responsibilities include identifying and resolving complex performance bottlenecks across GPU, CPU, and networking systems, maintaining benchmarking suites that stress-test high-performance clusters, and using industry-standard tools to monitor hardware performance counters and extract system telemetry. You will collaborate closely with internal engineering teams and external partners to develop solutions that improve infrastructure performance. The ideal candidate has a strong background in system build, performance analysis, and technical customer-facing roles, along with expertise in CPUs, GPUs, high-speed networking fabrics, and tools such as Perf, eBPF, Prometheus, Grafana, Docker, Kubernetes, and SLURM. You should also have experience optimizing distributed AI training workloads and integrating agentic AI frameworks to automate cluster triage.

Skills

Perf, eBPF, Prometheus, Grafana, Docker, Kubernetes, SLURM, Ansible, NCCL, NVIDIA Nsight, Python, C++, CUDA, TensorFlow, PyTorch, CI/CD

What you'll do

Analyze and resolve complex performance bottlenecks in GPU, CPU, and networking systems.
Develop and maintain benchmarking suites for high-performance clusters.
Use industry-standard tools to monitor hardware performance counters and extract system telemetry.
Investigate configurations to identify and fix issues impacting peak performance.
Collaborate with internal teams and customers to enhance infrastructure performance.

What we're looking for

8+ years of industry experience in system build, performance analysis, and technical customer-facing roles.
Strong understanding of CPU-GPU interactions and high-speed networking fabrics in massive clusters.
Practical experience with performance tools like Perf, eBPF, Prometheus, and Grafana.
Experience working with containers, cloud provisioning, and scheduling tools such as Docker, Kubernetes, and SLURM.
Ability to transform raw telemetry into structured time series data and create actionable narratives.
Deep knowledge of multi-GPU communication libraries and NVIDIA hardware architectures.
Practical experience optimizing distributed AI training workloads and integrating agentic AI frameworks.

Similar roles

Senior Solutions Architect - Enterprise AI

Nvidia

Remote (Santa Clara, CA) · 9 days ago · $184,000–$287,500

Ethernet, InfiniBand, SDN, Kubernetes, Docker, AWS, Azure, GCP, CI/CD, Prometheus, PostgreSQL, Terraform, Cisco, Arista, Cloud Engineer, Juniper Networks, TCO analysis, Python, Ansible

Senior Solutions Architect, AI Infrastructure

Nvidia

Austin, TX · 10 days ago · $184,000–$287,500

NVIDIA Ethernet, InfiniBand, GPUs, CPUs, PCIe, DPUs, NICs, HCAs, switches, rack-scale design, system hardware architecture, kernel drivers, PCIe devices, AI data centers, CI/CD

Senior Solutions Architect, AI Infrastructure

Nvidia

Austin, TX · 8 days ago · $184,000–$287,500

NVIDIA Ethernet, InfiniBand, GPUs, CPUs, PCIe, DPUs, NICs, HCAs, switches, rack-scale design, large-scale GPU infra deployments, system hardware architecture, kernel drivers, PCIe devices, AI data center networking

Senior Solution Architect, AI Infrastructure

Nvidia

Remote (US, DC) · 26 days ago · $184,000–$287,500

NVIDIA GPUs, NVIDIA Networking, InfiniBand, Ethernet, NCCL, DCGM, UFM, Mission Control, Base Command Manager, AI solutions, High Performance Computing, Networking, Python, CI/CD, Git, AWS, Azure, Grafana, Prometheus

Senior Solutions Architect, AI Cloud Services

Nvidia

Remote (Santa Clara, CA) · 4 days ago · $152,000–$241,500

AWS, GCP, Azure, OCI, Docker, Kubernetes, Python, CUDA, Triton, TensorRT-LLM

Senior Solutions Architect, AI Hyperscalers

Nvidia

Remote (Canada) · 24 days ago · $184,000–$287,500

Python, CUDA, PyTorch, JAX, Linux, Docker, Kubernetes, HPC, GPU, Distributed Training, Inference Optimization, Vector Databases, RAG Pipelines, Multi-node Clusters, Deep Learning Frameworks
