Senior Solutions Architect - AI Factory Deployment

Nvidia

Remote

Quick summary

Work type: Remote
Location: Austin, TX; Durham, NC; Santa Clara, CA
Salary: $184,000–$287,500 / yr
Posted: 46 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $156k to $302k, marking similar roles at $208k and this role at $236k, with a shaded band where most similar roles pay.]

This role pays more than 70% of similar roles. Most pay $170,000–$246,150 (the shaded band in the chart above). At the midpoint, this role pays about $236k versus about $208k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 967 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 950 roles with salary data.

At a glance

TL;DR · Senior Solutions Architect - AI Factory Deployment

The Senior Solutions Architect - AI Factory Deployment role sits within NVIDIA’s Infrastructure Specialists team in Santa Clara and focuses on developing and deploying end-to-end AI factories. Day-to-day responsibilities include setting up and validating multi-GPU, multi-node Linux clusters for AI/LLM workloads and ensuring strong performance of NCCL-based collective communication patterns such as AllReduce and AllToAll. The candidate will also build observability tools and write Python and Shell automation scripts to run benchmarks and monitor workloads, collaborating with hardware and software teams to prepare AI factories for customer deployment. Essential skills include extensive experience managing Linux-based systems in HPC or distributed settings, proficiency with PyTorch or TensorFlow, and a solid understanding of the collective communication patterns used in modern ML/LLM training.

Skills

Linux, Python, Shell, NCCL, AllReduce, AllToAll, PyTorch, TensorFlow, Bash, Benchmarking, Metrics, Messaging Systems, Logging, Tracing, CI/CD, HPC, GPU Clusters

What you'll do

Set up and verify AI factory environments on multi-GPU Linux clusters.
Execute key AI/LLM benchmarks and analyze results for performance optimization.
Investigate and resolve issues in training jobs or benchmarks that fail or underperform.
Build observability tools to monitor workload behavior and system health.
Develop automation scripts for running benchmarks and collecting results.
Recommend changes to improve throughput, latency, and scaling efficiency of AI workloads.

What we're looking for

Over 6 years of experience managing Linux-based systems in HPC or large-scale AI/ML settings.
Hands-on experience with multi-GPU/multi-node clusters and NCCL.
Solid understanding of collective communication patterns like AllReduce and AllToAll.
Proficiency in Python and Shell/Bash for scripting, automation, and tooling.
Experience running benchmarks and interpreting performance results.
Comfortable working with observability data to troubleshoot complex distributed workloads.
Strong cross-functional team collaboration and communication skills.

Similar roles

Senior AI Solutions Architect

Nvidia

Remote (Santa Clara, CA) · 6 days ago · $152,000–$241,500

Python C/C++ PyTorch TensorFlow Kubernetes GitHub NVIDIA CUDA Docker Prometheus Grafana CI/CD PostgreSQL AWS Azure MLOps

Senior Solutions Architect, AI Infrastructure

Nvidia

Austin, TX +3 · 18 days ago · $184,000–$287,500

NVIDIA Ethernet, InfiniBand, GPUs, CPUs, PCIe, DPUs, NICs, HCAs, switches, rack scale design, system hardware architecture, kernel drivers, PCIe devices, AI data centers, CI/CD

Senior Solutions Architect, AI Infrastructure

Nvidia

Austin, TX +3 · 16 days ago · $184,000–$287,500

NVIDIA Ethernet, InfiniBand, GPUs, CPUs, PCIe, DPUs, NICs, HCAs, switches, rack scale design, large scale GPU infra deployments, system hardware architecture, kernel drivers, PCIe devices, AI data center networking

Senior Solution Architect, AI Infrastructure

Nvidia

Remote (US, DC) · 34 days ago · $184,000–$287,500

NVIDIA GPUs, NVIDIA Networking, InfiniBand, Ethernet, NCCL, DCGM, UFM, Mission Control, Base Command Manager, AI solutions, High Performance Computing, Networking, Python, CI/CD, Git, AWS, Azure, Grafana, Prometheus

Solutions Architect - OEM AI Factory Infrastructure

Nvidia

Santa Clara, CA · 129 days ago · $152,000–$241,500

Linux Python Slurm NVIDIA GPUs CUDA Docker Kubernetes Terraform AWS CI/CD PostgreSQL InfiniBand Ethernet HPC DevOps Site Reliability Engineering

Solutions Architect, OEM AI Factory Infrastructure

Nvidia

Santa Clara, CA · 38 days ago · $152,000–$241,500

Linux Python Slurm NVIDIA GPUs CUDA C C++ Docker Kubernetes Terraform AWS CI/CD PostgreSQL InfiniBand Ethernet HPC DevOps Site Reliability Engineering
