Solutions Architect, Inference Deployments

Nvidia

Actively hiring
Santa Clara, CA, US Posted 43 days ago $152,000–$241,500 / year

At a glance

AI generated

TL;DR

As a Solutions Architect focused on inference, you will work closely with engineering and DevOps teams to develop enterprise-grade AI solutions using NVIDIA's GPU technology and Kubernetes. Your daily tasks include building efficient inference pipelines with tools like NVIDIA Dynamo, orchestrating disaggregated inference across Kubernetes clusters, and accelerating these pipelines with TensorRT-LLM and other backends. You will also mentor customers and internal teams in deploying complex disaggregated inference systems and help resolve intricate technical issues. Ideal candidates have more than five years of experience in solutions architecture, particularly in deploying distributed AI inference workloads on Kubernetes, along with expertise in NVIDIA Dynamo, Triton Inference Server, TensorRT-LLM, and GPU orchestration tools such as the NVIDIA GPU Operator and NIM Operator. A deep understanding of transformer neural networks and advanced inference acceleration techniques is essential, as are contributions to open-source projects such as NVIDIA Dynamo or vLLM.

Skills

NVIDIA Dynamo, Kubernetes, TensorRT-LLM, vLLM, SGLang, Triton Inference Server, NVIDIA GPU Operator, NIM Operator, MIG Partitioning, RDMA, UCX, Quantization, Speculative Decoding, WideEP, NVIDIA Certified AI Engineer, CI/CD

What you'll do

  • Build efficient inference pipelines using NVIDIA Dynamo and Kubernetes.
  • Accelerate inference with TensorRT-LLM and other backend technologies for seamless integration.
  • Mentor customers and internal teams on deploying disaggregated inference systems.
  • Resolve complex issues related to GPU allocation and memory hierarchies.
  • Optimize large language models for low-latency inference in enterprise settings.

What we're looking for

  • 5+ years of experience in solutions architecture, with a focus on distributed systems and AI inference workloads on Kubernetes.
  • Expertise in deploying NVIDIA inference technologies such as Dynamo, Triton Inference Server, and TensorRT-LLM for model optimization.
  • Proficiency in GPU orchestration using NVIDIA GPU Operator, NIM Operator, and MIG partitioning techniques.
  • Experience solving complex issues related to GPU allocation, memory hierarchies, and low-latency networking.
  • Demonstrated success in tuning large language models for efficient low-latency inference in enterprise settings.
  • BS in Computer Science/Engineering or equivalent experience required; NVIDIA Certified AI Engineer preferred.

Market check

Salary context

This $152,000–$241,500 range sits above 57% of similar postings on FindRole.

Peer median band

$152,000–$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$161,965–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Solutions Architect, Inference Deployments

Nvidia

Santa Clara, CA, US 43 days ago $152,000–$241,500
NVIDIA Dynamo, Kubernetes, TensorRT-LLM, vLLM, SGLang, Triton Inference Server, NVIDIA GPU Operator, NIM Operator, MIG Partitioning, RDMA, UCX, Quantization, Speculative Decoding, WideEP, NVIDIA TensorRT, PostgreSQL, CI/CD, GitHub, Prometheus, Grafana

Solution Architect - Modern Applications

Broadcom

4420 Arrowswest Drive, Colorado Springs, Colorado, US 149 days ago $108,000–$172,800
VMware, Cloud Computing, IT Infrastructure, Data Center Design, Platform Architecture, CI/CD, Terraform, AWS, Kubernetes, Python, PostgreSQL, Grafana, Prometheus, Docker

Senior Solution Architect, AI Infrastructure

Nvidia

Remote (DC, US) 17 days ago $184,000–$287,500
NVIDIA GPUs, NVIDIA Networking, InfiniBand, Ethernet, NCCL, DCGM, UFM, Mission Control, Base Command Manager, AI solutions, High Performance Computing, Networking, Python, CI/CD, Git, AWS, Azure, Grafana, Prometheus
Remote

Solutions Architect - AI Networking and Storage

Nvidia

Remote (TX, US) 45 days ago $184,000–$287,500
NVIDIA, Intel, x86, ARM, GPU, Lustre, GPFS, Ceph, RDMA, Kubernetes, CSI, TensorRT, TensorRT-LLM, NVIDIA NIM, NVIDIA NeMo Framework, NVIDIA Triton Inference Server, HPC, AI, SAN, NAS
Remote

AI Solution Architect

Booz Allen Hamilton

Nellis AFB, Nevada, US 14 days ago $112,800–$257,000
Palantir Foundry, Palantir Gotham, Kubernetes, DevSecOps, CI/CD, Docker, LLM, AI/ML, DevOps, Secret clearance, Top Secret clearance, AWS