Solutions Architect, Inference Deployments

Nvidia

Actively hiring
Santa Clara, CA, US Posted 43 days ago $152,000–$241,500 / year

At a glance

AI generated

TL;DR

As a Solutions Architect focused on inference, you will work closely with engineering and DevOps teams to develop enterprise-grade AI solutions using NVIDIA’s GPU technology and Kubernetes. Day to day, you will build efficient inference pipelines with tools like NVIDIA Dynamo, orchestrate disaggregated inference on Kubernetes for complex workloads, and accelerate these pipelines with TensorRT-LLM and other backends. You will also mentor customers and internal teams in deploying disaggregated inference systems and resolving intricate technical issues. Ideal candidates have at least five years of experience in solutions architecture, a strong background in deploying distributed systems on Kubernetes, and expertise with NVIDIA Dynamo, Triton Inference Server, and TensorRT-LLM for model optimization. Proficiency in GPU orchestration with the NVIDIA GPU Operator, NIM Operator, and MIG partitioning, along with deep knowledge of transformer neural networks and inference acceleration technologies, is essential.

Skills

NVIDIA Dynamo, Kubernetes, TensorRT-LLM, vLLM, SGLang, Triton Inference Server, NVIDIA GPU Operator, NIM Operator, MIG Partitioning, RDMA, UCX, Quantization, Speculative Decoding, WideEP, NVIDIA TensorRT, PostgreSQL, CI/CD, GitHub, Prometheus, Grafana

What you'll do

  • Build efficient inference pipelines using NVIDIA Dynamo and Kubernetes.
  • Accelerate inference pipelines with TensorRT-LLM and other backends.
  • Mentor customers and internal teams on deploying disaggregated inference systems.
  • Solve complex GPU allocation and memory hierarchy issues in enterprise settings.
  • Tune large language models for low-latency inference in production environments.

What we're looking for

  • 5+ years of experience in solutions architecture, with a focus on distributed systems and AI inference workloads on Kubernetes.
  • Expertise in deploying NVIDIA inference technologies such as Dynamo, Triton Inference Server, and TensorRT-LLM for model optimization.
  • Proficiency in GPU orchestration using the NVIDIA GPU Operator, NIM Operator, and MIG partitioning, and in solving complex GPU allocation issues.
  • Proven success in tuning large language models for low-latency inference in enterprise environments, backed by deep knowledge of transformer neural networks.
  • BS in Computer Science/Engineering or equivalent experience; NVIDIA Certified AI Engineer certification preferred.

Market check

Salary context

This $152,000–$241,500 range sits above 57% of similar postings on FindRole.

Peer median band

$152,000–$241,500

Median floor and ceiling across peers.

Typical midpoint (25–75%)

$161,965–$235,750

Middle half of comparable postings.

Based on 240 comparable postings.

* 240 is the maximum number of comparable postings sampled.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 802 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 798 roles with salary data.

More like this

Similar roles

Solution Architect - Modern Applications

Broadcom

4420 Arrowswest Drive, Colorado Springs, Colorado, US 149 days ago $108,000–$172,800
VMware Cloud Computing IT Infrastructure Data Center Design Platform Architecture CI/CD Terraform AWS Kubernetes Python PostgreSQL Grafana Prometheus Docker

Senior Solution Architect, AI Infrastructure

Nvidia

Remote (DC, US) 17 days ago $184,000–$287,500
NVIDIA_GPUs NVIDIA_Networking InfiniBand Ethernet NCCL DCGM UFM Mission_Control Base_Command_Manager AI_solutions High_Performance_Computing Networking Python CI/CD Git AWS Azure Grafana Prometheus

Solutions Architect - AI Networking and Storage

Nvidia

Remote (TX, US) 45 days ago $184,000–$287,500
NVIDIA Intel x86 ARM GPU Lustre GPFS Ceph RDMA Kubernetes CSI TensorRT TensorRT-LLM NVIDIA NIM NVIDIA NeMo Framework NVIDIA Triton Inference Server HPC AI SAN NAS

AI Solution Architect

Booz Allen Hamilton

Nellis AFB, Nevada, US 14 days ago $112,800–$257,000
Palantir Foundry Palantir Gotham Kubernetes DevSecOps CI/CD Docker LLM AI/ML DevOps Secret clearance Top Secret clearance AWS