Site Reliability Engineering Lead, Model Serving

General Dynamics

Quick summary

Work type
On-site
Location
Washington, DC
Salary
$169,604–$229,464 / yr
Posted
5 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Pay comparison chart: scale $132k–$240k, similar roles at $181k, this role at $200k.

This role pays more than 68% of similar roles. Most similar roles pay $150,000–$211,200. At the midpoint, this role pays about $200k, versus about $181k for comparable roles.

Based on 240 similar postings.

Employer

About General Dynamics

General Dynamics is a global aerospace and defense company offering a broad portfolio of products and services in business aviation, ship construction, land combat vehicles, and information technology. It serves customers in the U.S. government, allied governments, and a diverse array of commercial markets.

General Dynamics currently has 430 open roles on FindRole.

Listed pay typically runs $112,924–$149,500 across 364 roles with salary data.

At a glance

TL;DR · Site Reliability Engineering Lead, Model Serving

As Site Reliability Engineering Lead for Model Serving on GDIT’s CDAO Advana team in the DC area, you will own the production reliability strategy for AI and ML model serving across multiple security domains. Day to day, you will define service-level objectives, develop operational standards, implement reliability engineering methodologies using Kubernetes, Prometheus, Grafana, GitLab CI, and VMware environments, and lead coordination with multi-national teams to align reliability strategy with evolving architectures and mission priorities. You will also produce deliverables such as alerting configurations, runbooks, and incident reports while ensuring operational stability and readiness across all enclaves. The role requires expertise in that toolset and in hardened deployment pipelines, along with a strong background in production reliability for AI/ML model serving.

What you'll do

  • Define service-level objectives and the alerting philosophy for AI/ML model serving.
  • Establish reliability governance by developing operational standards and incident response patterns.
  • Implement reliability engineering methodologies using Kubernetes, Prometheus, Grafana, and GitLab CI.
  • Develop automated reliability checks to validate the performance and availability of production models.
  • Lead coordination with multi-national teams to align reliability strategy with evolving architectures.
  • Produce mission-critical deliverables, including alerting configurations, operational runbooks, and incident reports.

What we're looking for

  • US citizenship and TS/SCI eligibility required
  • 8+ years of experience in production reliability for AI/ML model serving
  • IAT II certification (Security+)
  • Expertise in Kubernetes, Prometheus, Grafana, Elastic Stack, GitLab CI
  • Experience with VMware environments and hardened deployment pipelines
  • Strong coordination skills with multi-national engineering teams and mission partners

More like this

Similar roles

Lead Site Reliability Engineering, Network

JPMorgan Chase

Palo Alto, CA +1 · 13 days ago · $152,000–$215,000
AWS Azure Grafana Prometheus Terraform Jenkins GitLab CI/CD eBPF Palo Alto Juniper F5 Broadcom Arista Cisco SD-WAN TCP/IP HTTPS BGP Kubernetes

Lead Engineer, Reliability Engineering & Enablement

Target

MN · 7 days ago · $132,000–$238,000
Python Java Kubernetes Docker AWS CI/CD Prometheus Grafana PostgreSQL Redis Event-Driven Architecture Microservices DevOps Scalability Reliability Security Observability Automation AI-Assisted Engineering
Hybrid

Lead Site Reliability Engineer

Alteryx

Remote · 82 days ago · $136,000–$177,000
Kubernetes CI/CD GitOps ArgoCD SLO SLA observability Infrastructure as Code chaos engineering Datadog Grafana Python Java C++ JavaScript AWS Azure Google Cloud Platform PostgreSQL MySQL Redis Docker Terraform
Remote

Lead Site Reliability Engineer

Alloy

NY · 84 days ago · $179,000–$226,000
Kubernetes Terraform Python Go Docker CI/CD AWS Datadog CloudWatch ELK EFK JavaScript
Hybrid

Lead Site Reliability Engineer

JPMorgan Chase

New York, NY · 10 days ago · $152,000–$215,000
CI/CD Kubernetes Docker Terraform JavaScript Go Python GraphQL Kafka OpenTelemetry AI Jenkins GitLab ECS