Staff Software Engineer - ML Observability | Datadog Careers

Datadog

Hybrid

Quick summary

Work type: Hybrid
Location: New York, NY
Salary: $234,000–$300,000 / yr
Posted: 15 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $160k to $315k, marking similar roles at $224k and this role at $267k, with a shaded band where most similar roles pay.]

This role pays more than 84% of similar roles. Most pay $196,562–$251,775 — the shaded band above. At the midpoint, this role pays about $267k versus about $224k for comparable roles.

Based on 240 similar postings.

Employer

About Datadog

Datadog, Inc. is an American company that provides an observability service for cloud-scale applications, providing monitoring of servers, databases, tools, and services, through a SaaS-based data analytics platform.

Datadog currently has 109 open roles on FindRole.

Listed pay typically runs $187,000–$240,000 across 58 roles with salary data.

At a glance

TL;DR

As a Staff Engineer on the ML Observability team at Datadog, you will lead the development of cutting-edge tools to monitor and improve Large Language Model (LLM) systems in production. Your responsibilities include driving the design and implementation of observability features, prototyping new product capabilities, and working cross-functionally with engineering teams, product managers, UX designers, and applied scientists to rapidly iterate on solutions. You will also develop tracing, evaluation, and debugging tools for LLMs, influence architecture decisions, mentor engineers, and stay attuned to customer needs to guide product priorities. The ideal candidate has a deep understanding of distributed systems and scalable backend architectures, hands-on experience with LLM-powered applications, and expertise in model internals, inference pipelines, and prompt engineering. Proficiency with observability tools and platforms is essential, as is the ability to thrive in fast-paced environments while maintaining a product-oriented mindset.

What you'll do

  • Drive the design and implementation of features for LLM observability.
  • Develop tools to trace, evaluate, and debug Large Language Models.
  • Influence architecture decisions and mentor engineers in building resilient systems.
  • Stay informed about industry trends to drive innovation within the team.
  • Ideate, prototype, and scale new product features for generative AI systems.

What we're looking for

  • Deep understanding of distributed systems and scalable backend architectures.
  • Hands-on experience building and shipping LLM-powered or GenAI applications.
  • Expertise in model internals, inference pipelines, evaluation techniques, and prompt engineering.
  • Ability to thrive in ambiguous, fast-changing environments with a product-oriented mindset.
  • Experience with observability tools and platforms, and with influencing architecture decisions.

More like this

Similar roles

Staff Software Engineer | Datadog Careers

Datadog

New York, NY 15 days ago $234,000–$300,000
Python JavaScript Go Kubernetes Docker CI/CD Prometheus Grafana PostgreSQL Redis AWS Scalability Performance Optimization Data Visualization System Architecture Technical Leadership Cloud Services DevOps Practices
Hybrid

Staff Software Engineer - AI/ML Systems and Reliability

Adobe

San Jose 37 days ago $208,300–$301,600
AWS Kubernetes Docker Python Java CI/CD Terraform Prometheus Grafana PostgreSQL Redis Elasticsearch Ray Kafka Spark Airflow MLOps DevOps microservices REST APIs cloud-native architectures

Senior Software Engineer, Observability

MongoDB

Dublin, Ireland 15 days ago
MongoDB Java Go Kafka Flink TypeScript React Node.js PostgreSQL CI/CD Docker Git Linux RESTful APIs JSON GraphQL MVC SaaS Python JavaScript
Hybrid

Senior Software Engineer — Observability

Apple Inc

Cary, NC 41 days ago
OpenTelemetry Grafana Kubernetes Python Java Kotlin Go Prometheus Terraform Docker CI/CD PostgreSQL Redis RabbitMQ Splunk Datadog LLMs AI APIs SRE CI/CD systems

Senior Software Engineer — Observability

Apple Inc

Austin, TX 41 days ago
OpenTelemetry Grafana Datadog Kubernetes Python Java Kotlin Go Prometheus Terraform CI/CD PostgreSQL NoSQL Redis RabbitMQ LLMs AI APIs SRE Docker AWS Azure