Staff Software Engineer, ML Observability

Datadog

Hybrid

Quick summary

Work type
Hybrid
Location
Salary
$234,000–$300,000 / yr
Posted
30 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $160k to $315k, marking similar roles at $216k and this role at $267k, with a shaded band where most similar roles pay]

This role pays more than 88% of similar roles. Most pay $184,613–$246,665 — the shaded band above. At the midpoint, this role pays about $267k versus about $216k for comparable roles.

Based on 240 similar postings.

Employer

About Datadog

Datadog, Inc. is an American company that provides an observability service for cloud-scale applications, monitoring servers, databases, tools, and services through a SaaS-based data analytics platform.

Datadog currently has 131 open roles on FindRole.

Listed pay typically runs $187,000–$240,000 across 66 roles with salary data.

At a glance

TL;DR · Staff Software Engineer, ML Observability

As a Staff Engineer on the ML Observability team at Datadog, you will lead the development of cutting-edge tools to monitor and improve AI systems in production, focusing on Large Language Models (LLMs) and generative AI. Your responsibilities include driving the design and implementation of observability features, prototyping new product capabilities, and working cross-functionally with engineering teams, product managers, UX designers, and applied scientists to rapidly iterate and find market fit. You will develop tools for tracing, evaluating, and debugging LLMs, influence architecture decisions, mentor engineers, and stay attuned to customer needs to guide priorities. The role requires a deep understanding of distributed systems, scalable backend architectures, and hands-on experience with LLM-powered applications. Additionally, you should be well-versed in model internals, inference pipelines, evaluation techniques, and prompt engineering, thrive in ambiguous environments, communicate effectively, and maintain a strong focus on clean code and innovation.

What you'll do

  • Drive the design and implementation of features for LLM observability.
  • Develop tools to trace, evaluate, and debug Large Language Models (LLMs).
  • Influence architecture decisions and mentor engineers in building robust systems.
  • Stay updated with industry trends to drive innovation in AI observability.
  • Ideate, prototype, and scale new product features for generative AI systems.

What we're looking for

  • Deep understanding of distributed systems and scalable backend architectures.
  • Hands-on experience building and shipping LLM-powered or GenAI applications.
  • Expertise in model internals, inference pipelines, evaluation techniques, and prompt engineering.
  • Ability to thrive in ambiguous, fast-changing environments with a product-oriented mindset.
  • Experience with observability tools/platforms and influencing architecture decisions.
  • Strong communication skills and commitment to clean, maintainable code.
  • Commitment to staying current with industry trends in machine learning and observability.

More like this

Similar roles

Senior Software Engineer, AI and Observability

The Walt Disney Company

Remote (New York, NY) +2 30 days ago $148,700–$199,400
AWS Kubernetes Docker Python PyTorch TensorFlow PySpark Pandas LangChain Git CI/CD GPT-4 Claude Opus/Sonnet Datadog Grafana MongoDB
Remote

Senior Software Engineer, Observability

Okta Inc

Bellevue, WA +3 9 days ago $147,000–$202,000
AWS Google Cloud Azure Datadog Metric Logs Traces Error Tracking Terraform Node.js Golang Docker Kubernetes OpenTelemetry Vector Microservice Architecture CI/CD
Hybrid

Senior Software Engineer, Observability

MongoDB

Dublin, Ireland 4 days ago
MongoDB Java Go Kafka Flink TypeScript React Node.js PostgreSQL CI/CD Docker Git Linux RESTful APIs GraphQL MVC Pattern Scalability Performance Tuning Indexing Debugging Monitoring Metrics Logging
Hybrid