AI Research Engineer - Datadog AI Research (DAIR) | Datadog Careers

Datadog

Quick summary

Work type: Hybrid
Location: New York, NY
Posted: 18 days ago

Market check

Salary context

How this pay compares to similar roles

Pay scale for similar roles: $154k to $256k, with $205k marked as "Similar".

This listing doesn't post a salary. Most similar roles pay $163,871–$246,150.

Based on 240 similar postings.

Employer

About Datadog

Datadog, Inc. is an American company that provides an observability service for cloud-scale applications. It monitors servers, databases, tools, and services through a SaaS-based data analytics platform.

Datadog currently has 130 open roles on FindRole.

Listed pay typically runs $187,000–$240,000 across 62 roles with salary data.

At a glance

TL;DR

As a Research Engineer at Datadog AI Research, you will collaborate with scientists to develop data pipelines, training infrastructure, and evaluation benchmarks for advanced AI systems. Your daily tasks include implementing multimodal models, running large-scale experiments, building simulation environments for agent training, and orchestrating distributed training using Ray. You must be proficient in Python and familiar with systems languages like Rust or Go, and have experience with frameworks such as PyTorch, Megatron-LM, and DeepSpeed. Additionally, you will contribute to research publications and work closely with Product and Engineering teams to integrate AI capabilities into Datadog’s products. This role focuses on high-impact areas like world models for observability and trained agents for autonomous incident response in cloud environments.

What you'll do

  • Build and operate multimodal data pipelines for training and evaluation.
  • Implement models, run large-scale experiments, and optimize for performance and cost.
  • Construct simulation environments and replay infrastructure for agent training.
  • Orchestrate distributed training and RL with Ray, managing scaling and recovery.
  • Establish automated benchmarks and regression tests for model predictions and agent performance.
  • Collaborate on integrating research capabilities into Datadog’s products and services.

What we're looking for

  • Extensive experience in distributed computing, reinforcement learning infrastructure, and large-scale ML systems.
  • Proficiency in Python and familiarity with a systems language (e.g., Rust, C++, Go), plus cloud and data infrastructure knowledge.
  • Practical experience implementing and operating ML training and inference systems at scale using frameworks like PyTorch or JAX.
  • Hands-on experience with large-scale model training techniques including SFT, RLVR, RLHF, and efficient inference methods.
  • Strong ability to explain technical design and performance trade-offs to both technical and non-technical audiences.
  • Experience supporting research publications in top-tier conferences (e.g., NeurIPS, ICLR, ICML).
  • Bonus: Software engineering skills in observability, SRE, or security domains.

More like this

Similar roles

Distinguished Applied AI Engineer, AI Transformation

Autodesk

San Francisco, CA · 38 days ago · $212,625–$381,150
AI, ML, Python, SQL, Salesforce, Marketo, Gainsight, Segment, CI/CD, Responsible AI, LLMs, GenAI, Kubernetes, AWS, Azure, Google Cloud, Terraform, GitHub, PostgreSQL, Snowflake, Docker, Prometheus, Grafana

Data/AI Engineer

Cardinal Health

Remote (Us-Nationwide-Field, US) · 5 days ago · $94,900–$135,600
Databricks, Azure, GCP, PySpark, Spark SQL, Delta Lake, T-SQL, Microsoft SQL Server, NLP, NLU, RAG, LLM, HIPAA, SDLC, Git, CI/CD, Claude Code, Codex, Medallion architecture, Incremental loads, Data modeling, Partitioning, Indexing, Query optimization

AI Researcher - Efficiency Engineer, Hybrid

Cisco

Remote (San Jose, CA) · 32 days ago · $212,300–$275,800
Python, LLMs, GitHub Copilot, Cursor, Claude Code, Agentic AI, MLOps, AI Ops, Kubernetes, Docker, CI/CD, Prometheus, Grafana, PostgreSQL, Terraform, AWS

Applied AI Engineer

Ramp

Remote (New York City, New York, US) · 159 days ago · $155,000–$339,500
Python, JavaScript, Node.js, Django, Flask, React, PostgreSQL, MongoDB, AWS, GCP, Kubernetes, Terraform, CI/CD, GitOps

Applied AI Engineer

Booz Allen Hamilton

Fort Belvoir, VA +1 · 36 days ago · $99,000–$225,000
Python, FastAPI, Flask, Streamlit, Gradio, React, TypeScript, Kubernetes, CI/CD, Prometheus, Grafana, MLOps, Docker, PostgreSQL, AWS, Azure, Google Cloud Platform

Applied AI Engineer

Apple Inc

Cupertino, CA · 39 days ago · $181,100–$272,100
Python, FastAPI, LangChain, LLMs, GenAI, RESTful APIs, Vector databases, Async programming, Pipeline orchestration, Prometheus, OpenTelemetry, Redis, RabbitMQ, Kafka, Docker, CI/CD