AI Research Engineer - Datadog AI Research (DAIR) | Datadog Careers

Datadog

Hybrid

Quick summary

Work type
Hybrid
Location
New York, NY
Salary
$140,000–$400,000 / yr
Posted
14 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

Chart: similar roles about $207k; this role about $270k; axis from $109k to $431k, with a band marking where most similar roles pay.

This role pays more than 91% of similar roles. Most similar roles pay $167,550–$246,150. At the midpoint, this role pays about $270k versus about $207k for comparable roles.

Based on 240 similar postings.

Employer

About Datadog

Datadog, Inc. is an American company that provides an observability service for cloud-scale applications, monitoring servers, databases, tools, and services through a SaaS-based data analytics platform.

Datadog currently has 114 open roles on FindRole.

Listed pay typically runs $187,000–$240,000 across 56 roles with salary data.

At a glance

TL;DR

As a Research Engineer at Datadog AI Research, you will join a team of scientists and engineers focused on developing advanced AI solutions for cloud observability and security. Your role involves building multimodal data pipelines, training infrastructure, and simulation environments to support the development of autonomous agents and world models. You will work on large-scale model training using frameworks like Ray and PyTorch, ensuring reliability and performance through rigorous benchmarking and testing. Additionally, you will collaborate with cross-functional teams to integrate research prototypes into Datadog’s products and contribute to top-tier academic publications. Proficiency in Python, experience with distributed computing and reinforcement learning infrastructure, and a background in ML systems are essential. Bonus points for hands-on experience with GPU programming, production data pipelines, and bridging the gap between research and real-world applications.

What you'll do

  • Build and operate multimodal data pipelines for training and evaluation.
  • Implement models, run large-scale experiments, and optimize for performance and cost.
  • Construct simulation environments and replay infrastructure for agent training.
  • Orchestrate distributed training and RL with Ray, managing scaling and recovery.
  • Establish automated benchmarks and regression tests for model predictions and agent performance.
  • Collaborate on integrating research capabilities into Datadog’s products and services.

What we're looking for

  • Extensive experience in distributed computing and reinforcement learning infrastructure.
  • Proficiency in Python and familiarity with systems languages like Rust, C++, or Go.
  • Practical experience implementing ML training and inference systems at scale using frameworks such as PyTorch or JAX.
  • Hands-on experience with large-scale model training and fine-tuning techniques including SFT, RLVR, RLHF, and efficient inference methods.
  • Strong ability to explain technical design and performance trade-offs to both technical and non-technical audiences.
  • Experience supporting or contributing to research publications at top-tier conferences such as NeurIPS, ICLR, and ICML.

More like this

Similar roles

AI Research Scientist - Datadog AI Research (DAIR) | Datadog Careers

Datadog

New York, NY 14 days ago $140,000–$400,000
Python PyTorch DeepSpeed Megatron-LM CUDA NeurIPS ICLR ICML Distributed Training Reinforcement Learning Multimodal Learning GPU Programming Generative AI Large Foundation Models Rapid Prototyping Efficient Inference Data Pipelines

Senior AI Engineer | Datadog Careers

Datadog

New York, NY 14 days ago
Python Go Java LLMs Large Language Models Generative AI CI/CD Docker Kubernetes AWS PostgreSQL Git GitHub IDEs Static Code Analysis Compilers Dynamic Instrumentation
Hybrid

Distinguished Applied AI Engineer, AI Transformation

Autodesk

San Francisco, CA 35 days ago $212,625–$381,150
AI ML Python SQL Salesforce Marketo Gainsight Segment CI/CD Responsible AI LLMs GenAI Kubernetes AWS Azure Google Cloud Terraform GitHub PostgreSQL Snowflake Docker Prometheus Grafana

Data/AI Engineer

Cardinal Health

Remote (US-Nationwide-Field, US) 2 days ago $94,900–$135,600
Databricks Azure GCP PySpark Spark SQL Delta Lake T-SQL Microsoft SQL Server NLP NLU RAG LLM HIPAA SDLC Git CI/CD Claude Code Codex Medallion architecture Incremental loads Data modeling Partitioning Indexing Query optimization
Remote

AI Researcher - Efficiency Engineer, Hybrid

Cisco

Remote (San Jose, CA) 29 days ago $212,300–$275,800
Python LLMs GitHub Copilot Cursor Claude Code Agentic AI MLOps AI Ops Kubernetes Docker CI/CD Prometheus Grafana PostgreSQL Terraform AWS
Remote Hybrid

Applied AI Engineer

Ramp

Remote (New York City, New York, US) 156 days ago $155,000–$339,500
Python JavaScript Node.js Django Flask React PostgreSQL MongoDB AWS GCP Kubernetes Terraform CI/CD GitOps
Remote