Synthetic Data Generation and User Simulation PhD Research Intern — Fall 2026

Nvidia

Remote · Canada · Santa Clara, CA · Posted 5 days ago

At a glance

AI generated

TL;DR

As a PhD research intern, you will develop techniques in generative modeling and synthetic data generation to improve the training of large language models (LLMs). Day to day, you will create high-fidelity synthetic data by calibrating simulated users against real behavioral signatures, procedurally generating probe scenarios, and synthesizing trajectories guided by verification. You will work with other researchers to integrate these methods into production pipelines and validate their impact on downstream model performance. Ideal candidates are pursuing a PhD in Computer Science or a related field, with expertise in deep learning, NLP, and LLM training and hands-on experience with Python, PyTorch, HuggingFace, and vLLM. The role calls for a strong research background, including publications at top-tier AI conferences, and familiarity with multilingual and low-resource contexts.

Skills

Python · PyTorch · HuggingFace · vLLM · Distributed Training · LLM training/serving stack · Generative Modeling · Synthetic Data Generation · Deep Learning Frameworks · NLP · CI/CD

What you'll do

  • Research innovative techniques for high-fidelity synthetic data creation for LLMs.
  • Calibrate simulated users against real behavioral signatures to enhance training signals.
  • Develop and apply methods for trajectory generation guided by verification processes.
  • Conduct experiments to validate that synthetic data improves downstream model performance.
  • Extract process-reward models from multi-step interactions for better training outcomes.
  • Collaborate on integrating novel methods into production pipelines for LLM training.

What we're looking for

  • PhD candidate in Computer Science, Machine Learning, Computational Linguistics, or a related field, with a specialization in deep learning.
  • Research experience in generative modeling, synthetic data generation, LLM post-training, reward modeling, and interactive simulation.
  • Strong Python programming skills and hands-on experience with PyTorch and modern LLM training frameworks like HuggingFace.
  • Experience training and evaluating LLMs on real downstream tasks, including calibration of LLM-as-judge for subjective dimensions.
  • Prior work on user simulation grounded in real population data or cognitive science, focusing on behavioral modeling and agent–user interaction.
  • Contributions to open-source projects related to synthetic data generation (SDG), LLM training, or evaluation.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 825 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.
