Synthetic Data Generation and User Simulation PhD Research Intern — Fall 2026
At a glance
TL;DR (AI generated)
As a PhD research intern on our team, you will research techniques for generative modeling and synthetic data creation to improve the training of large language models (LLMs). Your work includes generating high-fidelity synthetic data by calibrating simulated users against real behavioral signatures, procedurally generating probe scenarios, and synthesizing trajectories guided by verification. You will collaborate with other researchers to integrate these methods into production pipelines and validate their impact on downstream model performance. Ideal candidates are PhD students in Computer Science or a related field with expertise in deep learning, NLP, and LLM training, along with hands-on experience using Python, PyTorch, HuggingFace, and vLLM. The role calls for a strong research background, including publications at top-tier AI conferences, and familiarity with multilingual and low-resource contexts.
What you'll do
- Research innovative techniques for high-fidelity synthetic data creation for LLMs.
- Calibrate simulated users against real behavioral signatures to enhance training signals.
- Develop and apply methods for trajectory generation guided by verification processes.
- Conduct experiments to validate that synthetic data improves downstream model performance.
- Extract process-level reward signals from multi-step interactions to train process-reward models.
- Collaborate on integrating novel methods into production pipelines for LLM training.
What we're looking for
- PhD candidate in Computer Science, Machine Learning, Computational Linguistics, or a related field, with a specialization in deep learning.
- Research experience in generative modeling, synthetic data generation, LLM post-training, reward modeling, and interactive simulation.
- Strong Python programming skills and hands-on experience with PyTorch and modern LLM training frameworks such as HuggingFace.
- Experience training and evaluating LLMs on real downstream tasks, including calibrating LLM-as-a-judge evaluators for subjective dimensions.
- Prior work on user simulation grounded in real population data or cognitive science, focusing on behavioral modeling and agent–user interaction.
- Contributions to open-source projects related to synthetic data generation (SDG), LLM training, or evaluation.
Employer
About Nvidia
Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.
Industry: Semiconductors & AI Computing
Nvidia currently has 825 open roles on FindRole.
Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.
Most-posted roles
- Senior Solutions Architect, AI Infrastructure (4 postings)
- Senior System Software Engineer - AV Platform (4 postings)
- Senior Circuit Design Engineer (3 postings)
- Senior Circuit Methodology Engineer (3 postings)
- Senior Deep Learning Performance Architect (3 postings)