Synthetic Data Generation and User Simulation PhD Research Intern — Fall 2026

Nvidia

Remote · Actively hiring · Posted this week

Canada · Santa Clara, CA · Posted 5 days ago

At a glance

AI generated

TL;DR

As a PhD research intern, you will research generative modeling and synthetic data generation techniques that improve the training of large language models (LLMs). Day to day, you will build high-fidelity synthetic data by calibrating simulated users against real behavioral signatures, procedurally generating probe scenarios, and synthesizing trajectories guided by verification. You will also work with engineering teams to integrate these methods into production training pipelines and validate their effect on downstream model performance. Essential skills include Python, deep learning frameworks such as PyTorch, and experience with HuggingFace and vLLM. Ideal candidates have a background in generative modeling, synthetic data generation, or LLM post-training, along with research published at top-tier AI conferences.

Skills

Python, PyTorch, HuggingFace, vLLM, Distributed Training, Generative Modeling, Synthetic Data Generation, LLM Post-Training, Reward Modeling, Multi-Agent Simulation, Behavioral Modeling, Deep Learning, NLP, Large-Scale Data Curation, CI/CD

What you'll do

Research innovative techniques in generative modeling and synthetic data generation for LLM training.
Craft high-fidelity synthetic user simulations calibrated against real behavioral signatures.
Develop methods for procedural generation of probe scenarios and for trajectory synthesis guided by verification.
Conduct experiments to validate that synthetic data improves downstream model performance metrics.
Integrate novel methods into production training pipelines in collaboration with engineering teams.

What we're looking for

PhD candidate in Computer Science, Machine Learning, Computational Linguistics, or related field with deep learning specialization.
Research experience in generative modeling, synthetic data generation, LLM post-training, reward modeling, and interactive simulation.
Proficient in Python programming and deep learning frameworks like PyTorch and HuggingFace.
Published research at top-tier AI, ML, or NLP conferences.
Experience training and evaluating large language models on real-world tasks.
Background in user simulation, behavioral modeling grounded in real population data, and multilingual/low-resource evaluation.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 825 open roles on FindRole.

Listed pay typically runs $184,000–$287,500 across 813 roles with salary data.
