Senior Scientist, Synthetic Data Generation

Nvidia

Remote

Quick summary

Work type: Remote
Location: Santa Clara, CA; New York, NY
Salary: $168,000–$264,500 / yr
Posted: 5 days ago

Market check

Salary context

Above market

How this pay compares to similar roles

[Chart: pay scale from $99k to $282k, with a shaded band where most similar roles pay; similar roles midpoint $168k, this role midpoint $216k]

This role pays more than 78% of similar roles. Most pay $126,800–$209,000 (the shaded band in the chart). At the midpoint, this role pays about $216k versus about $168k for comparable roles.

Based on 240 similar postings.

Employer

About Nvidia

Nvidia is a leading designer of graphics processing units (GPUs) and system-on-chip units, powering gaming, professional visualization, data centers, and artificial intelligence workloads.

Industry: Semiconductors & AI Computing

Nvidia currently has 994 open roles on FindRole.

Listed pay typically runs $168,000–$270,250 across 977 roles with salary data.

At a glance

TL;DR · Senior Scientist, Synthetic Data Generation

As a Senior Scientist at NVIDIA, you will join a research team focused on advancing large language models through synthetic data generation. Day to day, you will build pipelines that use LLM-based methods to create high-quality datasets for pre- and post-training of models like Nemotron, and you will work with other teams to develop well-documented open-source libraries within the NeMo ecosystem. Key responsibilities include advancing multimodal synthetic data generation techniques, driving software excellence through modern tooling, and publishing research at top AI conferences. Ideal candidates hold a PhD in Computer Science or a related field, have extensive experience in synthetic data generation and generative modeling, and bring a strong publication record and open-source contributions. Experience with multimodal understanding and scalable data pipelines is highly valued.

What you'll do

  • Build synthetic data generation pipelines using LLM-based methods to enhance pre- and post-training of models like Nemotron.
  • Advance multimodal synthetic data generation techniques for images, documents, videos, and audio in collaboration with model teams.
  • Design and maintain open-source libraries with clean APIs and strong documentation within the NVIDIA NeMo ecosystem.
  • Publish original research at top machine learning and AI conferences to uphold technical leadership.
  • Mentor interns and junior researchers to foster technical growth within the team.

What we're looking for

  • PhD in Computer Science, Machine Learning, Statistics, or a related field, with 3+ years of research experience.
  • Expertise in synthetic data generation, generative modeling, and multimodal machine learning.
  • Deep understanding of LLMs, their pre- and post-training processes, and inference frameworks.
  • Proven track record developing software libraries used by broad developer communities.
  • Strong publication history at top AI conferences like NeurIPS, ICML, ICLR, ACL.
  • Experience with multimodal data generation and scalable data pipeline optimization.

More like this

Similar roles

Senior Scientist, Synthetic Data and Privacy

Nvidia

Remote (Santa Clara, CA) · 5 days ago · $168,000–$264,500
Python, LLM, NLP, Synthetic Data Generation, Anonymization, Git, CI/CD, vLLM, TGI, Docker, Kubernetes, PostgreSQL, TensorFlow, PyTorch, GitHub, GDPR, CCPA

Senior, Data Scientist

Walmart

Seattle, WA +1 · 62 days ago · $108,000–$216,000
Python, SQL, Java, machine learning, data visualization, feature selection, model tuning, scalable data storage, data ecosystems, data quality standards, CI/CD

Senior, Data Scientist

Walmart

Seattle, WA · 59 days ago · $108,000–$216,000
Python, SQL, Java, machine learning, data visualization, data ecosystems, data quality standards, scalable data storage solutions, CI/CD, AWS, Kubernetes

Data Scientist, Senior

Qualcomm

San Diego, CA · 116 days ago
Python, AWS, Azure, GCP, SQL, NoSQL, LLMs, LangChain, Keras, PyTorch, TensorFlow, scikit-learn, APIs, CI/CD, Prometheus, Grafana

Data Scientist, Senior

Booz Allen Hamilton

Washington, DC +1 · 67 days ago · $99,000–$225,000
Python, R, Kibana, Tableau, Power BI, PyTorch, TensorFlow, Keras

Data Scientist, Senior

Booz Allen Hamilton

Alexandria, VA · 49 days ago · $99,000–$225,000
SQL, APIs, Python, Flask, JavaScript, Apache Spark, Grafana, Machine Learning, Supervised Learning, Unsupervised Learning, Clustering, Classification, Dimensionality Reduction