Lead Data Scientist - Gen AI & Digital Twin

Caterpillar

Closes in 5 days

Quick summary

Work type: On-site
Location: Chicago, Illinois
Salary: $128,470–$208,770 / yr
Posted: 2 days ago
Closes: Jun 11, 2026 (soon)

Market check

Salary context

Competitive pay

How this pay compares to similar roles

Pay comparison chart (scale from $99k to $238k): similar roles $169k, this role $169k.

This role pays more than 56% of similar roles. Most pay $126,800–$211,200 — the shaded band above. At the midpoint, this role pays about $169k versus about $169k for comparable roles.

Based on 240 similar postings.

Employer

About Caterpillar

Caterpillar Inc. is the world's largest manufacturer of construction and mining equipment, diesel and natural gas engines, industrial gas turbines, and diesel-electric locomotives.

Industry: Heavy Equipment & Manufacturing

Caterpillar currently has 36 open roles on FindRole.

Listed pay typically runs $128,470–$208,770 across 36 roles with salary data.

At a glance

TL;DR · Lead Data Scientist - Gen AI & Digital Twin

The Lead Data Scientist role at Caterpillar Inc.’s Cat Digital group involves driving the development and integration of digital twins and GenAI-assisted predictive analytics for condition monitoring of heavy machinery. This senior position requires expertise in anomaly detection, digital twin engineering, and optimization using advanced machine learning models like XGBoost and autoencoders on NVIDIA GPUs. The candidate will work closely with hardware engineers to ensure algorithm compatibility with next-generation processors and develop simulation-based training for rare failure scenarios. Key skills include proficiency in Python, experience with high-frequency IoT sensor data, and knowledge of cloud technologies. This role focuses on leveraging massive telematics datasets to enhance product performance and customer outcomes at a global scale.

What you'll do

  • Design and implement GPU-accelerated machine learning models for anomaly detection in time-series sensor data.
  • Develop onboard digital twins using NVIDIA architecture to simulate and optimize heavy machinery performance.
  • Profile and tune deep learning algorithms for efficiency on NVIDIA GPUs, ensuring real-time monitoring capabilities.
  • Adapt and test algorithms for onboard edge processing architectures like NVIDIA Jetson for Cat equipment.
  • Use high-fidelity digital twins to simulate rare failure scenarios for accurate troubleshooting with GenAI assistants.
  • Develop Generative AI agents that synthesize telematics data to generate prioritized repair recommendations for machine faults.

What we're looking for

  • Extensive experience in Python programming for data analysis and machine learning.
  • Deep understanding of anomaly detection, time-series analysis, and predictive maintenance models.
  • Proficiency in fine-tuning and prompt engineering for large language models (LLMs).
  • Experience with high-performance computing on NVIDIA GPU architectures.
  • Strong background in advanced data analysis techniques and statistical methods.
  • Working knowledge of cloud technologies and version control systems like GitHub.
  • Bachelor’s or higher degree in a relevant technical field such as Data Science or Engineering.

More like this

Similar roles

Senior Data Scientist, Gen AI Application

Genentech

South San Francisco, CA · 16 days ago · $177,310–$329,290
LangChain LangGraph AWS Agentcore MCP A2A React Angular Figma SQL Prometheus Grafana PostgreSQL Vector/Graph Databases RAG Architecture CI/CD Python Go Docker
Hybrid

Senior Director, Data Science & Gen AI

Blue Cross Blue Shield Association (BCBSA)

Chicago, IL · 58 days ago · $215,000–$295,000
Python R SQL AWS MLOps CI/CD Scrum Kubeflow MLFlow Snowflake Databricks Postgres NLP LLM Agile Bayesian inference Supervised learning Unsupervised learning Deep learning FHIR HL7
Hybrid

Senior Data Scientist, Gen AI Foundation

Genentech

South San Francisco, CA · 16 days ago · $177,310–$329,290
RAG LangChain LangGraph AWS Agentcore MCP A2A Vector databases Graph databases SQL AI-generated SQL Observability tools Multimodal architectures Latency optimization Cost optimization Reliability engineering CI/CD Python PostgreSQL
Hybrid

Lead Data Scientist - Document AI

CVS Health

Remote (New York-161 Ave Of The Americas, US) · 10 days ago · $142,140–$284,280
Python SQL Machine Learning Statistical Analysis Predictive Modeling Data Lineage Traceability Explainability CI/CD Healthcare Industry Knowledge Large Data Set Analysis Multiple Data Sources MLOps
Remote

Data Scientist Lead

PNC

Pittsburgh, PA · 2 days ago · $80,000–$209,300
Python SQL R Apache Spark Hadoop TensorFlow Scikit-learn Keras Pandas Numpy Machine Learning Data Mining Data Science CI/CD Git Jupyter Notebook AWS Google Cloud Platform Azure