Lead Data Scientist, Gen AI for Condition Monitoring Analytics

Caterpillar

Quick summary

Work type: On-site
Location: Chicago, IL; Peoria, IL
Salary: $128,470–$208,770 / yr
Posted: 3 days ago
Closes: Jul 9, 2026

Market check

Salary context

Competitive pay

How this pay compares to similar roles

[Chart: pay scale from $117k to $237k, with a shaded band marking where most similar roles pay. Markers: similar roles $181k, this role $169k.]

This role pays less than 61% of similar roles. Most pay $147,568–$215,350 — the shaded band above. At the midpoint, this role pays about $169k versus about $181k for comparable roles.

Based on 240 similar postings.

Employer

About Caterpillar

Caterpillar Inc. is the world's largest manufacturer of construction and mining equipment, diesel and natural gas engines, industrial gas turbines, and diesel-electric locomotives.

Industry: Heavy Equipment & Manufacturing

Caterpillar currently has 46 open roles on FindRole.

Listed pay typically runs $128,470–$208,770 across 45 roles with salary data.

At a glance

TL;DR · Lead Data Scientist, Gen AI for Condition Monitoring Analytics

The Lead Data Scientist role at Caterpillar Inc.’s Cat Digital group involves leading the development and integration of digital twins for condition monitoring and predictive analytics using advanced AI techniques. This technical expert will design anomaly detection models, optimize algorithms for NVIDIA GPUs, and collaborate with engineering teams on hardware-software co-design for onboard systems. Key responsibilities include developing generative AI agents that synthesize telematics data for prioritized repairs, integrating multi-modal outputs from condition monitoring analytics, and providing monthly updates to stakeholders. The ideal candidate has expertise in Python programming, machine learning techniques such as XGBoost and autoencoders, and experience with high-frequency IoT sensor data and CAN bus protocols. They should also be proficient in cloud technologies, version control systems, and working in an Agile environment. The role uses large telematics datasets to improve product performance and customer outcomes at a global scale.

What you'll do

  • Design and implement GPU-accelerated machine learning models for anomaly detection in sensor data.
  • Develop onboard digital twins using NVIDIA architecture to predict and optimize heavy machinery performance.
  • Profile and tune deep learning algorithms for efficiency on NVIDIA GPUs, ensuring real-time monitoring capabilities.
  • Adapt and test algorithms for onboard edge processing on Caterpillar equipment using tools like NVIDIA Jetson.
  • Use high-fidelity digital twins to simulate rare failure scenarios for accurate troubleshooting with AI assistants.

What we're looking for

  • Extensive experience in machine learning and advanced data analysis using Python and statistical methods.
  • Deep understanding of anomaly detection, time-series analysis, and predictive maintenance models for condition monitoring.
  • Proficiency in fine-tuning Large Language Models (LLMs) and in Retrieval-Augmented Generation (RAG).
  • Experience handling high-frequency IoT sensor data and integrating with unified data platforms.
  • Strong background in high-performance computing and GPU-accelerated algorithms on NVIDIA architecture.
  • Working knowledge of cloud technologies, version control systems, and Agile development methodologies.
  • Bachelor’s, Master’s, or PhD degree in a relevant technical field such as Data Science or Engineering.

More like this

Similar roles

Lead AI Engineer, Data Solutions

Salesforce

Remote (San Francisco, CA) +3 · 30 days ago · $172,500–$260,100
Python, ML models, LLMs, APIs, Spark, Airflow, Dagster, Snowflake, BigQuery, A/B testing, CI/CD, Prometheus, Grafana, Kubernetes, AWS, Terraform

Senior Data Engineer, AI & Analytics Infrastructure

IBM

Chicago, IL · 28 days ago
Azure, AWS, Databricks, Azure Synapse Analytics, Microsoft Fabric, Snowflake, Azure Data Factory, Azure Data Lake, AWS S3, AWS Glue, ETL, ELT, CI/CD, Data Governance, Metadata Management, Event Hubs, MLOps, Feature Engineering, Infrastructure-as-Code

Senior Data Engineer, AI & Analytics Infrastructure

IBM

Dallas, TX · 28 days ago
Azure, AWS, Databricks, Azure Synapse Analytics, Snowflake, Microsoft Fabric, Azure Data Factory, Azure Data Lake, AWS S3, AWS Glue, ETL, ELT, CI/CD, Data Governance, MLOps, Event Hubs, Metadata Management, Observability, Monitoring, Python, SQL

Lead Data Scientist, Document AI

CVS Health

Remote (New York, NY) · 6 days ago · $142,140–$284,280
Python, SQL, Machine Learning, Statistical Analysis, Predictive Modeling, Data Lineage, Traceability, Explainability, CI/CD, Healthcare Industry Knowledge, Advanced Analytics Tools, Large Data Set Analysis, Multiple Data Sources Integration

Applied AI Data Scientist

The Hartford

Hartford, CT +3 · 19 days ago · $90,160–$135,240
Python, SQL, PyTorch, TensorFlow, scikit-learn, RAG pipelines, Vector search, Agentic AI, Vertex AI, SageMaker, Bedrock, OpenAI, Snowflake, BERTScore, BLEURT, GCP, AWS, Semantic similarity, Retrieval precision/recall