Lead Data Engineer - Assistant Vice President

Citi

Remote

Quick summary

Work type: Remote
Location: Irving, TX
Salary: $107,120–$160,680 / yr
Posted: 74 days ago

Market check

Salary context

Below market

How this pay compares to similar roles

[Chart: pay scale from $96k to $214k, marking this role at $134k and similar roles at $170k, with a shaded band showing where most similar roles pay.]

This role pays less than 77% of similar roles. Most pay $142,300–$198,000 — the shaded band above. At the midpoint, this role pays about $134k versus about $170k for comparable roles.

Based on 240 similar postings.

Employer

About Citi

Citi is one of the world’s most trusted financial institutions, proudly serving millions of customers across the United States.

Citi currently has 391 open roles on FindRole.

Listed pay typically runs $125,760–$188,640 across 361 roles with salary data.

At a glance

TL;DR · Lead Data Engineer - Assistant Vice President

The Lead Data Engineer on the Technology team is an intermediate position responsible for establishing and implementing new or revised application systems. The role involves coaching junior analysts and building robust data pipelines with Apache Beam or Spark within the Hadoop ecosystem. The ideal candidate has hands-on experience with AWS, Azure, or GCP, proficiency in Python, and expertise in DevOps practices, including continuous integration and containerization technologies such as Docker and Kubernetes. Familiarity with CI/CD tools such as Jenkins is beneficial. The role also requires a strong understanding of big data technologies and agile methodologies to solve complex business problems efficiently.

What you'll do

  • Design and implement data pipelines using Apache Beam or Spark.
  • Provide technical guidance to junior analysts on Big Data projects.
  • Manage Hadoop ecosystem tools including Hive, Pig, Impala, and Kafka.
  • Develop automated deployment processes for cloud technologies like AWS.
  • Optimize data structures and algorithms for efficient distributed computing.

What we're looking for

  • 5+ years of experience with Hadoop and cloud technologies like AWS, Azure, or GCP.
  • Expertise in building data pipelines using Apache Beam or Spark.
  • Proficiency in Python programming and familiarity with DevOps practices.
  • Hands-on experience with containerization tools such as Docker and Kubernetes.
  • Strong understanding of the Hadoop ecosystem including HDFS, Hive, Pig, Impala, etc.
  • Knowledge of agile development methodologies (Scrum) and system-level concepts.

More like this

Similar roles

Data Engineer - Assistant Vice President

Citi

Remote (3800 Citigroup Center Drive Building C Tampa, US) 13 days ago $96,960–$145,440
Python, Java, Scala, Hadoop, Snowflake, Databricks, SQL, Kubernetes, AWS, Google Cloud, Terraform, Spark, Kafka, Airflow, DBT, CloudFormation, CI/CD
Remote

Sr. Data Engineer - Assistant Vice President

Citi

Remote (Irving, TX) 20 days ago
Hadoop, Spark, Kafka, Hive, Parquet, Avro, Python, Scala, Java, Databricks, Microservices, AI, ML, Deep Learning, NLP, SQL, Docker, Kubernetes, Data Mesh, Starburst
Remote

SR. Data Engineer - Assistant Vice President

Citi

Remote (Irving, TX) 20 days ago
Hadoop, Spark, Kafka, Hive, Python, Scala, Java, Databricks, ETL, ELT, Microservices, AI, ML, Deep Learning, NLP, SQL, Docker, Kubernetes, AWS, Azure, GCP, Data Mesh, Starburst
Remote

Big Data Tech Delivery Lead - Senior Vice President

Citi

Remote (3800 Citigroup Center Drive Building G Tampa, US) 147 days ago $141,440–$212,160
Java, Hadoop, Spark, Kafka, Elasticsearch, Terraform, CI/CD, Python, PostgreSQL, Docker, AWS, Azure, Google Cloud Platform, Git, Jenkins, Prometheus, Grafana, Scrum, Agile
Remote

Senior Data Developer Tech Lead - Vice President

Citi

Remote (Irving, TX) 32 days ago $125,760–$188,640
Hadoop, Apache Kafka, Python, PySpark, SQL, AWS, Azure, Google Cloud, AI/ML, MLOps, CI/CD, Generative AI, Spark, Unix, Data Vault, ETL, ELT, Dimensional Modeling, Big Data, Real-time Analytics
Remote
