Most data-engineering roadmaps you’ll find online were written for the 2019 stack: a video on Spark, a video on Airflow, a video on Snowflake, done. The 2026 stack is materially different, and the curriculum below is built around what working production teams actually deploy this year, not what was fashionable in 2019.

The 7 tracks, and why the order matters

The 32 curricula are organized into 7 tracks: Foundations (SQL + Python + Data Modeling + Cloud), Data Systems (dbt + Spark + Airflow + Iceberg + Kafka + Flink), Quality + Governance (Observability + Contracts + Cost Optimization), Platform Engineering (CI/CD + IaC), Analytics (Metrics layer + Product Thinking), AI Systems (RAG + Vector DBs + LLM Pipeline + Agentic + Inference Serving + Feature Stores + MLOps + Dataset Engineering + Enterprise AI), and Leadership (Staff DE + System Design).

The recommended order is Foundations → Data Systems → either Analytics or AI Systems (depending on your target role) → Platform + Leadership. The reason: production AI systems sit on top of production data systems, and production data systems sit on top of SQL, Python, and dimensional modeling. Skipping foundations is the single most common reason engineers fail the senior-level system design round even when they can write Flink code.

Which curriculum should you start with?

If you can’t write a window function from memory, start with SQL Mastery (free). If you can write SQL but the words “Polars,” “Pydantic,” or “async I/O” are unfamiliar, start with Python for Data Engineers (free). If you have both but have never built a dbt project, start with dbt & Analytics Engineering. If you’re a working DE moving into AI, start with RAG → LLM Evaluation → Agentic Systems.
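As a quick self-check for the first question above, here is a minimal sketch of a window function: a per-customer running total. The `orders` table, its columns, and the sample rows are hypothetical, invented only for illustration. It runs through Python's built-in `sqlite3` module and assumes the bundled SQLite is version 3.25 or newer, which is where window-function support was added.

```python
import sqlite3

# In-memory database with a small, made-up orders table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("a", 1, 10.0), ("a", 2, 5.0), ("b", 1, 7.0), ("a", 3, 2.5), ("b", 2, 3.0)],
)

# Running total per customer: the window restarts for each customer
# (PARTITION BY) and accumulates in day order (ORDER BY plus a ROWS frame).
rows = conn.execute(
    """
    SELECT customer, order_day, amount,
           SUM(amount) OVER (
               PARTITION BY customer
               ORDER BY order_day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM orders
    ORDER BY customer, order_day
    """
).fetchall()

for row in rows:
    print(row)
```

If you could have written the `OVER (...)` clause without looking it up, SQL is probably not where you need to start.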

What “production-grade” means here

Every curriculum ends with a deployable artifact, not a quiz. The Spark curriculum ends with a Kubernetes-deployed PySpark job running with the Spark Operator + Prometheus metrics. The Airflow curriculum ends with a multi-DAG repo on the KubernetesExecutor with CI/CD and DAG-library versioning. The RAG curriculum ends with a hybrid retrieval system (BM25 + dense + RRF + cross-encoder reranker) benchmarked on recall@10. The shape is the same across all 32: read the ADR, deploy the system, break it, fix it, understand why the architectural decision went the way it did.
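To make the RAG artifact more concrete, the fusion and benchmarking steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the curriculum's own implementation: the function names, the document IDs, and the two input rankings are hypothetical, and `k=60` is a commonly used smoothing constant for reciprocal rank fusion (RRF) rather than a value taken from the course. The BM25 retriever, dense retriever, and cross-encoder reranker are out of scope here; the sketch starts from their ranked outputs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant documents found in the top-k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical outputs of a BM25 retriever and a dense retriever.
bm25_ranking = ["d3", "d1", "d7", "d2"]
dense_ranking = ["d1", "d4", "d3", "d9"]
relevant = {"d1", "d3", "d4"}  # made-up ground-truth labels

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
print(recall_at_k(bm25_ranking, relevant, k=3))
print(recall_at_k(fused, relevant, k=3))
```

RRF uses only rank positions, so the BM25 and dense scores never need to be put on a common scale. A cross-encoder reranker would then reorder the top of the fused list before recall@10 is measured.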

How the curriculum maps to interviews

Senior+ data-engineering interviews test three things: can you scope a problem before you draw boxes, can you defend a tradeoff using the data you have, and can you talk about failure modes as well as the happy path. Every curriculum here is built around those behaviors. The system-design modules name the clarifying questions seniors ask. The cost-optimization modules teach the “reduce scan first, then add compute” ordering. The observability + governance modules teach what happens when a pipeline silently stops at 3am. That’s the gap between “I’ve done a Spark tutorial” and “I can defend a $300K Snowflake bill in front of finance.”

Below is the full catalog. Filter by track, tier, or hours to find your starting point.

TRACK 01

Foundations

5 skills · 67.5h total · 4 free

SQL, Python, data modeling, dbt, cloud basics. The ground every other track stands on.

C-FND-1 · Foundations

free

SQL Mastery for Data Engineers

Window functions, CTEs, execution plans: the SQL you need to pass a FAANG screen.

8 lessons · 8.75h

C-FND-2 · Foundations

free

Python for Data Engineers

Production Python: typing, async, Pydantic, packaging — not a notebook tour.

Learn what production data teams actually do.

How tracks build on each other.

Foundations

Batch pipelines

Streaming

Data quality

AI & vectors

Platform

Staff Engineering

Your first three skills.

SQL Mastery for Data Engineers

Python for Data Engineers

dbt & Analytics Engineering

The 32 skills production data teams actually use in 2026.

Foundations

SQL Mastery for Data Engineers

Python for Data Engineers

Advanced Data Modeling

Cloud Data Infra & FinOps

dbt & Analytics Engineering

Batch pipelines

Apache Spark Deep Dive

Apache Airflow

Apache Iceberg & Lakehouse

Warehouse Internals

Streaming

Real-Time Streaming Architecture

Kafka Streams

Flink & Stream Processing

Event Design & Contracts

Data quality

Data Observability & Quality

Governance & Data Contracts

AI & vectors

RAG Learning Path

Vector Databases

LLM Data Pipelines

LLM Evaluation

Feature Stores for ML

Agentic Workflows

AI Inference & Serving

Dataset Engineering

Enterprise GenAI & Security

MLOps for Data Engineers

Platform

DataOps: CI/CD & IaC

Cost Optimization for DEs

API & External Integration

Staff Engineering

Product Thinking for DEs

System Design for DEs

Staff Engineer Playbook
