5 career paths · 30+ production projects

Data engineering, rewritten for the AI era.

30+ production projects across 5 career paths. Free first module on every career path. One platform — built the way the job actually looks in 2026.

Compare 5 career paths

5 career paths · pick yours

30+ production projects

Free first module on every career path

2026 AI-era curriculum

Definition

What is AI-DE?

AI-DE is a hands-on learning platform for data engineers and AI engineers. It teaches how to build production-grade data pipelines, streaming systems, and AI platforms using tools like Spark, dbt, Kafka, and LLM frameworks. Unlike traditional courses, AI-DE focuses on real-world system design, production workflows, and end-to-end project experience.

A data pipeline is a system that ingests, transforms, and delivers data for analytics or applications. In AI-DE, you build pipelines using tools like Spark, dbt, and Airflow — then progress to streaming systems with Kafka and AI platforms with RAG and LLM frameworks.
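To make the ingest, transform, deliver definition concrete, here is a minimal sketch in plain Python. It is not taken from any AI-DE course; the sample data, the function names (`ingest`, `transform`, `deliver`, `run_pipeline`), and the use of an in-memory list in place of a warehouse table are all assumptions for illustration. In a real project, tools like Spark, dbt, and Airflow would take over these stages.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical raw input; a real pipeline would read from files, APIs, or a queue.
RAW_ORDERS = """order_id,amount,currency
1,19.99,usd
2,,usd
3,5.00,eur
"""


def ingest(raw: str) -> Iterator[dict]:
    """Ingest: parse raw CSV text into row dictionaries."""
    yield from csv.DictReader(io.StringIO(raw))


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    """Transform: drop rows with no amount, cast types, normalize currency."""
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records instead of guessing a value
        yield {
            "order_id": int(row["order_id"]),
            "amount_cents": round(float(row["amount"]) * 100),
            "currency": row["currency"].upper(),
        }


def deliver(rows: Iterable[dict]) -> list[dict]:
    """Deliver: materialize rows for a consumer (a list stands in for a table)."""
    return list(rows)


def run_pipeline(raw: str) -> list[dict]:
    """Chain the three stages in order."""
    return deliver(transform(ingest(raw)))


if __name__ == "__main__":
    for record in run_pipeline(RAW_ORDERS):
        print(record)
```

Each stage is a generator, so rows stream through one at a time; the same three-stage shape carries over when the stages become Spark jobs or dbt models scheduled by an orchestrator.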

Learn SQL for Data Engineering · Apache Spark Tutorial · dbt Data Transformation · Kafka Streaming Guide · RAG Systems Guide · Vector Databases

Built by engineers shipping production data systems

Curriculum aligned with 2026 hiring loops

30+ capstone projects · resume- and interview-ready

Updated quarterly · current stack

The gap

You don't need more tutorials. You need something to ship.

Most courses teach isolated tools and hand you a completion certificate. ai-de.net is built the opposite way — around the systems you'll be asked to ship on day one, and the story you'll tell in the interview.

Tutorials don't survive the interview

Reading about Spark ≠ operating Spark. Hiring managers want to hear about a pipeline you owned when it broke.

Isolated skills don't ship pipelines

Knowing Airflow without dbt, Kafka without schema registries, or vectors without retrieval evaluation is how a career stalls at junior.

Generic courses ignore the AI era

2026 DE roles expect feature stores, LLM enrichment, and vector retrieval — not just a batch job from 2019.

Find yourself

Where you are now → where you want to be.

Pick the line that sounds most like you. We'll point you at the path that fits — and the first project to ship.

Junior / aspiring DE · Most popular path · broadest market

“I'm new to data”

You know SQL and a bit of Python. You want a real first DE job, with a portfolio that survives an interview loop.

Recommended path: Core Data Engineer

Analyst / AE · Closest to the business · fastest hired

“I'm an analyst → engineer”

You live in the warehouse. You're ready to own dbt models, contracts, and the metric layer the business actually trusts.

Recommended path: Analytics / Product Data Engineer

Software → AI / data · Fastest-growing AI-era specialty

“I'm a SWE pivoting in”

You write production code. You want the AI-era data role hiring managers can't fill fast enough — feature stores, RAG, vectors.

Recommended path: AI Data Engineer

Mid-level DE · Senior track · platform ownership

“I want to level up to senior”

You ship pipelines today. You're aiming at senior / staff roles owning platform decisions, on-call, and architecture.

Recommended path: Data Platform Engineer

Compare all 5 paths side-by-side

Differentiator

One skill. One production project. Every time.

Every course is paired with a real, deployable project — something you'd actually build at a mid-sized company. Ship it, put it on your resume, talk through it in the interview.

P01 · Full

Flink fraud detection

Stateful streaming pipeline on Flink + Kafka — 5 keyed-state detectors, exactly-once via 2PC, Flink K8s Operator with ZK HA.

Flink · Kafka · RocksDB · Kubernetes

P02 · Full

Uber Event Platform: Staff Design Portfolio

Staff-level system-design portfolio: redesign Uber's event platform, 10K → 1B events/day. 69 artifacts, no code.

System Design · Kafka · Iceberg · Flink · SLA Engineering

P03 · Free phase 1

Commerce data warehouse

Kimball star schema with 22 dbt models — atomic + event facts, SCD2 via snapshots, incremental processing, and a GitHub Actions Slim CI gate.

dbt · Postgres · GitHub Actions

P04 · Full

Iceberg Lakehouse Foundations

Local ACID lakehouse on Iceberg + Nessie + MinIO. Bronze → Silver → Gold + maintenance.

Iceberg · Spark · Nessie

P05 · Full

ShopStream Spark Batch Pipeline

Spark + Delta Lake batch ETL with 4 documented optimization patterns (9x progression), ACID lakehouse, and a Kafka/K8s streaming overlay.

Spark · Delta Lake · Kafka · Kubernetes

P06 · Full

Enterprise RAG — retrieval-quality build

EXPERT-tier retrieval-quality RAG: 4-strategy chunking A/B (62/78/85%), hybrid BM25 + dense + RRF, cross-encoder reranker, RAGAS 4-metric canary, LLM gateway with fallback. 5 ADRs + cost-model CSV bundled.

FastAPI · OpenAI · Pinecone · Qdrant · Redis · Prometheus

P07 · Full

PredictFlow — production MLOps platform with Feast + BentoML

EXPERT-tier MLOps build: MLflow + DVC + Feast (offline/online), BentoML on K8s with HPA + canary, Evidently drift + cron-gated retrain, Prometheus + Grafana. Modules 01-02 with PRO; full platform with EXPERT.

Feast · MLflow · BentoML · Redis · Kubernetes · Evidently

P08 · Full

LLM evaluation framework — multi-judge cascade + recall@k gate

EXPERT-tier eval build: 3-judge cascade (Haiku → Sonnet → GPT-4o), variance-based agreement, recall@k regression gate in GitHub Actions, RAGAS scaffolding, online drift detection, 5 committed ADRs (one Deprecated), runnable cost-model CSV. 7 modules · 17-19h. Module 01 with PRO.

Pydantic · FastAPI · Anthropic · OpenAI · GitHub Actions

P09 · Full

AI cost optimization (CostGuard)

Cost-aware LLM platform: token tracking, dual-tier cache, 4-strategy router, three-tier budget governance. 5 ADRs + cost-model CSV bundled.

FastAPI · OpenAI · Redis · Postgres · asyncpg · Prometheus

P10 · Full

Data observability stack

Detect, trace, prevent: dbt + OpenLineage + Grafana on a pre-broken warehouse.

dbt · OpenLineage · Grafana

P11 · Full

Data governance & contracts

ODCS contracts, GE + Soda validation, Avro + Schema Registry PR gate, 4-tier PII + RBAC + hashed audit, SOC2 + GDPR engines.

ODCS · Avro · Schema Registry · GE + Soda · GitHub Actions

P12 · Full

CI/CD data platform

Terraform + GitHub Actions + dbt CI — the platform under the platform.

Terraform · GitHub Actions · dbt
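As a taste of the techniques these projects name, the "hybrid BM25 + dense + RRF" step in P06 can be sketched in a few lines. This is a generic illustration of reciprocal rank fusion, not the project's actual code: the function name, the document IDs, and the two input rankings are made up, and `k=60` is the constant commonly used for RRF rather than a value confirmed by the project.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused score ranks first; ties are
    broken by document ID so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical results from a keyword (BM25) retriever and a dense retriever.
    bm25_hits = ["d1", "d2", "d3"]
    dense_hits = ["d3", "d1", "d4"]
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

RRF works on ranks only, so BM25 scores and embedding similarities never need to be put on a common scale; documents that both retrievers rank well rise to the top.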

Curriculum

7 tracks. One coherent system.

Foundations on the left. Specialty on the right. Pick a path or move freely between tracks. 31 skills · 765h of video + labs · capstone project at the end of every track.

TRACK 01

Foundations

SQL, Python, data modeling, dbt, cloud basics. The ground every other track stands on.

5 skills · 68h · 4 free

TRACK 02

Batch pipelines

Airflow, Spark, Iceberg — the production workhorse of every modern data team.

4 skills · 334h · 0 free

TRACK 03

Streaming

Kafka, Flink, CDC, exactly-once. When minutes need to become milliseconds.

4 skills · 47h · 1 free

TRACK 04

Data quality

Contracts, observability, schema evolution, on-call response for data systems.

2 skills · 22h · 1 free

TRACK 05

AI & vectors

Embeddings, retrieval, feature stores, LLM batch enrichment, RAG infra.

10 skills · 212h · 1 free

TRACK 06

Platform

K8s, Terraform, multi-tenant Airflow, cost attribution, observability stack.

3 skills · 40h · 0 free

TRACK 07

Staff Engineering

System design, RFCs, ADRs, product thinking — the staff+ skills promotions actually reward.

3 skills · 43h · 0 free

Open the full curriculum

Why ai-de vs alternatives

A training platform shaped like the job.

The difference isn't volume of videos. It's whether you end up with a thing to ship.

	ai-de.net	Coursera	DataExpert	Udemy	DataCamp	YouTube
Course structure	1 course = 1 production project	Video lectures + auto-graded labs	Bootcamp cohorts + projects	Video lectures	Short exercises	Raw theory
AI-era coverage	Vectors, LLM pipelines, feature stores, RAG	GenAI specialisations (varies)	AI-adjacent topics	Rarely updated	Beginner ML only	Scattered
Portfolio output	Resume-ready repo + system write-up	Certificate + capstone repo	Bootcamp capstone	Certificate	Completion badge	Notes
Free first module	Every path · no card required	7-day audit access	Limited free lessons	Free preview clip	First lesson	Free
Price for full access	$29/mo or $299/yr	$49–79 / mo · per spec	$2,000+ · bootcamp tier	$200–400 / yr · per course	$29–49 / mo	Free
Built by working DEs	Active practitioners shipping prod	Andrew Ng et al · academic	Ex-FAANG instructors	Mixed	Mixed	Varies

Free resources

A free resource to grab before you sign up.

Drop an email and we'll send it. No drip-campaign spam — just the file.

The DE interview cheatsheet

47 system-design questions with senior+ framings

Markdown · 47 questions

Community

Code review and unstuck threads — coming soon.

We're standing up a Discord with project review, interview prep, and weekly office hours run by working data engineers. Drop your email above and we'll let you know when the doors open.

Q3 2026 · Launch

Free · For all members

Weekly · Office hours

#project-review · Illustrative · Q3 2026 launch

maya.k · 2:14 PM

Finished the CDC pipeline project — hitting ~4k events/sec on a single node. Is this reasonable?

alex (mentor) · 2:17 PM

Looks healthy. For interviews, focus on how you'd tune num.partitions and the trade-off with rebalance time — that's where I'd probe as an interviewer.

jenna.t · 2:19 PM

Just used the feature-store project answer in a Stripe loop — got the offer 🎯

FAQ

Questions you're probably asking.

I'm a complete beginner — can I start here?

Yes. The Foundations track assumes you can run a terminal and know basic SQL. The 2-minute assessment will tell you honestly whether you should start there or with a prerequisite.

How is this different from a Udemy course?

Every course is paired with a production-grade project you build and ship. Udemy sells videos; we hand you a portfolio. Projects are designed to stand up to a real interview, not a quiz.

What does 'free first module' mean?

The first module of every career path is completely free — no card required. Ship the first project, evaluate whether the style fits you, then upgrade to Professional ($29/mo or $299/yr) for full access.

Will the projects actually hold up in interviews?

Each project ships with a one-page 'how to talk about this' write-up — architecture diagram, trade-offs, and the 5 follow-up questions interviewers love to ask. Built by engineers who've been on both sides of hiring loops.

Do you cover AI / LLM / vector data engineering?

Yes — extensively. The AI & Vectors track covers embeddings, hybrid retrieval, feature stores, LLM batch enrichment, and evaluation. This is the work 2026 DE roles ask about and most legacy courses skip entirely.

Is there a refund policy?

14 days, no questions. If you finish the first paid module and don't feel it's worth the price, we refund.

Your move

Pick a project.
Ship something real this week.

Free first module on every career path. No card. No drip-campaign spam. Just the files and a 2-minute assessment to tell you where to start.

Browse all 30+ projects

SHIPPED > WATCHED · BUILT > STREAMED · HIRED > CERTIFIED