Feed production LLMs and ML models.
Vector retrieval, feature stores, LLM batch enrichment, evaluation pipelines. The infra that makes AI actually work in production.
What you'll actually be able to do.
Not "you'll know about Airflow." What you'll ship, debug, and defend in an interview.
You’ll be the person who can
- Build retrieval that actually returns the right thing
- Design feature stores with point-in-time correctness
- Run LLM batch enrichment with cost + retry control
- Evaluate retrieval with recall@k, not vibes
- Bridge the data team and the ML team
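To make "recall@k, not vibes" concrete, here is a minimal sketch of the metric in Python. The query IDs, document IDs, and the `runs` / `labels` structures are made-up illustrations, not course material.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("each labeled query needs at least one relevant document")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: dict[str, list[str]],
                     labels: dict[str, set[str]],
                     k: int) -> float:
    """Average recall@k over every labeled query; a missing run scores 0."""
    scores = [recall_at_k(runs.get(query, []), relevant, k)
              for query, relevant in labels.items()]
    return sum(scores) / len(scores)


# Hypothetical human labels: query -> set of relevant doc IDs.
labels = {"q1": {"d1", "d4"}, "q2": {"d7"}}
# Hypothetical retriever output: query -> ranked doc IDs, best first.
runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d9", "d7", "d8"]}

print(mean_recall_at_k(runs, labels, k=2))
```

Tracking this one number across retriever changes, on a fixed labeled set, is what turns "it feels better" into a comparison you can defend.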
And the market pays you for
The system you'll build by the end.
A production reference architecture — not a toy demo. Every node maps to a course or project in this path.
From week one to capstone.
A realistic 5-stage timeline. Go faster if you already have pieces; slower if you're brand new.
- 01 Week 1–2
  DE basics (fast)
  SQL, Airflow, containers; skip if you already have them
- 02 Week 3–6
  Batch + streaming
  The pipelines that feed ML
- 03 Week 7–12
  AI & Vectors
  Embeddings, hybrid retrieval, evaluation
- 04 Week 13–17
  Feature stores
  Feast, offline/online parity, drift
- 05 Week 18–20
  Ship capstone
  Vector search + LLM enrichment on real data
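Point-in-time correctness, the idea at the heart of the feature-store stage, fits in a few lines: when you build a training row for an event at time T, you may only use the feature value that was known at or before T. The sketch below is plain Python on hypothetical data; it illustrates the rule, not how Feast implements it.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_value(history, as_of):
    """Return the latest feature value with timestamp <= as_of, or None.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    Treating an equal timestamp as "already known" is a design choice;
    some teams use a strict < to be safe against late-arriving writes.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


# Hypothetical feature log: purchases counted so far, per user.
feature_history = {
    "user_1": [
        (datetime(2024, 1, 1), 3),
        (datetime(2024, 1, 10), 7),
        (datetime(2024, 2, 1), 12),
    ]
}

# Hypothetical label events we want to build training rows for.
label_events = [
    ("user_1", datetime(2024, 1, 5)),
    ("user_1", datetime(2024, 1, 10)),
    ("user_1", datetime(2023, 12, 31)),
]

training_rows = [
    (entity, ts, point_in_time_value(feature_history[entity], ts))
    for entity, ts in label_events
]
for row in training_rows:
    print(row)
```

The last event predates every feature write, so it gets `None` rather than a value from the future. Joining on "latest value" instead would leak information the model could never have at serving time.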
Ship
One project, endlessly talkable.
Every path ends with a flagship capstone you'll ship, write up, and walk through in every interview loop.
The capstone that gets you the AI-era role.
What you’ll ship
- 01 Index 5M docs with bge-m3 embeddings
- 02 Hybrid BM25 + ANN retrieval
- 03 Rerank top-100 → top-10 with cross-encoder
- 04 Recall@k eval on a human-labeled set
- 05 API served with FastAPI and a Redis cache
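This page doesn't say how the BM25 and ANN results get merged. One common option is reciprocal rank fusion (RRF), and the sketch below assumes it. The document IDs and hit lists are hypothetical, and `k=60` is the constant usually quoted for RRF, not something tuned for this capstone.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by doc ID so the output is stable.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top hits from each retriever, best first.
bm25_hits = ["d3", "d1", "d7"]   # keyword match
ann_hits = ["d1", "d5", "d3"]    # embedding nearest neighbours

fused = reciprocal_rank_fusion([bm25_hits, ann_hits])
print(fused)  # ['d1', 'd3', 'd5', 'd7']
```

RRF uses only ranks, so you never have to put BM25 scores and cosine similarities on the same scale. The fused list would then feed the cross-encoder reranking step.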
Questions you'll confidently answer.
These are real interview questions for AI Data Engineer roles. If you can answer all four with a story from your capstone, you're ready.
How would you evaluate whether retrieval is getting better?
Why not just use cosine similarity? When does BM25 still win?
Design a feature store with point-in-time correctness.
How do you batch-enrich 10M rows with an LLM on a budget?
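For the last question, a rough sketch of the control loop helps: cap total spend, retry transient failures with backoff, and report what did not finish. Everything here is an assumption for illustration: `call_llm` stands in for whatever client you use, and the flat `cost_per_call` charged on every attempt is a deliberately conservative simplification of real token-based pricing.

```python
import random
import time


def enrich_rows(rows, call_llm, cost_per_call, budget,
                max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Enrich (row_id, text) pairs without exceeding `budget`.

    Returns (results, unfinished_ids, spent). Every attempt, including a
    failed one, is counted against the budget.
    """
    results, unfinished, spent = {}, [], 0.0
    for row_id, text in rows:
        for attempt in range(max_retries + 1):
            if spent + cost_per_call > budget:
                break  # out of budget: stop attempting this row
            spent += cost_per_call
            try:
                results[row_id] = call_llm(text)
                break
            except Exception:
                # A real pipeline would catch only transient errors
                # (timeouts, rate limits) and let the rest propagate.
                if attempt < max_retries:
                    # Exponential backoff with a little jitter.
                    sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        if row_id not in results:
            unfinished.append(row_id)
    return results, unfinished, spent


calls = {"n": 0}


def fake_llm(text):
    """Stand-in for a real client: the first call fails, later calls succeed."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated transient failure")
    return text.upper()


rows = [("r1", "alpha"), ("r2", "beta"), ("r3", "gamma")]
results, unfinished, spent = enrich_rows(
    rows, fake_llm, cost_per_call=1, budget=3, sleep=lambda seconds: None
)
print(results, unfinished, spent)
```

In this run the retry on `r1` uses up one unit of budget, so `r3` ends up in `unfinished` rather than pushing spend past the cap. At 10M rows you would also checkpoint `results` so a restart only pays for what is left.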
Stack you'll learn.
Not memorized — operated. Each tool is taught inside a project, not an isolated lecture.
Start building your first system — today.
Module 01 is free. No card. Ship something real this weekend.