Path 02 · +3x YoY hiring

Feed production LLMs and ML models.

Vector retrieval, feature stores, LLM batch enrichment, evaluation pipelines. The infra that makes AI actually work in production.

After this path

What you'll actually be able to do.

Not "you'll know about Airflow." What you'll ship, debug, and defend in an interview.

You’ll be the person who can

  • Build retrieval that actually returns the right thing
  • Design feature stores with point-in-time correctness
  • Run LLM batch enrichment with cost + retry control
  • Evaluate retrieval with recall@k, not vibes
  • Bridge the data team and the ML team

And the market pays you for

Build retrieval · Not vibes-based
Design feature stores · Point-in-time correct
Run LLM pipelines · With cost + retries
Evaluate rigorously · Recall@k, not vibes
System architecture

The system you'll build by the end.

A production reference architecture — not a toy demo. Every node maps to a course or project in this path.

01 · Source: Postgres, Events, Docs
02 · Embed: OpenAI, bge-m3, Cohere
03 · Store: pgvector, Qdrant, Feast
04 · Retrieve: Hybrid BM25+ANN, Rerank
05 · Serve: LLM API, Eval loop, Feedback
Orchestration: Ray + Airflow
Every node → a course + project
Your path

From week one to capstone.

A realistic 5-stage timeline. Go faster if you already have pieces; slower if you're brand new.

  1. Week 1–2

    DE basics (fast)

    SQL, Airflow, containers — skip if you have it

  2. Week 3–6

    Batch + streaming

    The pipelines that feed ML

  3. Week 7–12

    AI & Vectors

    Embeddings, hybrid retrieval, evaluation

  4. Week 13–17

    Feature stores

    Feast, offline/online parity, drift

  5. Week 18–20

    Ship capstone

    Vector search + LLM enrichment on real data

Capstone project

One project, endlessly talkable.

Every path ends with a flagship capstone you'll ship, write up, and walk through in every interview loop.

P06 · Capstone · Enterprise RAG platform

The capstone that gets you the AI-era role.

pgvector · Qdrant · FastAPI · Redis · Ray · OpenAI

What you’ll ship

  1. Index 5M docs with bge-m3 embeddings
  2. Hybrid BM25 + ANN retrieval
  3. Rerank top-100 → top-10 with a cross-encoder
  4. Recall@k eval on a human-labeled set
  5. API served with FastAPI + Redis cache
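
To make the recall@k item concrete, here is a minimal sketch. It assumes retrieval results and human labels are both keyed by query ID; the function names, query IDs, and document IDs are illustrative and not taken from the course material.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, labels, k):
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(runs[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores)


# Hypothetical ranked results per query (best first) and human-labeled relevant docs.
runs = {
    "q1": ["d3", "d1", "d7", "d2"],
    "q2": ["d9", "d4", "d5", "d8"],
}
labels = {
    "q1": {"d1", "d2"},
    "q2": {"d4"},
}

print(mean_recall_at_k(runs, labels, k=2))  # q1 finds 1 of 2, q2 finds 1 of 1
```

Tracking this number on the same labeled set before and after a retrieval change is one simple way to tell whether the change helped.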
Proof

Questions you'll confidently answer.

These are real interview questions for AI Data Engineer roles. If you can answer all four with a story from your capstone, you're ready.

Q1

How would you evaluate whether retrieval is getting better?

Q2

Why not just use cosine similarity? When does BM25 still win?

Q3

Design a feature store with point-in-time correctness.

Q4

How do you batch-enrich 10M rows with an LLM on a budget?
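
One way to start on Q3: point-in-time correctness means a training row stamped at time t may only see feature values written at or before t. The sketch below uses only the standard library. The function names, entity IDs, and integer timestamps are hypothetical, and a production feature store would do this join at scale rather than row by row.

```python
from bisect import bisect_right


def value_as_of(history, as_of):
    """Return the latest feature value with timestamp <= as_of, or None.

    history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


def build_training_rows(label_events, feature_history):
    """Attach to each label the feature value that was known at the label's time."""
    rows = []
    for entity_id, event_ts, label in label_events:
        history = feature_history.get(entity_id, [])
        rows.append((entity_id, event_ts, value_as_of(history, event_ts), label))
    return rows


# Hypothetical data: one feature's history per entity, plus labeled events.
feature_history = {"user_1": [(1, 10.0), (5, 12.0), (9, 20.0)]}
label_events = [("user_1", 6, 1), ("user_1", 0, 0), ("user_2", 4, 1)]

rows = build_training_rows(label_events, feature_history)
for row in rows:
    print(row)
```

The event at time 6 gets the value written at time 5, never the later one from time 9. Using the later value would leak future information into training.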

Why this matters: Most courses let you hide behind passive video-watching. ai-de projects force you into the exact failure modes interviewers probe for — so when you sit in the interview, you’ve already lived the answer.
Skills · syllabus

Stack you'll learn.

Not memorized — operated. Each tool is taught inside a project, not an isolated lecture.

Embeddings · pgvector · Qdrant · Feast · Ray · LLM APIs · Retrieval eval · Spark · FastAPI
Your move

Start building your first system — today.

Module 01 is free. No card. Ship something real this weekend.

Compare all 5 paths