Data engineering rewritten for the AI era.
30+ production projects across 5 career paths. Free first module on every career path. One platform — built the way the job actually looks in 2026.
Definition
What is AI-DE?
AI-DE is a hands-on learning platform for data engineers and AI engineers. It teaches how to build production-grade data pipelines, streaming systems, and AI platforms using tools like Spark, dbt, Kafka, and LLM frameworks. Unlike traditional courses, AI-DE focuses on real-world system design, production workflows, and end-to-end project experience.
A data pipeline is a system that ingests, transforms, and delivers data for analytics or applications. In AI-DE, you build pipelines using tools like Spark, dbt, and Airflow — then progress to streaming systems with Kafka and AI platforms with RAG and LLM frameworks.
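As a rough illustration of the ingest, transform, deliver shape described above, here is a minimal sketch in plain Python. It is not taken from any AI-DE project: the sample data and the function names (`ingest`, `transform`, `deliver`, `run_pipeline`) are hypothetical, and a production pipeline would swap each stage for a tool such as Spark, dbt, or Airflow.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample input; a real pipeline would read from a file, API, or queue.
RAW_ORDERS = """order_id,customer,amount
1,alice,30.00
2,bob,12.50
3,alice,7.50
4,bob,not_a_number
"""


def ingest(raw_csv: str) -> list[dict]:
    """Ingest: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> dict[str, float]:
    """Transform: drop rows with invalid amounts, then total spend per customer."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed records instead of failing the whole run
        totals[row["customer"]] += amount
    return dict(totals)


def deliver(totals: dict[str, float]) -> str:
    """Deliver: serialize the result as JSON for a downstream consumer."""
    return json.dumps(totals, sort_keys=True)


def run_pipeline(raw_csv: str) -> str:
    """Chain the three stages in order."""
    return deliver(transform(ingest(raw_csv)))


if __name__ == "__main__":
    print(run_pipeline(RAW_ORDERS))
```

Keeping each stage as its own function means a bad record is handled in one place (`transform`) and each stage can be tested or replaced independently, which is the same separation the larger tools enforce at scale.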
You don't need more tutorials. You need something to ship.
Most courses teach isolated tools and hand you a completion certificate. ai-de.net is built the opposite way — around the systems you'll be asked to ship on day one, and the story you'll tell in the interview.
Tutorials don't survive the interview
Reading about Spark ≠ operating Spark. Hiring managers want to hear about a pipeline you owned when it broke.
Isolated skills don't ship pipelines
Knowing Airflow without dbt, Kafka without schema registries, or vectors without retrieval evaluation is how engineers get stuck at junior.
Generic courses ignore the AI era
2026 DE roles expect feature stores, LLM enrichment, and vector retrieval — not just a batch job from 2019.
Where you are now → where you want to be.
Pick the line that sounds most like you. We'll point you at the path that fits — and the first project to ship.
“I'm new to data”
You know SQL and a bit of Python. You want a real first DE job, with a portfolio that survives an interview loop.
“I'm an analyst → engineer”
You live in the warehouse. You're ready to own dbt models, contracts, and the metric layer the business actually trusts.
“I'm a SWE pivoting in”
You write production code. You want the AI-era data role hiring managers can't fill fast enough — feature stores, RAG, vectors.
“I want to level up to senior”
You ship pipelines today. You're aiming at senior / staff roles owning platform decisions, on-call, and architecture.
One skill. One production project. Every time.
Every course is paired with a real, deployable project — something you'd actually build at a mid-sized company. Ship it, put it on your resume, talk through it in the interview.
Flink fraud detection
Stateful streaming pipeline on Flink + Kafka — 5 keyed-state detectors, exactly-once via 2PC, Flink K8s Operator with ZK HA.
Uber Event Platform: Staff Design Portfolio
Staff-level system-design portfolio: redesign Uber's event platform, 10K → 1B events/day. 69 artifacts, no code.
Commerce data warehouse
Kimball star schema with 22 dbt models — atomic + event facts, SCD2 via snapshots, incremental processing, and a GitHub Actions Slim CI gate.
Iceberg Lakehouse Foundations
Local ACID lakehouse on Iceberg + Nessie + MinIO. Bronze → Silver → Gold + maintenance.
ShopStream Spark Batch Pipeline
Spark + Delta Lake batch ETL with 4 documented optimization patterns (9x progression), ACID lakehouse, and a Kafka/K8s streaming overlay.
Enterprise RAG — retrieval-quality build
EXPERT-tier retrieval-quality RAG: 4-strategy chunking A/B (62/78/85%), hybrid BM25 + dense + RRF, cross-encoder reranker, RAGAS 4-metric canary, LLM gateway with fallback. 5 ADRs + cost-model CSV bundled.
PredictFlow — production MLOps platform with Feast + BentoML
EXPERT-tier MLOps build: MLflow + DVC + Feast (offline/online), BentoML on K8s with HPA + canary, Evidently drift + cron-gated retrain, Prometheus + Grafana. Modules 01-02 with PRO; full platform with EXPERT.
LLM evaluation framework — multi-judge cascade + recall@k gate
EXPERT-tier eval build: 3-judge cascade (Haiku → Sonnet → GPT-4o), variance-based agreement, recall@k regression gate in GitHub Actions, RAGAS scaffolding, online drift detection, 5 committed ADRs (one Deprecated), runnable cost-model CSV. 7 modules · 17-19h. Module 01 with PRO.
AI cost optimization (CostGuard)
Cost-aware LLM platform: token tracking, dual-tier cache, 4-strategy router, three-tier budget governance. 5 ADRs + cost-model CSV bundled.
Data observability stack
Detect, trace, prevent: dbt + OpenLineage + Grafana on a pre-broken warehouse.
Data governance & contracts
ODCS contracts, GE + Soda validation, Avro + Schema Registry PR gate, 4-tier PII + RBAC + hashed audit, SOC2 + GDPR engines.
CI/CD data platform
Terraform + GitHub Actions + dbt CI — the platform under the platform.
7 tracks. One coherent system.
Foundations on the left. Specialty on the right. Pick a path or move freely between tracks. 31 skills · 765h of video + labs · capstone project at the end of every track.
Foundations
SQL, Python, data modeling, dbt, cloud basics. The ground every other track stands on.
Batch pipelines
Airflow, Spark, Iceberg — the production workhorse of every modern data team.
Streaming
Kafka, Flink, CDC, exactly-once. When minutes need to become milliseconds.
Data quality
Contracts, observability, schema evolution, on-call response for data systems.
AI & vectors
Embeddings, retrieval, feature stores, LLM batch enrichment, RAG infra.
Platform
K8s, Terraform, multi-tenant Airflow, cost attribution, observability stack.
Staff Engineering
System design, RFCs, ADRs, product thinking — the staff+ skills promotions actually reward.
A training platform shaped like the job.
The difference isn't volume of videos. It's whether you end up with a thing to ship.
| | ai-de.net | Coursera | DataExpert | Udemy | DataCamp | YouTube |
|---|---|---|---|---|---|---|
| Course structure | 1 course = 1 production project | Video lectures + auto-graded labs | Bootcamp cohorts + projects | Video lectures | Short exercises | Raw theory |
| AI-era coverage | Vectors, LLM pipelines, feature stores, RAG | GenAI specialisations (varies) | AI-adjacent topics | Rarely updated | Beginner ML only | Scattered |
| Portfolio output | Resume-ready repo + system write-up | Certificate + capstone repo | Bootcamp capstone | Certificate | Completion badge | Notes |
| Free first module | Every path · no card required | 7-day audit access | Limited free lessons | Free preview clip | First lesson | Free |
| Price for full access | $29/mo or $299/yr | $49–79 / mo · per spec | $2,000+ · bootcamp tier | $200–400 / yr · per course | $29–49 / mo | Free |
| Built by working DEs | Active practitioners shipping prod | Andrew Ng et al · academic | Ex-FAANG instructors | Mixed | Mixed | Varies |
A free resource to grab before you sign up.
Drop an email and we'll send it. No drip-campaign spam — just the file.
Code review and unstuck threads — coming soon.
We're standing up a Discord with project review, interview prep, and weekly office hours run by working data engineers. Drop your email above and we'll let you know when the doors open.
num.partitions and the trade-off with rebalance time — that's where I'd probe as an interviewer.
Questions you're probably asking.
I'm a complete beginner — can I start here?
How is this different from a Udemy course?
What does 'free first module' mean?
Will the projects actually hold up in interviews?
Do you cover AI / LLM / vector data engineering?
Is there a refund policy?
Pick a project.
Ship something real this week.
Free first module on every career path. No card. No drip-campaign spam. Just the files and a 2-minute assessment to tell you where to start.