5 career paths · 30+ production projects

Data engineering, rewritten for the AI era.

30+ production projects across 5 career paths. Free first module on every career path. One platform — built the way the job actually looks in 2026.

Compare 5 career paths
5 career paths · pick yours
30+ production projects
Free first module on every career path
2026 AI-era curriculum

Definition

What is AI-DE?

AI-DE is a hands-on learning platform for data engineers and AI engineers. It teaches how to build production-grade data pipelines, streaming systems, and AI platforms using tools like Spark, dbt, Kafka, and LLM frameworks. Unlike traditional courses, AI-DE focuses on real-world system design, production workflows, and end-to-end project experience.

A data pipeline is a system that ingests, transforms, and delivers data for analytics or applications. In AI-DE, you build pipelines using tools like Spark, dbt, and Airflow — then progress to streaming systems with Kafka and AI platforms with RAG and LLM frameworks.
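Here is a minimal pure-Python sketch of the ingest → transform → deliver definition above. It is not taken from any AI-DE project. The CSV layout, the `Order` record, and the `ingest` / `transform` / `deliver` functions are hypothetical, and the in-memory `warehouse` list stands in for a real sink such as a warehouse table. In a production stack, these stages would typically run on tools like Spark or dbt, orchestrated by something like Airflow.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical raw input; in production this might be a file drop, an API, or a Kafka topic.
RAW_ORDERS = """order_id,amount_cents,country
1,1999,us
2,,de
3,4500,US
"""


@dataclass(frozen=True)
class Order:
    order_id: int
    amount_usd: float
    country: str


def ingest(raw: str) -> Iterator[dict]:
    """Ingest: parse raw CSV text into dict records, one per row."""
    yield from csv.DictReader(io.StringIO(raw))


def transform(rows: Iterable[dict]) -> Iterator[Order]:
    """Transform: skip rows missing an amount, cast types, upper-case country codes."""
    for row in rows:
        if not row["amount_cents"]:
            continue  # a real pipeline would route this row to a dead-letter store
        yield Order(
            order_id=int(row["order_id"]),
            amount_usd=int(row["amount_cents"]) / 100,
            country=row["country"].upper(),
        )


def deliver(orders: Iterable[Order], sink: list) -> int:
    """Deliver: append records to a sink (a list standing in for a table); return the count."""
    count = 0
    for order in orders:
        sink.append(order)
        count += 1
    return count


warehouse: list = []
loaded = deliver(transform(ingest(RAW_ORDERS)), warehouse)
print(loaded, warehouse[-1])  # prints: 2 Order(order_id=3, amount_usd=45.0, country='US')
```

Chaining generators keeps each stage small and testable on its own. The row dropped for order 2 would normally land in a dead-letter table rather than vanish silently.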

Built by engineers shipping production data systems
Curriculum aligned with 2026 hiring loops
30+ capstone projects · resume- and interview-ready
Updated quarterly · current stack
The gap

You don't need more tutorials. You need something to ship.

Most courses teach isolated tools and hand you a completion certificate. ai-de.net is built the opposite way — around the systems you'll be asked to ship on day one, and the story you'll tell in the interview.

01

Tutorials don't survive the interview

Reading about Spark ≠ operating Spark. Hiring managers want to hear about a pipeline you owned when it broke.

02

Isolated skills don't ship pipelines

Knowing Airflow without dbt, Kafka without schema registries, or vectors without retrieval evaluation is how a career stalls at junior.

03

Generic courses ignore the AI era

2026 DE roles expect feature stores, LLM enrichment, and vector retrieval — not just a batch job from 2019.

Find yourself

Where you are now → where you want to be.

Pick the line that sounds most like you. We'll point you at the path that fits — and the first project to ship.

Junior / aspiring DE · Most popular path · broadest market

I'm new to data

You know SQL and a bit of Python. You want a real first DE job, with a portfolio that survives an interview loop.

Recommended path: Core Data Engineer
Analyst / AE · Closest to the business · fastest to get hired

I'm an analyst → engineer

You live in the warehouse. You're ready to own dbt models, contracts, and the metric layer the business actually trusts.

Software → AI / data · Fastest-growing AI-era specialty

I'm a SWE pivoting in

You write production code. You want the AI-era data role hiring managers can't fill fast enough — feature stores, RAG, vectors.

Recommended path: AI Data Engineer
Mid-level DE · Senior track · platform ownership

I want to level up to senior

You ship pipelines today. You're aiming at senior / staff roles owning platform decisions, on-call, and architecture.

Differentiator

One skill. One production project. Every time.

Every course is paired with a real, deployable project — something you'd actually build at a mid-sized company. Ship it, put it on your resume, talk through it in the interview.

P01 · Full

Flink fraud detection

Stateful streaming pipeline on Flink + Kafka — 5 keyed-state detectors, exactly-once via 2PC, Flink K8s Operator with ZK HA.

Flink · Kafka · RocksDB · Kubernetes
P02 · Full

Uber Event Platform: Staff Design Portfolio

Staff-level system-design portfolio: redesign Uber's event platform, 10K → 1B events/day. 69 artifacts, no code.

System Design · Kafka · Iceberg · Flink · SLA Engineering
P03 · Free phase 1

Commerce data warehouse

Kimball star schema with 22 dbt models — atomic + event facts, SCD2 via snapshots, incremental processing, and a GitHub Actions Slim CI gate.

dbt · Postgres · GitHub Actions
P04 · Full

Iceberg Lakehouse Foundations

Local ACID lakehouse on Iceberg + Nessie + MinIO: Bronze → Silver → Gold layers plus table maintenance.

Iceberg · Spark · Nessie
P05 · Full

ShopStream Spark Batch Pipeline

Spark + Delta Lake batch ETL with 4 documented optimization patterns (9x progression), ACID lakehouse, and a Kafka/K8s streaming overlay.

Spark · Delta Lake · Kafka · Kubernetes
P06 · Full

Enterprise RAG — retrieval-quality build

EXPERT-tier retrieval-quality RAG: 4-strategy chunking A/B (62/78/85%), hybrid BM25 + dense + RRF, cross-encoder reranker, RAGAS 4-metric canary, LLM gateway with fallback. 5 ADRs + cost-model CSV bundled.

FastAPI · OpenAI · Pinecone · Qdrant · Redis · Prometheus
P07 · Full

PredictFlow — production MLOps platform with Feast + BentoML

EXPERT-tier MLOps build: MLflow + DVC + Feast (offline/online), BentoML on K8s with HPA + canary, Evidently drift + cron-gated retrain, Prometheus + Grafana. Modules 01-02 with PRO; full platform with EXPERT.

Feast · MLflow · BentoML · Redis · Kubernetes · Evidently
P08 · Full

LLM evaluation framework — multi-judge cascade + recall@k gate

EXPERT-tier eval build: 3-judge cascade (Haiku → Sonnet → GPT-4o), variance-based agreement, recall@k regression gate in GitHub Actions, RAGAS scaffolding, online drift detection, 5 committed ADRs (one Deprecated), runnable cost-model CSV. 7 modules · 17-19h. Module 01 with PRO.

Pydantic · FastAPI · Anthropic · OpenAI · GitHub Actions
P09 · Full

AI cost optimization (CostGuard)

Cost-aware LLM platform: token tracking, dual-tier cache, 4-strategy router, three-tier budget governance. 5 ADRs + cost-model CSV bundled.

FastAPI · OpenAI · Redis · Postgres · asyncpg · Prometheus
P10 · Full

Data observability stack

Detect, trace, prevent: dbt + OpenLineage + Grafana on a pre-broken warehouse.

dbt · OpenLineage · Grafana
P11 · Full

Data governance & contracts

ODCS contracts, GE + Soda validation, Avro + Schema Registry PR gate, 4-tier PII + RBAC + hashed audit, SOC2 + GDPR engines.

ODCS · Avro · Schema Registry · GE + Soda · GitHub Actions
P12 · Full

CI/CD data platform

Terraform + GitHub Actions + dbt CI — the platform under the platform.

Terraform · GitHub Actions · dbt
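The Enterprise RAG project (P06) mentions hybrid BM25 + dense retrieval fused with RRF. Reciprocal rank fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, using 1-based ranks. The constant k = 60 is the one commonly used since the original RRF paper. The sketch below covers only that fusion step, in plain Python. The document IDs and both result lists are made up, and this is not the project's actual code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list, k: int = 60) -> list:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. A list that omits a document contributes nothing for it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top-3 results from a lexical (BM25) retriever and a dense (vector) retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print([doc_id for doc_id, _ in fused])  # prints: ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because RRF looks only at ranks, it avoids normalising BM25 scores against cosine similarities, which live on different scales. Here `doc_a` and `doc_c` rise to the top because both retrievers return them.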
Why ai-de vs alternatives

A training platform shaped like the job.

The difference isn't volume of videos. It's whether you end up with a thing to ship.

| | ai-de.net | Coursera | DataExpert | Udemy | DataCamp | YouTube |
|---|---|---|---|---|---|---|
| Course structure | 1 course = 1 production project | Video lectures + auto-graded labs | Bootcamp cohorts + projects | Video lectures | Short exercises | Raw theory |
| AI-era coverage | Vectors, LLM pipelines, feature stores, RAG | GenAI specialisations (varies) | AI-adjacent topics | Rarely updated | Beginner ML only | Scattered |
| Portfolio output | Resume-ready repo + system write-up | Certificate + capstone repo | Bootcamp capstone | Certificate | Completion badge | Notes |
| Free first module | Every path · no card required | 7-day audit access | Limited free lessons | Free preview clip | First lesson | Free |
| Price for full access | $29/mo or $299/yr | $49–79/mo · per spec | $2,000+ · bootcamp tier | $200–400/yr · per course | $29–49/mo | Free |
| Built by working DEs | Active practitioners shipping prod | Andrew Ng et al. · academic | Ex-FAANG instructors | Mixed | Mixed | Varies |
Free resources

A free resource to grab before you sign up.

Drop an email and we'll send it. No drip-campaign spam — just the file.

Community

Code review and get-unstuck threads: coming soon.

We're standing up a Discord with project review, interview prep, and weekly office hours run by working data engineers. Drop your email above and we'll let you know when the doors open.

Q3 2026 launch
Free for all members
Weekly office hours
#project-review (illustrative · Q3 2026 launch)
maya.k · 2:14 PM
Finished the CDC pipeline project, hitting ~4k events/sec on a single node. Is this reasonable?
alex (mentor) · 2:17 PM
Looks healthy. For interviews, focus on how you'd tune num.partitions and the trade-off with rebalance time; that's where I'd probe as an interviewer.
jenna.t · 2:19 PM
Just used the feature-store project answer in a Stripe loop. Got the offer 🎯
FAQ

Questions you're probably asking.

I'm a complete beginner — can I start here?
Yes. The Foundations track assumes you can run a terminal and know basic SQL. The 2-minute assessment will tell you honestly whether you should start there or with a prerequisite.
How is this different from a Udemy course?
Every course is paired with a production-grade project you build and ship. Udemy sells videos; we hand you a portfolio. Projects are designed to stand up to a real interview, not a quiz.
What does 'free first module' mean?
The first module of every career path is completely free — no card required. Ship the first project, evaluate whether the style fits you, then upgrade to Professional ($29/mo or $299/yr) for full access.
Will the projects actually hold up in interviews?
Each project ships with a one-page 'how to talk about this' write-up — architecture diagram, trade-offs, and the 5 follow-up questions interviewers love to ask. Built by engineers who've been on both sides of hiring loops.
Do you cover AI / LLM / vector data engineering?
Yes — extensively. The AI & Vectors track covers embeddings, hybrid retrieval, feature stores, LLM batch enrichment, and evaluation. This is the work 2026 DE roles ask about and most legacy courses skip entirely.
Is there a refund policy?
14 days, no questions asked. If you finish the first paid module and don't feel it was worth the price, we'll refund you.
Your move

Pick a project.
Ship something real this week.

Free first module on every career path. No card. No drip-campaign spam. Just the files and a 2-minute assessment to tell you where to start.

Browse all 30+ projects
SHIPPED > WATCHED · BUILT > STREAMED · HIRED > CERTIFIED