Production pipelines on Airflow + dbt: ingest, orchestrate, transform, monitor
End-to-end batch pipeline foundations on the open-source Airflow + dbt + Postgres + Docker stack. Build 3 production DAGs: an idempotent REST-API ingestion with pagination and retry, a multi-source orchestrator with TaskGroups + dynamic mapping + branching + callbacks, and a dbt medallion transformer (bronze → silver → gold) with dbt tests as quality gates. Wrap it with SLA monitoring, freshness checks, and a production-readiness runbook. Local Docker only — no cloud credentials.
Every junior+ data-engineering interview asks the end-to-end batch pipeline question: pagination, idempotency, orchestration, transformation, monitoring. This project gives you working code for it on a widely deployed open-source stack.
- ingest_orders_api.py — REST API DAG with pagination, retry/backoff, HttpSensor, idempotent ON CONFLICT UPSERT into staging.orders
- customer_orders_pipeline.py + multi_source_etl.py — TaskGroups, BranchPythonOperator, dynamic mapping with expand(), watermark/CDC, on_failure_callback
- streamcart_transforms/ — dbt project with bronze/silver/gold medallion, is_incremental() + unique_key, dbt tests + custom assertions, dbt docs
- dbt_medallion_pipeline.py — Airflow orchestrating dbt bronze→test→silver→test→gold→test→docs as quality-gated stages
- Monitoring layer — SLA monitoring, freshness checks, on_failure_callback, production-readiness checklist + runbook
- Seeded warehouse — 10K orders, 1K customers, 600 products, 100 audit records on local Postgres + Docker (Airflow + Redis + Celery executor)
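The pagination and retry/backoff pattern named in the first deliverable can be sketched in plain Python. This is a minimal illustration, not the project's actual DAG code: `fetch_page`, `fake_fetch_page`, the `items` / `next_page` response shape, and the delay values are all assumptions made for the example, and a real DAG would call the HTTP API instead of an in-memory dict.

```python
import time
from typing import Callable, Iterator


def fetch_with_retry(fetch_page: Callable[[int], dict], page: int,
                     max_attempts: int = 4, base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch_page(page), retrying with exponential backoff on ConnectionError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(page)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


def iter_orders(fetch_page: Callable[[int], dict],
                sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield every order across pages until the API reports no next page."""
    page = 1
    while page is not None:
        body = fetch_with_retry(fetch_page, page, sleep=sleep)
        yield from body["items"]
        page = body.get("next_page")  # None ends the loop


# Hypothetical in-memory API: 3 pages, where page 2 fails once before succeeding.
_PAGES = {1: [{"order_id": 1}, {"order_id": 2}], 2: [{"order_id": 3}], 3: [{"order_id": 4}]}
_failed_once = set()


def fake_fetch_page(page: int) -> dict:
    if page == 2 and page not in _failed_once:
        _failed_once.add(page)
        raise ConnectionError("transient failure")
    return {"items": _PAGES[page], "next_page": page + 1 if page < 3 else None}


delays = []  # record requested sleeps instead of actually waiting
orders = list(iter_orders(fake_fetch_page, sleep=delays.append))
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. Here the single transient failure on page 2 produces one recorded delay, and all four orders still arrive in order.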
Airflow + dbt is the open-source stack every paid platform copies.
Whether your company runs on Astronomer / MWAA / Composer or self-hosts Airflow, the primitives are the same, and so is dbt on top of a warehouse. This project builds the substrate every 'modern data stack' platform abstracts over, so you can reason about what's underneath the SaaS bill.
Airflow primitives > vendor magic
TaskFlow API, TaskGroups, dynamic mapping, branching, sensors, callbacks. The patterns transfer to MWAA / Composer / Astronomer with config-only changes.
Idempotency > rerun panic
ON CONFLICT UPSERT + watermark/CDC + execution-date keys mean you can re-run any DAG without fear of duplicates. The pattern that turns a tutorial into production.
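A minimal sketch of the idempotent upsert idea, using Python's built-in `sqlite3` as a stand-in for Postgres (SQLite 3.24+ accepts a similar `ON CONFLICT ... DO UPDATE` clause). The table name `staging_orders`, its columns, and the `load_batch` helper are assumptions for illustration; the project itself targets `staging.orders` on Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
)

# Insert new rows; on a key collision, overwrite with the incoming values.
UPSERT = """
INSERT INTO staging_orders (order_id, status, amount)
VALUES (?, ?, ?)
ON CONFLICT (order_id) DO UPDATE SET
    status = excluded.status,
    amount = excluded.amount
"""


def load_batch(rows):
    """Load a batch in one transaction; safe to call repeatedly with the same rows."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


batch = [(1, "placed", 20.0), (2, "placed", 35.5)]
load_batch(batch)
load_batch(batch)                   # simulated DAG re-run: no duplicates appear
load_batch([(2, "shipped", 35.5)])  # a later run carrying an updated row

rows = conn.execute(
    "SELECT order_id, status, amount FROM staging_orders ORDER BY order_id"
).fetchall()
```

Because the primary key decides between insert and update, running the same batch twice leaves the table unchanged, which is what makes a re-run safe.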
dbt tests > runtime asserts
Quality gates as Airflow tasks: bronze→test→silver→test→gold→test→docs. A failed test blocks downstream tasks instead of silently corrupting the warehouse.
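The gating behaviour can be illustrated without Airflow at all. The sketch below is a plain-Python model of the idea, not the project's DAG: the `run_gated` helper, the stage names, and the lambdas standing in for dbt run/test commands are all hypothetical. In Airflow the equivalent effect comes from chaining the tasks, since the default `all_success` trigger rule stops tasks downstream of a failure from running.

```python
from typing import Callable


def run_gated(stages: list[tuple[str, Callable[[], bool]]]) -> dict[str, str]:
    """Run stages in order; after the first failure, mark the rest as skipped."""
    results: dict[str, str] = {}
    blocked = False
    for name, step in stages:
        if blocked:
            results[name] = "skipped"  # never executed, so bad data cannot propagate
            continue
        ok = step()
        results[name] = "success" if ok else "failed"
        blocked = not ok
    return results


# Stand-ins for dbt commands; here the silver tests fail.
stages = [
    ("bronze_run", lambda: True),
    ("bronze_test", lambda: True),
    ("silver_run", lambda: True),
    ("silver_test", lambda: False),
    ("gold_run", lambda: True),
    ("gold_test", lambda: True),
    ("docs", lambda: True),
]

results = run_gated(stages)
```

With the silver tests failing, the gold models and docs are never built, so the gold layer keeps its last known-good state.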
Runbook > tribal knowledge
SLA + freshness + on_failure_callback + production-readiness checklist. Hand the project to a teammate and they can operate it.
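A freshness check reduces to comparing the newest load timestamp against two thresholds. The sketch below assumes warn/error thresholds in the spirit of dbt's `warn_after` / `error_after` source freshness settings; the `check_freshness` function, the threshold values, and the timestamps are illustrative and not taken from the project.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(latest_loaded_at: datetime, now: datetime,
                    warn_after: timedelta, error_after: timedelta) -> str:
    """Classify how stale a table is from its newest load timestamp."""
    lag = now - latest_loaded_at
    if lag > error_after:
        return "error"  # stale enough to fail the run or page someone
    if lag > warn_after:
        return "warn"   # late, but still within tolerance
    return "pass"


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
warn, error = timedelta(hours=1), timedelta(hours=6)

fresh = check_freshness(datetime(2024, 1, 1, 8, 30, tzinfo=timezone.utc), now, warn, error)
late = check_freshness(datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc), now, warn, error)
stale = check_freshness(datetime(2023, 12, 31, 0, 0, tzinfo=timezone.utc), now, warn, error)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic under test.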
Module 01 is free. The rest unlocks with PRO.
Try the first 2-3 hours — write the REST-API ingestion DAG with pagination, retry, and idempotent UPSERT into staging.orders. If the rhythm clicks, upgrade to unlock the orchestration patterns, dbt medallion, and monitoring modules.
Apache Airflow Production Patterns
The deep-dive curriculum on every Airflow primitive used in this project’s 3 DAGs. PRO subscribers get full access to every module.
Three sprints. Three checkpoints. One production-ready pipeline.
Each phase ends with a runnable DAG and a tagged commit. No theory decks.
ingest_orders_api.py running locally. Idempotent re-runs land cleanly. audit.pipeline_runs tracks every execution.
- ✓ ingest_orders_api.py (TaskFlow + HttpSensor)
- ✓ staging.orders + audit.pipeline_runs
- ✓ ON CONFLICT UPSERT + retries + backoff
Multi-source ETL across API + Postgres + S3 mock. TaskGroups + dynamic mapping + branching + callbacks all wired and tested with pytest.
- ✓ customer_orders_pipeline.py (watermark/CDC)
- ✓ multi_source_etl.py (3-source parallel)
- ✓ TaskGroups + expand() + on_failure_callback
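The watermark pattern behind the CDC-style extract can be sketched as follows. This is an illustration under assumptions: the `updated_at` column, the `extract_incremental` helper, and the in-memory rows are hypothetical, and a real pipeline would persist the watermark between runs (for example in an audit table) and filter in SQL rather than in Python.

```python
from datetime import datetime


def extract_incremental(rows: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Return rows changed after the watermark, plus the advanced watermark."""
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


source = [
    {"customer_id": 1, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"customer_id": 2, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"customer_id": 3, "updated_at": datetime(2024, 1, 1, 10, 0)},
]

# First run: only rows updated after 08:30 are picked up.
first_rows, wm = extract_incremental(source, datetime(2024, 1, 1, 8, 30))

# Second run against unchanged source data: nothing new, watermark unchanged.
second_rows, wm_after = extract_incremental(source, wm)
```

Combined with an upsert on the target, a re-run from an older watermark re-reads some rows but cannot duplicate them.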
dbt project with bronze/silver/gold + tests + docs, orchestrated from dbt_medallion_pipeline.py. SLA + freshness + runbook complete.
- ✓ streamcart_transforms/ (bronze→silver→gold + tests)
- ✓ dbt_medallion_pipeline.py (quality-gated)
- ✓ SLA + freshness + runbook + architecture diagram
One stack. Airflow + Postgres + dbt — running in Docker.
Pre-configured docker-compose with Airflow (scheduler + webserver + Celery executor + Redis), Postgres, and the dbt project scaffolded. Seed data with 10K orders / 1K customers / 600 products / 100 audit records loads on first boot.
What lives in the repo
Everything you need to run all 3 production DAGs locally. The tutorial walks you through writing each DAG; the starter has working scaffolds, seed data, and the dbt project structure so you can boot fast and focus on the patterns.
- docker-compose.yml — Airflow (scheduler + webserver + Celery + Redis) + Postgres
- dags/streamcart/ — ingest_orders_api · customer_orders_pipeline · dbt_medallion_pipeline
- streamcart_transforms/ — dbt project with bronze/silver/gold models + tests + docs
- seed/ — 10K orders · 1K customers · 600 products · 100 audit records
- tests/ — pytest DagBag tests (cycles, defaults, load errors)
- requirements.txt + README.md — pinned versions + production-readiness checklist
Airflow + dbt Pipeline Starter Kit
Pre-configured docker-compose stack, the dbt project scaffolded, all 3 production DAG skeletons, seed data with 11.7K rows across 3 entities + audit log, and the pytest DagBag tests. Boot Airflow + Postgres + dbt in under 5 minutes.
The same pipeline — but built for the production case.
Tutorials show you the happy path. Production breaks at the edges. Each row pairs the tutorial pattern with the upgrade you make when the DAG is actually on-call for revenue dashboards at 9am.
Dataset + downstream blocked-on-fail policy
dbt source freshness + late-arriving-data window + reconciliation job
Real review from senior engineers who’ve shipped this stack.
Submit your repo, get line-by-line feedback within 48 hours. The kind of review that's quietly worth thousands of dollars in time-to-DE.
4 reviews / month
Submit a repo, a PR, or a refactor proposal. Reviewer is matched to your domain — Airflow + dbt for this project. Async, comments inline, average turnaround 31 hours.
2 office hours / month
Live 30-min sessions with a senior data engineer. Walk through your DAGs, debug an idempotency edge case, mock a system-design interview. Group sessions also available.
One subscription. 15+ projects, all curriculum, code review.
PRO is built for engineers who want production-grade builds and feedback loops — not more tutorials.
Pick this if you’re shipping pipelines, not learning to.
Junior data engineers
You know SQL and Python; you want to ship the canonical Airflow + dbt pipeline so you can talk about it in interviews and on a portfolio.
Analysts moving into DE
You live in dbt + a warehouse but you've never owned the Airflow side. This bridges that gap end-to-end with real DAG patterns.
Backend devs adding DE
You can ship services; you want to see how data folks structure batch — DAGs, idempotency, quality gates, runbooks. The on-call rhythm.
Career-switchers / bootcampers
You've done a few notebooks. This is the smallest realistic production pipeline that fits in a portfolio and survives a system-design conversation.
Going deeper? Three tracks back this pipeline.
This project stitches Airflow + dbt + Python into one pipeline. These three curriculums let you go deeper on each layer — the orchestration primitives, the transformation engine, and the Python ETL patterns the DAGs are built on.
Ready to ship a real production pipeline?
Start with module 01 — free, no card. About 2-3 hours. By the end you'll have a production-shaped REST-API ingestion DAG running locally in Docker, with pagination, retry, HttpSensor, and idempotent ON CONFLICT UPSERT into staging.orders.