Build a Production Data Pipeline with Airflow + dbt
You joined a growing company that pulls data from external APIs every day. The current process is manual and unreliable, and dashboards are often late or wrong. Your job: automate ingestion, orchestrate transformations, add monitoring, and ship a production-ready pipeline.
Fig 1.1: Airflow + dbt Pipeline — 3 DAGs, all green
What You'll Build
3 Production DAGs
API ingestion with pagination, multi-step orchestration, and dbt transformation — mission-driven, not a demo.
Medallion Architecture
Orchestrate dbt bronze/silver/gold transformations with incremental processing and data-aware scheduling.
Production Monitoring
SLA checks, freshness monitoring, Slack alerting, and on_failure_callback patterns.
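To give a feel for the on_failure_callback pattern, here is a minimal sketch that turns a failure context into a Slack webhook message. It assumes the callback receives a context dict with `task_instance`, `ds`, and `exception` keys, as Airflow passes to task callbacks. The helper names (`build_failure_message`, `post_json`, `make_failure_callback`) and the webhook URL are hypothetical, not taken from the project repo.

```python
import json
from urllib import request


def build_failure_message(context):
    """Build a Slack webhook payload from an Airflow-style callback context."""
    ti = context["task_instance"]
    error = context.get("exception")
    text = (
        f":red_circle: Task failed: {ti.dag_id}.{ti.task_id}\n"
        f"Run date: {context.get('ds', 'unknown')}\n"
        f"Error: {error!r}"
    )
    return {"text": text}


def post_json(url, payload):
    """POST a JSON payload (Slack incoming webhooks accept {"text": ...})."""
    body = json.dumps(payload).encode("utf-8")
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=10) as resp:
        return resp.status


def make_failure_callback(webhook_url, send=post_json):
    """Return a one-argument callback; `send` is injectable for testing."""
    def callback(context):
        send(webhook_url, build_failure_message(context))
    return callback
```

In a DAG you would likely pass the result as `on_failure_callback=make_failure_callback(url)` in `default_args`, keeping the webhook URL in a connection or secret rather than in code.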
Portfolio-Ready Runbook
Ship with an architecture diagram, DAG screenshots, system design summary, and deployment runbook.
Progressive Build Path
Ingest → Orchestrate → Transform → Ship. Each part builds on the last. Complete in ~6–8 hours.
Spin Up the Stack
One command to launch Airflow, Postgres, Redis, and Celery workers
# Clone the Airflow + dbt pipeline project
$ git clone https://github.com/aide/airflow-dbt-pipeline.git
$ cd airflow-dbt-pipeline
# Spin up Airflow, Postgres, Redis & Celery workers
$ docker-compose -f docker-compose.prod.yml up -d
# Verify: Airflow UI at localhost:8080
$ open http://localhost:8080
Production Features
Backfill & Catchup
Master data reprocessing strategies. Handle late-arriving data, historical backfills, and idempotent re-runs.
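One common way to make re-runs and backfills idempotent is to replace a whole date partition inside a single transaction. The sketch below uses an in-memory SQLite database as a stand-in for the warehouse. The table name `orders_staging`, its columns, and the `load_partition` helper are assumptions for illustration only.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Replace every row for run_date so re-running a date never duplicates data."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM orders_staging WHERE run_date = ?", (run_date,)
        )
        conn.executemany(
            "INSERT INTO orders_staging (run_date, order_id, amount) "
            "VALUES (?, ?, ?)",
            [(run_date, r["order_id"], r["amount"]) for r in rows],
        )


# Stand-in for the real warehouse connection (assumption: any DB-API database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_staging (run_date TEXT, order_id INTEGER, amount REAL)"
)

rows = [{"order_id": 1, "amount": 9.5}, {"order_id": 2, "amount": 20.0}]
load_partition(conn, "2024-01-01", rows)
load_partition(conn, "2024-01-01", rows)  # re-run: same end state, no duplicates
```

Because the delete and insert share a transaction, a failed run leaves the previous partition untouched, which is what makes a historical backfill safe to retry.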
Sensors & SLAs
Detect data arrival and enforce timing guarantees. File sensors, SQL sensors, and custom poke functions with SLA miss callbacks.
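Airflow sensors work by calling a poke function repeatedly until it returns True or a timeout expires. The standalone sketch below mimics that polling loop in plain Python so you can see the control flow. It is not Airflow's implementation, and `wait_for` and its parameters are hypothetical names chosen to echo the sensor settings `poke_interval` and `timeout`.

```python
import time


def wait_for(poke, poke_interval=1.0, timeout=10.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call poke() until it returns True; raise TimeoutError if time runs out.

    Returns the number of pokes it took. `sleep` and `clock` are injectable
    so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if poke():
            return attempts
        # Give up if another full interval would pass the deadline.
        if clock() + poke_interval > deadline:
            raise TimeoutError(f"condition not met after {attempts} pokes")
        sleep(poke_interval)
```

A custom poke function is then just a zero-argument callable, for example one that checks whether today's file exists or whether a SQL count is non-zero.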
API Pagination & Retry
Reliably extract data from paginated REST APIs with cursor-based pagination, exponential backoff, and idempotent staging loads.
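Cursor-based pagination with exponential backoff can be sketched in a few lines. The example assumes each page is a dict with an `items` list and a `next_cursor` value that is empty on the last page, and that transient failures surface as `ConnectionError`. Those response field names and the helpers `call_with_backoff` and `fetch_all` are assumptions, not the API the project actually calls.

```python
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on ConnectionError with delays of 1x, 2x, 4x, ..."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the task fail
            sleep(base_delay * 2 ** attempt)


def fetch_all(fetch_page, **retry_kwargs):
    """Yield every item by following next_cursor until the API returns none.

    fetch_page(cursor) is a caller-supplied function that performs one
    request; cursor is None for the first page.
    """
    cursor = None
    while True:
        page = call_with_backoff(lambda: fetch_page(cursor), **retry_kwargs)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Keeping the HTTP call behind `fetch_page` means the retry and pagination logic can be tested with a fake function, and the same loop works for any endpoint that exposes a cursor.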
Prerequisites
- Python 3.11+ (functions, decorators, basic OOP)
- SQL basics (SELECT, JOIN, WHERE)
- Docker installed (8GB+ RAM)
- Git basics (commit, push, branches)
Related Learning Path
This project pairs with the Apache Airflow skill toolkit. Complete the modules first for maximum understanding, or dive straight in if you have prior Airflow experience.
View Airflow Skill Toolkit
What is Apache Airflow?
/guide/what-is-airflow — complete reference guide