
Build a Production Data Pipeline with Airflow + dbt

You joined a growing company that pulls data from external APIs every day. The current process is manual, unreliable, and dashboards are often late or wrong. Your job: automate ingestion, orchestrate transformations, add monitoring, and ship a production-ready pipeline.

6–8 hrs · Intermediate · 4 Parts
Apache Airflow · Python · Docker · PostgreSQL · dbt · Prometheus · Grafana · REST APIs
Fig 1.1: Airflow Grid View of the Airflow + dbt pipeline: 3/3 DAGs active and all green (ingest_orders_api, customer_orders_pipeline, dbt_medallion_pipeline)

What You'll Build

3 Production DAGs

API ingestion with pagination, multi-step orchestration, and dbt transformation, built around a realistic business scenario rather than a toy demo.

Medallion Architecture

Orchestrate dbt bronze/silver/gold transformations with incremental processing and data-aware scheduling.
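
The silver and gold layers lean on dbt's incremental materialization. A sketch of the pattern (the model and column names here are illustrative, not taken from the project repo; config, is_incremental(), ref, and this are standard dbt constructs):

```sql
-- models/silver/orders.sql: illustrative dbt incremental model
{{ config(materialized='incremental', unique_key='order_id') }}

select order_id, customer_id, amount, updated_at
from {{ ref('bronze_orders') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what's already loaded
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run dbt builds the full table; on later runs it merges only new or changed rows by unique_key, which is what keeps the daily runs cheap.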

Production Monitoring

SLA checks, freshness monitoring, Slack alerting, and on_failure_callback patterns.
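
A minimal sketch of the on_failure_callback pattern: Airflow invokes the callback with the failed task's context dict. Here the actual delivery is injected via a send parameter (in a real DAG it would be something like a Slack webhook hook), so the formatting logic stays a pure, testable function; all names below are illustrative.

```python
def slack_failure_alert(context, send=print):
    """on_failure_callback: format an alert from the Airflow task context."""
    ti = context["task_instance"]
    message = (
        ":red_circle: Airflow task failed\n"
        f"DAG: {ti.dag_id} | Task: {ti.task_id}\n"
        f"Run: {context.get('run_id', 'unknown')}\n"
        f"Log: {ti.log_url}"
    )
    send(message)  # swap in a real notifier, e.g. a Slack webhook call
    return message
```

Attach it per task (on_failure_callback=slack_failure_alert) or DAG-wide through default_args so every task inherits it.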

Portfolio-Ready Runbook

Ship with an architecture diagram, DAG screenshots, system design summary, and deployment runbook.

Progressive Build Path

Ingest → Orchestrate → Transform → Ship. Each part builds on the last.

Total: ~6–8 hours across 4 parts

Spin Up the Stack

One command to launch Airflow, Postgres, Redis, and Celery workers

terminal
# Clone the Airflow + dbt pipeline project
$ git clone https://github.com/aide/airflow-dbt-pipeline.git
$ cd airflow-dbt-pipeline

# Spin up Airflow, Postgres, Redis & Celery workers
$ docker-compose -f docker-compose.prod.yml up -d

# Verify: Airflow UI at localhost:8080
$ open http://localhost:8080

Production Features

Backfill & Catchup

Master data reprocessing strategies. Handle late-arriving data, historical backfills, and idempotent re-runs.
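
Idempotency is what makes catchup and backfill safe: replaying a date must not duplicate rows. A sketch of the delete-then-insert (partition replace) pattern, with a plain dict standing in for the warehouse table and illustrative names throughout:

```python
def load_partition(warehouse, ds, rows):
    """Idempotent daily load: replace the whole partition for logical date ds.

    In SQL terms this is DELETE ... WHERE ds = %(ds)s followed by INSERT,
    so a backfill or retry of the same date converges to the same state.
    """
    warehouse[ds] = list(rows)
    return sum(len(part) for part in warehouse.values())
```

Running load_partition twice with the same ds leaves exactly one copy of that day's rows, which is why a DAG can declare catchup=True without double counting.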

Sensors & SLAs

Detect data arrival and enforce timing guarantees. File sensors, SQL sensors, and custom poke functions with SLA miss callbacks.
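
A custom poke function is just a callable that Airflow's PythonSensor re-runs every poke interval until it returns True. In this sketch, count_rows is an injected stand-in for a real query (e.g. through a Postgres hook), so the arrival check itself is a pure function; names are illustrative.

```python
def orders_arrived(ds, count_rows, min_rows=1):
    """Poke check: True once staging holds enough rows for logical date ds."""
    return count_rows(ds) >= min_rows
```

Wired up roughly as PythonSensor(task_id="wait_for_orders", python_callable=..., poke_interval=60, timeout=3600), with an sla_miss_callback on the DAG to flag runs that blow past their window.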

API Pagination & Retry

Reliably extract data from paginated REST APIs with cursor-based pagination, exponential backoff, and idempotent staging loads.
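
The extraction loop can be sketched as a generator. Here fetch_page is an injected stand-in for the real HTTP call (no live API is assumed): it returns (items, next_cursor), with next_cursor of None ending the walk, and each page is retried with exponential backoff.

```python
import time

def fetch_all(fetch_page, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Walk a cursor-paginated API, retrying each page with backoff."""
    cursor = None
    while True:
        for attempt in range(max_retries):
            try:
                items, cursor = fetch_page(cursor)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # out of retries: let Airflow's own retry take over
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        yield from items
        if cursor is None:
            return
```

Yielding rows page by page keeps memory flat, and the caller can stage them with an idempotent load so retries never duplicate data.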

Prerequisites

  • Python 3.11+ (functions, decorators, basic OOP)
  • SQL basics (SELECT, JOIN, WHERE)
  • Docker installed (8GB+ RAM)
  • Git basics (commit, push, branches)

Related Learning Path

This project pairs with the Apache Airflow skill toolkit. Complete the modules first for maximum understanding, or dive straight in if you have prior Airflow experience.

View Airflow Skill Toolkit

What is Apache Airflow?

/guide/what-is-airflow — complete reference guide

Ready to build?
