Skip to content
ai-de.net/Projects/P03 · Commerce Data Warehouse Rebuild
FREE · full accessAnalytics trackP03

Build a
Kimball
data warehouse with dbt + Postgres

FreshCart's data is normalized for transactions, not analysis. Architect 22 dbt models — 4 conformed dimensions, atomic + event facts, SCD Type 2 history, incremental processing, and a Slim CI gate — against a 600K-order Postgres source.

Timeline
16 hours
Difficulty
Easy
Stack
dbt · Postgres · GitHub Actions

Dimensional modeling is still the most-asked analytics interview topic — this gives you a complete warehouse you can walk through, end-to-end.

By the end you will have
  • A 22-model dbt project running locally against Postgres
  • Four conformed dimensions plus SCD Type 2 history for customers
  • Atomic-grain order facts, two event facts, and a bridge table
  • Incremental events — measured 4 min → 30 sec speedup
  • A GitHub Actions Slim CI workflow gating every PR
  • dbt docs lineage and exposures.yml for downstream BI
PREREQSQL fundamentals (SELECT, JOIN, window functions) and basic command-line. dbt familiarity helps but isn’t required — the project teaches it as you build.
freshcart_warehouse · 22 models · scd2 armed
CI green
Postgres source
Staging
Marts (Kimball)
CI / serving
raw_customers
raw_orders
raw_order_items
raw_products
raw_stores
raw_web_events
6 sources · sources.yml
stg_orderstested · typed
stg_order_items
stg_customers
stg_products
stg_stores
stg_web_events
6 staging · views
dim_customersSCD2 ready
dim_products
dim_stores
dim_datesgenerated
fct_ordersatomic · 1 row / order
fct_order_itemsatomic · 1 row / line
fct_page_viewsincremental
fct_cart_eventsincremental
bridge_customer_products
4 dims · 4 facts · 1 bridge
customers_snapshotSCD2 · valid_from / to
GH Actions · Slim CI
200+ testsseverity: error
dbt docs · exposures
gated · documented
# SCD Type 2 · dbt snapshots
{% snapshot customers_snapshot %}
strategy='check', check_cols='all'
valid_from / valid_to · is_current
→ yesterday's customer state never lost
● Slim CI gate
dbt build
--select state:modified+ --defer
200+ assertions · severity: error
→ every PR gated, no surprises in main
22
dbt models
4
conformed dims
200+
test assertions
Curriculum · 4 modules · 16 hours

Every module is fully unlocked on the free plan.

Free-tier project — full path from a 600K-order Postgres source to a Slim-CI-gated Kimball warehouse with 22 models. No paywall, no card required.

P03 · 16 hours · 4 modules
Free preview FREE required
All 4 modules included — no card required. Pick up where you left off; progress is saved across browsers.
M01
Star Schema Design — the Kimball blueprint
Apply Kimball's 4-step methodology to define grain, then build 4 conformed dimensions: dim_customers, dim_products, dim_stores, and a generated dim_dates with fiscal logic. Wire sources.yml against the FreshCart Postgres source.
4h9 lessonsFREE PREVIEW
Start →
M02
Fact tables & SCD — capture the history
Build fct_orders and fct_order_items at atomic grain. Implement SCD Type 1 (dim_products overwrite) and SCD Type 2 (dim_customers via dbt snapshots) so yesterday's customer state is never lost.
4h11 lessonsFREE PREVIEW
Start →
M03
Events, incremental & metrics layer
Add fct_page_views and fct_cart_events. Convert full-refresh to incremental with is_incremental() — 4 min → 30 sec on the events fact. Add a bridge table and a dbt-native metrics layer (revenue / customer / derived).
4h13 lessonsFREE PREVIEW
Start →
M04
Testing, quality & production
Schema tests + 3 custom generic test macros + 3 singular tests + dbt-docs lineage with exposures.yml. Ship a GitHub Actions Slim CI workflow (dbt build --select state:modified+ --defer) that gates every PR.
4h10 lessonsFREE PREVIEW
Start →
All 4 modules included · No card required to start
Begin module 01 →
Backed by curriculum

Advanced Data Modeling & Architecture

9 modules·12 hours·Kimball method·conformed dims·SCD type 2·grain decisions·surrogate keys
Open curriculum

This curriculum walks the same Kimball + advanced-modeling ground at a deeper level — recommended companion reading while you build.

The build, in 3 phases

Three sprints. Three checkpoints. One shippable warehouse.

Each phase ends with a green dbt build and a tagged commit you can point to. No ambiguity about where you are.

01~8h
Foundation

Schema designed, dimensions built, atomic facts shipped. SCD Type 2 history captured for customers via dbt snapshots.

  • 4 conformed dims (customers, products, stores, dates)
  • fct_orders + fct_order_items at atomic grain
  • customers_snapshot.sql + dim_customers_scd2
02~4h
Modeling

Event facts, incremental processing, bridge table for many-to-many, and a dbt-native metrics layer.

  • fct_page_views + fct_cart_events (incremental)
  • bridge_customer_products
  • 3 metrics YAML files (revenue / customer / derived)
03~4h
Production

Test pyramid green. Slim CI gating every PR. dbt docs + exposures auto-publish.

  • 200+ test assertions across 22 models
  • .github/workflows/dbt_ci.yml (Slim CI)
  • dbt docs lineage + exposures.yml
Project setup · 5 minutes

One command. Local Postgres, dbt, and 6 seed CSVs.

Download the starter kit — a 31 KB scaffold with 6 seed CSVs (~1.1k rows), all model stubs, snapshot config, and a profiles template. Ready to run in under 5 minutes.

What lives in the starter kit

Everything you need to start writing real Kimball models against real-shaped data on day one — no warehouse signup, no cloud credentials.

  • seeds/ — customers, products, stores, orders, order_items, web_events (6 CSVs)
  • models/ — staging, marts, snapshots scaffolded
  • snapshots/ — customers_snapshot.sql ready for SCD2
  • macros/ + packages.yml — dbt_utils + dbt-expectations + custom test scaffolds
  • profiles.yml.example — Postgres-first profile, Snowflake notes inline
Download · Free starter kit

FreshCart Warehouse Starter

Pre-configured scaffolds, sample CSVs, and dbt project files. Skip the boilerplate, start on module 01.

31 KB · 16 model stubs · 6 seed CSVs · No card required
~/projects/freshcart-warehouse — zsh
1. Unzip and enter the project
$ unzip freshcart-warehouse-starter.zip
$ cd freshcart-starter/freshcart_warehouse
2. Install dbt-postgres
$ pip install dbt-postgres
3. Configure profiles + database
$ mkdir -p ~/.dbt && cp profiles.yml.example ~/.dbt/profiles.yml
$ createdb freshcart_test
4. Seed, run, test
$ dbt deps && dbt seed && dbt run && dbt test
50
customers
200
orders
506
order_items
30
products
300
web_events
What you'll learn

Three things you can put on your résumé.

01

Kimball dimensional modeling

4-step methodology, conformed dimensions, atomic vs aggregate grain, surrogate keys, and SCD Type 1 vs Type 2 — the same patterns analytics teams ship in production.

star schemagrain designSCD2surrogate keysconformed dims
02

Production dbt patterns

dbt snapshots for change data capture, incremental models with is_incremental(), schema tests, custom generic test macros, singular tests, and source freshness checks.

dbt snapshotsis_incremental()schema testssingular testsfreshness
03

Analytics engineering CI

GitHub Actions Slim CI with state:modified+ + --defer, dbt docs auto-generated lineage, and exposures.yml linking models to downstream BI consumers.

GitHub Actionsdbt build --deferdbt docsexposuresSlim CI
Production hardening

The same warehouse — but built like someone depends on it.

Most dbt tutorials stop at dbt run green on your laptop. This one ships you the patterns that change when finance opens the dashboard at 9am.

Tutorial versionModule 01–02
×
Idempotency
Re-runs rebuild from scratch every time
×
History
Today's customer state overwrites yesterday's
×
Test failures
Print to console, run keeps going
×
Data freshness
Run when you remember to
×
Deploys
Push to main, hope the build is green
×
Lineage
Stale README diagram, hand-maintained
Your production versionModule 03–04
Idempotency
Incremental + is_incremental() with state:modified+ --defer
History
dbt snapshot with valid_from / valid_to + SCD2 join macro
Test failures
severity: error on FK + uniqueness, blocks downstream
Data freshness
sources.yml freshness: SLA + warn/error thresholds
Deploys
GitHub Actions Slim CI: dbt build --select state:modified+
Lineage
dbt docs generate + exposures.yml auto-publish
Who this is for

Pick this if you’re shipping analytics, not learning to.

AE

Analytics engineers

You write SQL all day but want the production patterns: SCD2, incremental, contracts, Slim CI, and exposures.

DA

Data analysts leveling up

You're tired of one-off queries. You want to own the model behind the dashboard, not just the chart on top of it.

BE

Backend engineers crossing over

You know systems but the warehouse is opaque. This makes dbt feel like the codebase you already know — refs, tests, CI gates.

FN

Finance / ops crossing into data

You live in spreadsheets. You want one source of truth that doesn't break when somebody renames a column at 5pm.

FAQ · advanced data modeling

Quick answers.

A star schema organizes data into one fact table at the center (atomic events like orders) surrounded by dimension tables (descriptive context like customers, products, dates). It's the canonical analytical model for a reason: it's denormalized enough to make BI queries fast, but conformed enough that the same dim_customers row joins to every fact in the warehouse. This project ships fct_orders + fct_order_items at the center with dim_customers, dim_products, dim_stores, and dim_dates around them.
Grain is what one row in a fact table represents. fct_orders has order-level grain (one row per order); fct_order_items has line-item grain (one row per line). You declare grain before you write a single SELECT — get it wrong and every aggregation downstream is wrong (double-counting revenue, miscounting customers). Module 01 walks the 4-step Kimball method that puts grain first.
SCD Type 1 overwrites the row when an attribute changes — yesterday's value is gone (used here for dim_products, where you only care about the current price). SCD Type 2 adds a new row on change with valid_from / valid_to / is_current flags, preserving full history (used here for dim_customers via dbt snapshots, so you can answer 'what segment was this customer in when they placed this order?'). Type 2 is mandatory whenever a dimension drives revenue calculations or compliance reports.
A surrogate key is a hash or sequence the warehouse owns; a natural key is the source-system id (customer_id from Postgres). Two reasons to use surrogates everywhere: (1) SCD Type 2 needs them — when dim_customers gets a new row for the same customer, the natural key collides but surrogates stay unique; (2) source-system migrations don't break the warehouse — when the OLTP team renumbers customer_id, your fact joins survive. This project uses dbt_utils.generate_surrogate_key() throughout.
Use a bridge whenever a single fact row maps to multiple dim rows (a customer browsing many products in one session, an order with multiple coupons applied). Flattening into a comma-separated column or duplicating fact rows breaks aggregation: you'll either lose the relationship or double-count revenue. This project ships bridge_customer_products to model the many-to-many between sessions and products viewed.
Yes — the underlying compute changed (Iceberg, Delta, Snowflake) but the model didn't. BI tools, semantic layers, and AI agents all want star-schema-shaped data on top of whatever lives below. The patterns you learn here (grain, conformed dims, SCD2, surrogate keys, bridges) are exactly what shows up on top of Iceberg in P04 or Cube in P19. Master the model once; it ports across every warehouse you'll ever touch.

Ready to ship a real dbt warehouse?

Module 01 takes about 4 hours. By the end you'll have Postgres + dbt running locally with your first 4 dimensions modeled and tested.

P03 · Commerce Data Warehouse · FREE · no cardStart module 01 →
Press Cmd+K to open