ai-de.net/Projects/P03 · Commerce Data Warehouse Rebuild

Last updated 2026-05-22By AI-DE Engineering Team

FREE · full accessAnalytics trackP03

Build a
Kimball
data warehouse with dbt + Postgres

FreshCart's data is normalized for transactions, not analysis. Architect 22 dbt models — 4 conformed dimensions, atomic + event facts, SCD Type 2 history, incremental processing, and a Slim CI gate — against a 600K-order Postgres source.

Timeline

16 hours

Difficulty

Easy

Stack

dbt · Postgres · GitHub Actions

See what you'll build

Dimensional modeling is still the most-asked analytics interview topic — this gives you a complete warehouse you can walk through, end-to-end.

By the end you will have

A 22-model dbt project running locally against Postgres
Four conformed dimensions plus SCD Type 2 history for customers
Atomic-grain order facts, two event facts, and a bridge table
Incremental events — measured 4 min → 30 sec speedup
A GitHub Actions Slim CI workflow gating every PR
dbt docs lineage and exposures.yml for downstream BI

PREREQSQL fundamentals (SELECT, JOIN, window functions) and basic command-line. dbt familiarity helps but isn’t required — the project teaches it as you build.

freshcart_warehouse · 22 models · scd2 armed

CI green

Postgres source

Staging

Marts (Kimball)

CI / serving

raw_customers

raw_orders

raw_order_items

raw_products

raw_stores

raw_web_events

6 sources · sources.yml

stg_orderstested · typed

stg_order_items

stg_customers

stg_products

stg_stores

stg_web_events

6 staging · views

dim_customersSCD2 ready

dim_products

dim_stores

dim_datesgenerated

fct_ordersatomic · 1 row / order

fct_order_itemsatomic · 1 row / line

fct_page_viewsincremental

fct_cart_eventsincremental

bridge_customer_products

4 dims · 4 facts · 1 bridge

customers_snapshotSCD2 · valid_from / to

GH Actions · Slim CI

200+ testsseverity: error

dbt docs · exposures

gated · documented

# SCD Type 2 · dbt snapshots

{% snapshot customers_snapshot %}

strategy='check', check_cols='all'

valid_from / valid_to · is_current

→ yesterday's customer state never lost

● Slim CI gate

dbt build

--select state:modified+ --defer

200+ assertions · severity: error

→ every PR gated, no surprises in main

dbt models

conformed dims

200+

test assertions

Curriculum · 4 modules · 16 hours

Every module is fully unlocked on the free plan.

Free-tier project — full path from a 600K-order Postgres source to a Slim-CI-gated Kimball warehouse with 22 models. No paywall, no card required.

P03 · 16 hours · 4 modules

Free preview FREE required

All 4 modules included — no card required. Pick up where you left off; progress is saved across browsers.

M01

✓Star Schema Design — the Kimball blueprint

Apply Kimball's 4-step methodology to define grain, then build 4 conformed dimensions: dim_customers, dim_products, dim_stores, and a generated dim_dates with fiscal logic. Wire sources.yml against the FreshCart Postgres source.

4h9 lessonsFREE PREVIEW

Start →

M02

✓Fact tables & SCD — capture the history

Build fct_orders and fct_order_items at atomic grain. Implement SCD Type 1 (dim_products overwrite) and SCD Type 2 (dim_customers via dbt snapshots) so yesterday's customer state is never lost.

4h11 lessonsFREE PREVIEW

Start →

M03

✓Events, incremental & metrics layer

Add fct_page_views and fct_cart_events. Convert full-refresh to incremental with is_incremental() — 4 min → 30 sec on the events fact. Add a bridge table and a dbt-native metrics layer (revenue / customer / derived).

4h13 lessonsFREE PREVIEW

Start →

M04

✓Testing, quality & production

Schema tests + 3 custom generic test macros + 3 singular tests + dbt-docs lineage with exposures.yml. Ship a GitHub Actions Slim CI workflow (dbt build --select state:modified+ --defer) that gates every PR.

4h10 lessonsFREE PREVIEW

Start →

All 4 modules included · No card required to start

Begin module 01 →

Backed by curriculum

Advanced Data Modeling & Architecture

9 modules·12 hours·Kimball method·conformed dims·SCD type 2·grain decisions·surrogate keys

Open curriculum→

This curriculum walks the same Kimball + advanced-modeling ground at a deeper level — recommended companion reading while you build.

The build, in 3 phases

Three sprints. Three checkpoints. One shippable warehouse.

Each phase ends with a green dbt build and a tagged commit you can point to. No ambiguity about where you are.

01~8h

Foundation

Schema designed, dimensions built, atomic facts shipped. SCD Type 2 history captured for customers via dbt snapshots.

✓4 conformed dims (customers, products, stores, dates)
✓fct_orders + fct_order_items at atomic grain
✓customers_snapshot.sql + dim_customers_scd2

02~4h

Modeling

Event facts, incremental processing, bridge table for many-to-many, and a dbt-native metrics layer.

✓fct_page_views + fct_cart_events (incremental)
✓bridge_customer_products
✓3 metrics YAML files (revenue / customer / derived)

03~4h

Production

Test pyramid green. Slim CI gating every PR. dbt docs + exposures auto-publish.

✓200+ test assertions across 22 models
✓.github/workflows/dbt_ci.yml (Slim CI)
✓dbt docs lineage + exposures.yml

Project setup · 5 minutes

One command. Local Postgres, dbt, and 6 seed CSVs.

Download the starter kit — a 31 KB scaffold with 6 seed CSVs (~1.1k rows), all model stubs, snapshot config, and a profiles template. Ready to run in under 5 minutes.

What lives in the starter kit

Everything you need to start writing real Kimball models against real-shaped data on day one — no warehouse signup, no cloud credentials.

seeds/ — customers, products, stores, orders, order_items, web_events (6 CSVs)
models/ — staging, marts, snapshots scaffolded
snapshots/ — customers_snapshot.sql ready for SCD2
macros/ + packages.yml — dbt_utils + dbt-expectations + custom test scaffolds
profiles.yml.example — Postgres-first profile, Snowflake notes inline

Download · Free starter kit

FreshCart Warehouse Starter

Pre-configured scaffolds, sample CSVs, and dbt project files. Skip the boilerplate, start on module 01.

31 KB · 16 model stubs · 6 seed CSVs · No card required

~/projects/freshcart-warehouse — zsh

1. Unzip and enter the project

$ unzip freshcart-warehouse-starter.zip

$ cd freshcart-starter/freshcart_warehouse

2. Install dbt-postgres

$ pip install dbt-postgres

3. Configure profiles + database

$ mkdir -p ~/.dbt && cp profiles.yml.example ~/.dbt/profiles.yml

$ createdb freshcart_test

4. Seed, run, test

$ dbt deps && dbt seed && dbt run && dbt test

customers

200

orders

506

order_items

products

300

web_events

What you'll learn

Three things you can put on your résumé.

Kimball dimensional modeling

4-step methodology, conformed dimensions, atomic vs aggregate grain, surrogate keys, and SCD Type 1 vs Type 2 — the same patterns analytics teams ship in production.

star schemagrain designSCD2surrogate keysconformed dims

Production dbt patterns

dbt snapshots for change data capture, incremental models with is_incremental(), schema tests, custom generic test macros, singular tests, and source freshness checks.

dbt snapshotsis_incremental()schema testssingular testsfreshness

Analytics engineering CI

GitHub Actions Slim CI with state:modified+ + --defer, dbt docs auto-generated lineage, and exposures.yml linking models to downstream BI consumers.

GitHub Actionsdbt build --deferdbt docsexposuresSlim CI

Production hardening

The same warehouse — but built like someone depends on it.

Most dbt tutorials stop at dbt run green on your laptop. This one ships you the patterns that change when finance opens the dashboard at 9am.

Tutorial versionModule 01–02

Idempotency

Re-runs rebuild from scratch every time

History

Today's customer state overwrites yesterday's

Test failures

Print to console, run keeps going

Data freshness

Run when you remember to

Deploys

Push to main, hope the build is green

Lineage

Stale README diagram, hand-maintained

Your production versionModule 03–04

✓

Idempotency

Incremental + is_incremental() with state:modified+ --defer

✓

History

dbt snapshot with valid_from / valid_to + SCD2 join macro

✓

Test failures

severity: error on FK + uniqueness, blocks downstream

✓

Data freshness

sources.yml freshness: SLA + warn/error thresholds

✓

Deploys

GitHub Actions Slim CI: dbt build --select state:modified+

✓

Lineage

dbt docs generate + exposures.yml auto-publish

Who this is for

Pick this if you’re shipping analytics, not learning to.

Analytics engineers

You write SQL all day but want the production patterns: SCD2, incremental, contracts, Slim CI, and exposures.

Data analysts leveling up

You're tired of one-off queries. You want to own the model behind the dashboard, not just the chart on top of it.

Backend engineers crossing over

You know systems but the warehouse is opaque. This makes dbt feel like the codebase you already know — refs, tests, CI gates.

Finance / ops crossing into data

You live in spreadsheets. You want one source of truth that doesn't break when somebody renames a column at 5pm.

Related curriculum

Going deeper? Three tracks back this project.

Advanced data modeling is the spine — the two paths next to it let you sharpen the SQL underneath the schema and the observability layer that protects it.

FAQ · advanced data modeling

Quick answers.

What is a star schema and why does this project use one?+

A star schema organizes data into one fact table at the center (atomic events like orders) surrounded by dimension tables (descriptive context like customers, products, dates). It's the canonical analytical model for a reason: it's denormalized enough to make BI queries fast, but conformed enough that the same dim_customers row joins to every fact in the warehouse. This project ships fct_orders + fct_order_items at the center with dim_customers, dim_products, dim_stores, and dim_dates around them.

What is grain and why is it the first decision in dimensional modeling?+

Grain is what one row in a fact table represents. fct_orders has order-level grain (one row per order); fct_order_items has line-item grain (one row per line). You declare grain before you write a single SELECT — get it wrong and every aggregation downstream is wrong (double-counting revenue, miscounting customers). Module 01 walks the 4-step Kimball method that puts grain first.

What's the difference between SCD Type 1 and SCD Type 2?+

SCD Type 1 overwrites the row when an attribute changes — yesterday's value is gone (used here for dim_products, where you only care about the current price). SCD Type 2 adds a new row on change with valid_from / valid_to / is_current flags, preserving full history (used here for dim_customers via dbt snapshots, so you can answer 'what segment was this customer in when they placed this order?'). Type 2 is mandatory whenever a dimension drives revenue calculations or compliance reports.

Why surrogate keys instead of natural keys?+

A surrogate key is a hash or sequence the warehouse owns; a natural key is the source-system id (customer_id from Postgres). Two reasons to use surrogates everywhere: (1) SCD Type 2 needs them — when dim_customers gets a new row for the same customer, the natural key collides but surrogates stay unique; (2) source-system migrations don't break the warehouse — when the OLTP team renumbers customer_id, your fact joins survive. This project uses dbt_utils.generate_surrogate_key() throughout.

When do you need a bridge table vs flattening many-to-many?+

Use a bridge whenever a single fact row maps to multiple dim rows (a customer browsing many products in one session, an order with multiple coupons applied). Flattening into a comma-separated column or duplicating fact rows breaks aggregation: you'll either lose the relationship or double-count revenue. This project ships bridge_customer_products to model the many-to-many between sessions and products viewed.

Is dimensional modeling still relevant in the lakehouse era?+

Yes — the underlying compute changed (Iceberg, Delta, Snowflake) but the model didn't. BI tools, semantic layers, and AI agents all want star-schema-shaped data on top of whatever lives below. The patterns you learn here (grain, conformed dims, SCD2, surrogate keys, bridges) are exactly what shows up on top of Iceberg in P04 or Cube in P19. Master the model once; it ports across every warehouse you'll ever touch.

Related projects

Paired with this project

P19·FREE·foundations

Ecommerce analytics modeling layer

Free 17-model dbt analytics modeling layer for ShopCo — star schema with documented grain, incremental + SCD2, RFM-scored LTV, and a dbt Cloud + Slim CI production spine.

Explore project →

Ready to ship a real dbt warehouse?

Module 01 takes about 4 hours. By the end you'll have Postgres + dbt running locally with your first 4 dimensions modeled and tested.

Browse 32 projects

P03 · Commerce Data Warehouse · FREE · no cardStart module 01 →

Build aKimballdata warehouse with dbt + Postgres

Every module is fully unlocked on the free plan.

Advanced Data Modeling & Architecture

Three sprints. Three checkpoints. One shippable warehouse.

One command. Local Postgres, dbt, and 6 seed CSVs.

What lives in the starter kit

FreshCart Warehouse Starter

Three things you can put on your résumé.

Kimball dimensional modeling

Production dbt patterns

Analytics engineering CI

The same warehouse — but built like someone depends on it.

Pick this if you’re shipping analytics, not learning to.

Analytics engineers

Data analysts leveling up

Backend engineers crossing over

Finance / ops crossing into data

Going deeper? Three tracks back this project.

dbt & Analytics Engineering

SQL Mastery for Data Engineers

Data Observability & Quality

Quick answers.

Paired with this project

Ready to ship a real dbt warehouse?

Build a
Kimball
data warehouse with dbt + Postgres