ai-de.net/Projects/P19 · Ecommerce Analytics Modeling Layer

FREE · full accessAnalytics trackP19

Build the
modeling layer
every dashboard sits on.

ShopCo's CEO just got three different revenue numbers from three teams. Build the dbt project that fixes it: 17 tested models in 5 layers (sources → staging → intermediate → marts → snapshots), a fact_orders + dim_customers + dim_products star schema, incremental refreshes on the high-volume tables, an SCD Type 2 customer snapshot, and a Slim CI + dbt Cloud + Slack alerting deploy pipeline.

Timeline

8-10 hours

Difficulty

Easy

Stack

dbt · Snowflake · GitHub Actions · dbt Cloud

See what you'll build

The “walk me through how you’d model the analytics layer for an ecommerce platform” question — asked in nearly every analytics-engineer take-home and on-site round.

By the end you will have shipped

A 17-model dbt project (5 staging + 2 intermediate + 5 marts + 1 snapshot + sources.yml)
fact_orders + dim_customers + dim_products star schema at one-row-per-order grain
fct_customer_cohorts + fct_product_performance + fct_customer_ltv (RFM-scored)
Incremental materialization on fact_orders + stg_web_events with state-aware refresh
SCD Type 2 customers_snapshot with dbt_valid_from / dbt_valid_to
Two GitHub Actions workflows (Slim CI on PR + production deploy on merge) wired to dbt Cloud + Slack

PREREQComfortable with SQL (joins, aggregations, basic window functions), Git, and the terminal. We recommend the dbt & Analytics Engineering path first if dbt itself is new — it covers the primitives this project composes into a real production codebase.

shopco.analytics.* · 17 models tested

Slim CI green

Raw sources

Staging

Marts · star

Prod spine

raw_customers

raw_orders

raw_order_items

raw_products

raw_web_events

5 sources · sources.yml

stg_customers

stg_orderstested + typed + flagged

stg_order_items

stg_products

stg_web_events

5 staging · views

dim_customers

fact_ordersgrain: 1 row / order

dim_products

fct_customer_cohorts

fct_product_performance

star + trend marts

customers_snapshotSCD2 · valid_from / to

fct_customer_ltv · RFM

GH Actions · Slim CI

dbt Cloud · scheduled

slack webhook · alerts

deploy spine

# Star schema · documented grain

fact_orders @ 1 row / order · tested PK

dim_customers · dim_products · denormed

unique · not_null · accepted_values · refs

→ the layer every BI tool actually wants

● Slim CI · dbt Cloud · Slack

state:modified+ --defer · only changed

dbt Cloud schedule · dev / staging / prod

on-run-end hook → Slack webhook

→ dbt docs hosted on GitHub Pages

Tested models

dbt layers

Slim CI + SCD2

Production patterns

Curriculum · 4 parts · 8-10 hours

Every part is fully unlocked on the free plan.

Free-tier project — full path from raw ShopCo seeds to a Slim-CI-gated, dbt-Cloud-deployed analytics modeling layer. No paywall, no card required.

P19 · 8-10 hours · 4 parts

Free preview FREE required

All 4 parts included — no card required. Pick up where you left off; progress is saved across browsers.

M01

✓Foundation — clean the broken raw data

dbt init + profiles.yml + sources.yml. 5 staging models (stg_customers, stg_orders, stg_order_items, stg_products, stg_web_events) with cleaning (LOWER/TRIM/INITCAP/NULLIF, CASE flags). Schema tests: unique + not_null + accepted_values + relationships across all 5 models.

~2h8 lessonsFREE PREVIEW

Start →

M02

✓Star schema — fact + dimensions at the right grain

Two ephemeral intermediate models (orders + items, orders + customers). fact_orders at one-row-per-order grain. dim_customers with lifetime metrics + segmentation. dim_products with margin and tier. fct_customer_cohorts and fct_product_performance for trend marts.

~3h9 lessonsFREE PREVIEW

Start →

M03

✓Scale + history — incremental + SCD2 + RFM LTV

Convert fact_orders to incremental with unique_key='order_id'. Append-only incremental on stg_web_events. customers_snapshot via dbt snapshot (strategy='check', check_cols). fct_customer_ltv with RFM scoring + churn-risk buckets. Snowflake cluster_by + BigQuery partition_by examples.

~3h10 lessonsFREE PREVIEW

Start →

M04

✓Production — dbt Cloud + Slim CI + Slack + docs

dbt Cloud project with dev/staging/prod environments. Two GitHub Actions workflows: Slim CI on PR (state:modified+ --defer) and production deploy on merge. Slack webhook for alerts. dbt docs hosted on GitHub Pages. Branch protection + schema isolation (DEV_username / STAGING / PROD).

~2h9 lessonsFREE PREVIEW

Start →

All 4 parts included · No card required to start

Begin part 01 →

Backed by curriculum

dbt & Analytics Engineering

7 modules·9.75 hours·dbt models·tests·Jinja·incremental·snapshots

Open curriculum→

The dbt & Analytics Engineering path covers the primitives — this is the deeper dbt project that composes them into a real production codebase on a real ecommerce dataset.

The build, in 3 phases

Three sprints. Three checkpoints. One trustworthy modeling layer.

Each phase ends with a tagged commit, a passing dbt test suite, and an artifact a senior analytics engineer would actually accept.

01~2h

Clean the broken raw data

5 staging models clean and standardize the ShopCo raw tables. sources.yml documents lineage. 4 test types green across every model.

✓stg_customers / stg_orders / stg_order_items / stg_products / stg_web_events
✓sources.yml + per-model schema tests (unique, not_null, accepted_values, relationships)
✓dbt run + dbt test passing on the dev schema

02~3h

Stand up the star schema + analytics marts

Star schema at one-row-per-order grain. Two dimensions with lifetime + tier metrics. Three fact-style marts (cohorts, product performance, LTV with RFM scoring).

✓fact_orders + dim_customers + dim_products (documented grain + tested PKs)
✓fct_customer_cohorts + fct_product_performance for trend analysis
✓fct_customer_ltv with RFM-bucketed churn risk

03~5h

Make it production-ready

Incremental on the high-volume tables. SCD2 customer snapshot. dbt Cloud + Slim CI + production deploy + Slack + GitHub Pages docs. Branch protection + schema isolation across dev/staging/prod.

✓Incremental on fact_orders + stg_web_events with state-aware refresh
✓customers_snapshot (SCD2) + Snowflake cluster_by / BigQuery partition_by examples
✓Slim CI + production deploy GH Actions + Slack webhook + dbt docs hosted

Project setup · 5 minutes

One starter kit. 17 models, 5 seeds, dbt Cloud-ready.

The starter kit ships a complete dbt project skeleton with 5 seeded ShopCo CSVs (~10K rows total), profiles + sources + a sample GitHub Actions workflow — so you can dbt seed → dbt run → dbt test on Day One without hand-typing schemas.

What lives in the repo

Everything you need to run all 4 parts on Snowflake (or any dbt-supported warehouse), plus the ShopCo seed CSVs that simulate a multi-category retailer at the row counts this project assumes.

models/ — 17 dbt models across staging / intermediate / marts (with .yml schemas)
seeds/ — 5 ShopCo CSVs (customers, orders, order_items, products, web_events) at ~10K rows total
snapshots/ — customers_snapshot SCD Type 2 spec
.github/workflows/ — Slim CI on PR + production deploy on merge (state:modified+ --defer)
dbt_project.yml — materialization defaults + path config + per-layer schemas
profiles.yml.example — Snowflake/BigQuery/Postgres connection templates

Download · Starter Kit

ShopCo Analytics Starter Kit

Pre-built dbt project with 17 models, 5 ShopCo seed CSVs, GitHub Actions Slim CI workflow, and profiles templates. dbt seed → dbt run → dbt test in under 5 minutes. Skip the boilerplate, start on Part 01.

~219 KB · 17 dbt models · 5 seed CSVs (10K rows) · GH Actions CI · PRO required

~/projects/ecommerce-metrics-layer — zsh

1. Unzip and install dbt + the Snowflake adapter

$ unzip shopco-analytics.zip

$ cd shopco-analytics

$ python3 -m venv .venv && source .venv/bin/activate

$ pip install dbt-core dbt-snowflake

2. Configure your dbt profile (Snowflake / BigQuery / Postgres)

$ cp profiles.yml.example ~/.dbt/profiles.yml

$ # edit account / user / password / warehouse / database / schema

3. Seed the ShopCo dataset (~10K rows total)

$ dbt deps

$ dbt seed

4. Run the staging layer + run all tests

$ dbt run --select staging.*

$ dbt test

5. Validate sources are fresh

$ dbt source freshness

500

customers

orders

order_items

web_events

What you'll learn

Three things you can put on your résumé.

Star schema with documented grain

fact_orders at one-row-per-order, conformed dim_customers + dim_products with tested PKs, and trend marts (cohorts, product performance) layered on top — the model every BI tool actually wants to query from.

star schemagrain decisionsconformed dimstested PKstrend marts

Incremental + SCD2 + RFM scoring

Convert fact_orders to incremental with unique_key + state-aware refresh. Snapshot dim_customers via dbt snapshot (strategy='check'). Score LTV with RFM buckets — the patterns that turn a tutorial repo into a production warehouse.

is_incremental()dbt snapshotsSCD2RFM scoringunique_key

dbt Cloud + Slim CI + production deploy

Two GitHub Actions workflows (Slim CI on PR with state:modified+ --defer; production deploy on merge). dbt Cloud schedule, Slack webhook alerts, dbt docs hosted on GitHub Pages, branch protection + dev/staging/prod schema isolation.

GitHub Actionsstate:modified+--deferdbt CloudSlack alerts

What changes vs a tutorial dbt project

The same models — but built for the real warehouse.

Tutorials run a full rebuild on every commit, against a single schema, with manual Slack pings when something breaks. Production looks different — incremental keys, deferred state, snapshot strategies, and CI that only touches what actually changed. Here’s the diff, with the dbt primitives you reach for.

Tutorial versionWhat you ship in 8-10 hrs

Refresh strategy

dbt run rebuilds every model

Customer history

Latest customer state only

PR CI

Run all tests on every PR

Schema isolation

Single dev schema for everyone

Source freshness

Hope the upstream pipeline ran

Failure routing

Watch the GH Actions tab

Production versionParts 03-04 + ops

✓

Refresh strategy

materialized='incremental' with unique_key='order_id' and is_incremental() guard

✓

Customer history

dbt snapshot with strategy='check' + check_cols emitting dbt_valid_from/dbt_valid_to

✓

PR CI

dbt run --select state:modified+ --defer --state ../ — only changed models, deferred to prod state

✓

Schema isolation

generate_schema_name macro routes to DEV_username / STAGING / PROD

✓

Source freshness

dbt source freshness with warn_after / error_after per source — fail CI if upstream is stale

✓

Failure routing

on-run-end hook + Slack webhook + dbt Cloud notification channels per environment

Who this is for

Pick this if you want the first dbt project on your résumé to be a real one.

Analytics engineers

You've used dbt at work but only on someone else's repo. This is the project that lets you defend grain decisions, materialization choices, and CI patterns from first principles.

Junior data engineers

You can write SQL but the analytics-engineer ladder is unfamiliar. This is the cleanest first dbt project — staging → marts → snapshots → Slim CI, on a domain everyone understands.

Career changers / first dbt project

You finished a SQL course and the next step is fuzzy. dbt + ecommerce + GH Actions is the portfolio piece recruiters actually open and read.

Interview prep

Take-homes increasingly ship a CSV and ask for a dbt project. After this you can produce one in an afternoon and defend the modeling decisions in the followup call.

Related curriculum

Going deeper? Two tracks back this project.

The dbt curriculum is the spine (linked above). These two let you go deeper on the layers next to it — SQL fluency this project assumes, and the advanced data-modeling theory behind the star schema + SCD2 patterns.

FAQ · advanced data modeling

Quick answers.

Is this really fully free?+

Yes. P19 is one of our free-tier projects — every part, the full repo scaffold, all 5 ShopCo seed CSVs (~10K rows total), and the certificate of completion. No card on file required. The full 8-10 hours of content is unlocked from day one.

What's the difference between a modeling layer and a semantic layer?+

The modeling layer is the dbt foundation — staged sources, fact tables at documented grain, conformed dimensions, SCD snapshots, tested marts. The semantic layer (Cube, MetricFlow, Lightdash, LookML) sits on top of it and exposes business metrics to BI tools. This project ships the modeling layer end-to-end. Semantic-layer tooling is intentionally out of scope; the clean output of this project is exactly what those tools want as their input.

What is grain and why does this project ship at one-row-per-order?+

Grain is what one row in a fact table represents. fact_orders has order-level grain — every additive measure (revenue, items_count, discount_total) sums correctly without double-counting. Going coarser (one row per customer) loses time-series granularity; going finer (one row per line-item, like fct_order_items) makes basket-level questions awkward. The right grain is the most atomic level the business actually asks questions at — and for ecommerce, that's the order.

What is RFM scoring and why is it the right LTV bucketing?+

RFM = Recency / Frequency / Monetary. Each customer is bucketed (1–5) on three axes: how recently they purchased, how often, and how much. The combined score (e.g. R5F4M5 = 'big spender, frequent, here last week') drives churn-risk segmentation and LTV cohort analysis. RFM beats raw LTV averages because it captures *behavioral* signal, not just dollar totals — a customer who spent $5K once 18 months ago looks very different from one who spends $300/quarter for 5 quarters running.

When should you go incremental vs full-refresh in dbt?+

Default to full-refresh until the table is genuinely slow on full rebuilds (think: > 30s in dev, > 5 min in prod). Switch to incremental when (a) the source is append-mostly with a reliable timestamp, (b) the table is large enough that full refreshes hurt, and (c) you can define a unique_key for de-dup on late-arriving rows. fact_orders here goes incremental in part 03 because it satisfies all three; staging models stay full-refresh because they're cheap and re-typing logic shouldn't drift between runs.

How is this different from /projects/ecommerce-data-warehouse (P03)?+

Both are free analytics-track dbt projects. P03 is the broader Kimball foundation — 22 models, 4 conformed dimensions, atomic + event facts, a Slim CI gate on Postgres. P19 is the deeper analytics-engineer specialist — fewer models (17) but more production patterns: deferred-state Slim CI, dbt Cloud schedule, RFM-scored LTV, SCD2 snapshots, dev/staging/prod schema isolation, and Slack-alerted production deploys. P03 = your first warehouse end-to-end; P19 = the layer a senior analytics engineer actually owns.

Ready to build the layer everyone queries from?

Part 01 takes about 2 hours. By the end you'll have dbt + the ShopCo seeds running, 5 staging models clean, and 4 test types green across every model.

Browse 32 projects

P19 · Ecommerce Analytics Modeling Layer · FREE · no cardStart part 01 →

Build themodeling layerevery dashboard sits on.

Every part is fully unlocked on the free plan.

dbt & Analytics Engineering

Three sprints. Three checkpoints. One trustworthy modeling layer.

One starter kit. 17 models, 5 seeds, dbt Cloud-ready.

What lives in the repo

ShopCo Analytics Starter Kit

Three things you can put on your résumé.

Star schema with documented grain

Incremental + SCD2 + RFM scoring

dbt Cloud + Slim CI + production deploy

The same models — but built for the real warehouse.

Pick this if you want the first dbt project on your résumé to be a real one.

Analytics engineers

Junior data engineers

Career changers / first dbt project

Interview prep

Going deeper? Two tracks back this project.

SQL Mastery for Data Engineers

Advanced Data Modeling & Architecture

Quick answers.

Ready to build the layer everyone queries from?

Build the
modeling layer
every dashboard sits on.