Skip to content
ai-de.net/Projects/P19 · Ecommerce Analytics Modeling Layer
FREE · full accessAnalytics trackP19

Build the
modeling layer
every dashboard sits on.

ShopCo's CEO just got three different revenue numbers from three teams. Build the dbt project that fixes it: 17 tested models in 5 layers (sources → staging → intermediate → marts → snapshots), a fact_orders + dim_customers + dim_products star schema, incremental refreshes on the high-volume tables, an SCD Type 2 customer snapshot, and a Slim CI + dbt Cloud + Slack alerting deploy pipeline.

Timeline
8-10 hours
Difficulty
Easy
Stack
dbt · Snowflake · GitHub Actions · dbt Cloud

The “walk me through how you’d model the analytics layer for an ecommerce platform” question — asked in nearly every analytics-engineer take-home and on-site round.

By the end you will have shipped
  • A 17-model dbt project (5 staging + 2 intermediate + 5 marts + 1 snapshot + sources.yml)
  • fact_orders + dim_customers + dim_products star schema at one-row-per-order grain
  • fct_customer_cohorts + fct_product_performance + fct_customer_ltv (RFM-scored)
  • Incremental materialization on fact_orders + stg_web_events with state-aware refresh
  • SCD Type 2 customers_snapshot with dbt_valid_from / dbt_valid_to
  • Two GitHub Actions workflows (Slim CI on PR + production deploy on merge) wired to dbt Cloud + Slack
PREREQComfortable with SQL (joins, aggregations, basic window functions), Git, and the terminal. We recommend the dbt & Analytics Engineering path first if dbt itself is new — it covers the primitives this project composes into a real production codebase.
shopco.analytics.* · 17 models tested
Slim CI green
Raw sources
Staging
Marts · star
Prod spine
raw_customers
raw_orders
raw_order_items
raw_products
raw_web_events
5 sources · sources.yml
stg_customers
stg_orderstested + typed + flagged
stg_order_items
stg_products
stg_web_events
5 staging · views
dim_customers
fact_ordersgrain: 1 row / order
dim_products
fct_customer_cohorts
fct_product_performance
star + trend marts
customers_snapshotSCD2 · valid_from / to
fct_customer_ltv · RFM
GH Actions · Slim CI
dbt Cloud · scheduled
slack webhook · alerts
deploy spine
# Star schema · documented grain
fact_orders @ 1 row / order · tested PK
dim_customers · dim_products · denormed
unique · not_null · accepted_values · refs
→ the layer every BI tool actually wants
● Slim CI · dbt Cloud · Slack
state:modified+ --defer · only changed
dbt Cloud schedule · dev / staging / prod
on-run-end hook → Slack webhook
→ dbt docs hosted on GitHub Pages
17
Tested models
5
dbt layers
Slim CI + SCD2
Production patterns
Curriculum · 4 parts · 8-10 hours

Every part is fully unlocked on the free plan.

Free-tier project — full path from raw ShopCo seeds to a Slim-CI-gated, dbt-Cloud-deployed analytics modeling layer. No paywall, no card required.

P19 · 8-10 hours · 4 parts
Free preview FREE required
All 4 parts included — no card required. Pick up where you left off; progress is saved across browsers.
M01
Foundation — clean the broken raw data
dbt init + profiles.yml + sources.yml. 5 staging models (stg_customers, stg_orders, stg_order_items, stg_products, stg_web_events) with cleaning (LOWER/TRIM/INITCAP/NULLIF, CASE flags). Schema tests: unique + not_null + accepted_values + relationships across all 5 models.
~2h8 lessonsFREE PREVIEW
Start →
M02
Star schema — fact + dimensions at the right grain
Two ephemeral intermediate models (orders + items, orders + customers). fact_orders at one-row-per-order grain. dim_customers with lifetime metrics + segmentation. dim_products with margin and tier. fct_customer_cohorts and fct_product_performance for trend marts.
~3h9 lessonsFREE PREVIEW
Start →
M03
Scale + history — incremental + SCD2 + RFM LTV
Convert fact_orders to incremental with unique_key='order_id'. Append-only incremental on stg_web_events. customers_snapshot via dbt snapshot (strategy='check', check_cols). fct_customer_ltv with RFM scoring + churn-risk buckets. Snowflake cluster_by + BigQuery partition_by examples.
~3h10 lessonsFREE PREVIEW
Start →
M04
Production — dbt Cloud + Slim CI + Slack + docs
dbt Cloud project with dev/staging/prod environments. Two GitHub Actions workflows: Slim CI on PR (state:modified+ --defer) and production deploy on merge. Slack webhook for alerts. dbt docs hosted on GitHub Pages. Branch protection + schema isolation (DEV_username / STAGING / PROD).
~2h9 lessonsFREE PREVIEW
Start →
All 4 parts included · No card required to start
Begin part 01 →
Backed by curriculum

dbt & Analytics Engineering

7 modules·9.75 hours·dbt models·tests·Jinja·incremental·snapshots
Open curriculum

The dbt & Analytics Engineering path covers the primitives — this is the deeper dbt project that composes them into a real production codebase on a real ecommerce dataset.

The build, in 3 phases

Three sprints. Three checkpoints. One trustworthy modeling layer.

Each phase ends with a tagged commit, a passing dbt test suite, and an artifact a senior analytics engineer would actually accept.

01~2h
Clean the broken raw data

5 staging models clean and standardize the ShopCo raw tables. sources.yml documents lineage. 4 test types green across every model.

  • stg_customers / stg_orders / stg_order_items / stg_products / stg_web_events
  • sources.yml + per-model schema tests (unique, not_null, accepted_values, relationships)
  • dbt run + dbt test passing on the dev schema
02~3h
Stand up the star schema + analytics marts

Star schema at one-row-per-order grain. Two dimensions with lifetime + tier metrics. Three fact-style marts (cohorts, product performance, LTV with RFM scoring).

  • fact_orders + dim_customers + dim_products (documented grain + tested PKs)
  • fct_customer_cohorts + fct_product_performance for trend analysis
  • fct_customer_ltv with RFM-bucketed churn risk
03~5h
Make it production-ready

Incremental on the high-volume tables. SCD2 customer snapshot. dbt Cloud + Slim CI + production deploy + Slack + GitHub Pages docs. Branch protection + schema isolation across dev/staging/prod.

  • Incremental on fact_orders + stg_web_events with state-aware refresh
  • customers_snapshot (SCD2) + Snowflake cluster_by / BigQuery partition_by examples
  • Slim CI + production deploy GH Actions + Slack webhook + dbt docs hosted
Project setup · 5 minutes

One starter kit. 17 models, 5 seeds, dbt Cloud-ready.

The starter kit ships a complete dbt project skeleton with 5 seeded ShopCo CSVs (~10K rows total), profiles + sources + a sample GitHub Actions workflow — so you can dbt seed → dbt run → dbt test on Day One without hand-typing schemas.

What lives in the repo

Everything you need to run all 4 parts on Snowflake (or any dbt-supported warehouse), plus the ShopCo seed CSVs that simulate a multi-category retailer at the row counts this project assumes.

  • models/ — 17 dbt models across staging / intermediate / marts (with .yml schemas)
  • seeds/ — 5 ShopCo CSVs (customers, orders, order_items, products, web_events) at ~10K rows total
  • snapshots/ — customers_snapshot SCD Type 2 spec
  • .github/workflows/ — Slim CI on PR + production deploy on merge (state:modified+ --defer)
  • dbt_project.yml — materialization defaults + path config + per-layer schemas
  • profiles.yml.example — Snowflake/BigQuery/Postgres connection templates
Download · Starter Kit

ShopCo Analytics Starter Kit

Pre-built dbt project with 17 models, 5 ShopCo seed CSVs, GitHub Actions Slim CI workflow, and profiles templates. dbt seed → dbt run → dbt test in under 5 minutes. Skip the boilerplate, start on Part 01.

~219 KB · 17 dbt models · 5 seed CSVs (10K rows) · GH Actions CI · PRO required
~/projects/ecommerce-metrics-layer — zsh
1. Unzip and install dbt + the Snowflake adapter
$ unzip shopco-analytics.zip
$ cd shopco-analytics
$ python3 -m venv .venv && source .venv/bin/activate
$ pip install dbt-core dbt-snowflake
2. Configure your dbt profile (Snowflake / BigQuery / Postgres)
$ cp profiles.yml.example ~/.dbt/profiles.yml
$ # edit account / user / password / warehouse / database / schema
3. Seed the ShopCo dataset (~10K rows total)
$ dbt deps
$ dbt seed
4. Run the staging layer + run all tests
$ dbt run --select staging.*
$ dbt test
5. Validate sources are fresh
$ dbt source freshness
500
customers
2K
orders
5K
order_items
3K
web_events
What you'll learn

Three things you can put on your résumé.

01

Star schema with documented grain

fact_orders at one-row-per-order, conformed dim_customers + dim_products with tested PKs, and trend marts (cohorts, product performance) layered on top — the model every BI tool actually wants to query from.

star schemagrain decisionsconformed dimstested PKstrend marts
02

Incremental + SCD2 + RFM scoring

Convert fact_orders to incremental with unique_key + state-aware refresh. Snapshot dim_customers via dbt snapshot (strategy='check'). Score LTV with RFM buckets — the patterns that turn a tutorial repo into a production warehouse.

is_incremental()dbt snapshotsSCD2RFM scoringunique_key
03

dbt Cloud + Slim CI + production deploy

Two GitHub Actions workflows (Slim CI on PR with state:modified+ --defer; production deploy on merge). dbt Cloud schedule, Slack webhook alerts, dbt docs hosted on GitHub Pages, branch protection + dev/staging/prod schema isolation.

GitHub Actionsstate:modified+--deferdbt CloudSlack alerts
What changes vs a tutorial dbt project

The same models — but built for the real warehouse.

Tutorials run a full rebuild on every commit, against a single schema, with manual Slack pings when something breaks. Production looks different — incremental keys, deferred state, snapshot strategies, and CI that only touches what actually changed. Here’s the diff, with the dbt primitives you reach for.

Tutorial versionWhat you ship in 8-10 hrs
×
Refresh strategy
dbt run rebuilds every model
×
Customer history
Latest customer state only
×
PR CI
Run all tests on every PR
×
Schema isolation
Single dev schema for everyone
×
Source freshness
Hope the upstream pipeline ran
×
Failure routing
Watch the GH Actions tab
Production versionParts 03-04 + ops
Refresh strategy
materialized='incremental' with unique_key='order_id' and is_incremental() guard
Customer history
dbt snapshot with strategy='check' + check_cols emitting dbt_valid_from/dbt_valid_to
PR CI
dbt run --select state:modified+ --defer --state ../ — only changed models, deferred to prod state
Schema isolation
generate_schema_name macro routes to DEV_username / STAGING / PROD
Source freshness
dbt source freshness with warn_after / error_after per source — fail CI if upstream is stale
Failure routing
on-run-end hook + Slack webhook + dbt Cloud notification channels per environment
Who this is for

Pick this if you want the first dbt project on your résumé to be a real one.

AE

Analytics engineers

You've used dbt at work but only on someone else's repo. This is the project that lets you defend grain decisions, materialization choices, and CI patterns from first principles.

DE

Junior data engineers

You can write SQL but the analytics-engineer ladder is unfamiliar. This is the cleanest first dbt project — staging → marts → snapshots → Slim CI, on a domain everyone understands.

FT

Career changers / first dbt project

You finished a SQL course and the next step is fuzzy. dbt + ecommerce + GH Actions is the portfolio piece recruiters actually open and read.

IV

Interview prep

Take-homes increasingly ship a CSV and ask for a dbt project. After this you can produce one in an afternoon and defend the modeling decisions in the followup call.

FAQ · advanced data modeling

Quick answers.

Yes. P19 is one of our free-tier projects — every part, the full repo scaffold, all 5 ShopCo seed CSVs (~10K rows total), and the certificate of completion. No card on file required. The full 8-10 hours of content is unlocked from day one.
The modeling layer is the dbt foundation — staged sources, fact tables at documented grain, conformed dimensions, SCD snapshots, tested marts. The semantic layer (Cube, MetricFlow, Lightdash, LookML) sits on top of it and exposes business metrics to BI tools. This project ships the modeling layer end-to-end. Semantic-layer tooling is intentionally out of scope; the clean output of this project is exactly what those tools want as their input.
Grain is what one row in a fact table represents. fact_orders has order-level grain — every additive measure (revenue, items_count, discount_total) sums correctly without double-counting. Going coarser (one row per customer) loses time-series granularity; going finer (one row per line-item, like fct_order_items) makes basket-level questions awkward. The right grain is the most atomic level the business actually asks questions at — and for ecommerce, that's the order.
RFM = Recency / Frequency / Monetary. Each customer is bucketed (1–5) on three axes: how recently they purchased, how often, and how much. The combined score (e.g. R5F4M5 = 'big spender, frequent, here last week') drives churn-risk segmentation and LTV cohort analysis. RFM beats raw LTV averages because it captures *behavioral* signal, not just dollar totals — a customer who spent $5K once 18 months ago looks very different from one who spends $300/quarter for 5 quarters running.
Default to full-refresh until the table is genuinely slow on full rebuilds (think: > 30s in dev, > 5 min in prod). Switch to incremental when (a) the source is append-mostly with a reliable timestamp, (b) the table is large enough that full refreshes hurt, and (c) you can define a unique_key for de-dup on late-arriving rows. fact_orders here goes incremental in part 03 because it satisfies all three; staging models stay full-refresh because they're cheap and re-typing logic shouldn't drift between runs.
Both are free analytics-track dbt projects. P03 is the broader Kimball foundation — 22 models, 4 conformed dimensions, atomic + event facts, a Slim CI gate on Postgres. P19 is the deeper analytics-engineer specialist — fewer models (17) but more production patterns: deferred-state Slim CI, dbt Cloud schedule, RFM-scored LTV, SCD2 snapshots, dev/staging/prod schema isolation, and Slack-alerted production deploys. P03 = your first warehouse end-to-end; P19 = the layer a senior analytics engineer actually owns.

Ready to build the layer everyone queries from?

Part 01 takes about 2 hours. By the end you'll have dbt + the ShopCo seeds running, 5 staging models clean, and 4 test types green across every model.

P19 · Ecommerce Analytics Modeling Layer · FREE · no cardStart part 01 →
Press Cmd+K to open