Dimensional Modeling Explained
Dimensional modeling organizes a data warehouse into fact tables (what happened: revenue, quantity, clicks) and dimension tables (context: who, what, when, where). Developed by Ralph Kimball, it produces a star schema optimized for analytical queries — fast aggregations, intuitive joins, and business-friendly column names.
The Core Pattern
-- Every fact table query follows this pattern:
SELECT
-- slice by dimensions
c.country,
c.customer_segment,
d.month_name,
p.product_category,
-- aggregate measures from fact
SUM(f.net_revenue) AS total_revenue,
COUNT(f.order_id) AS order_count,
AVG(f.unit_price) AS avg_price
FROM fct_orders f -- FACT: measurements
JOIN dim_customers c USING (customer_key) -- DIM: who
JOIN dim_date d USING (date_key) -- DIM: when
JOIN dim_products p USING (product_key) -- DIM: what
GROUP BY 1, 2, 3, 4
Core Concepts
Fact Tables
Store measurements — the numeric values you aggregate. Characteristics: narrow (foreign keys + measures), tall (one row per business event), append-mostly, grain declared explicitly.
- • Additive facts — can SUM across all dimensions (revenue, quantity)
- • Semi-additive facts — sum across some dims only (account balance: sum by account, not by date)
- • Non-additive facts — can never SUM (ratios, percentages — store numerator/denominator instead)
Dimension Tables
Store context — descriptive attributes you filter and group by. Characteristics: wide (many columns), short (one row per entity), updated via SCD patterns.
- • dim_customers — name, email, country, segment, tier
- • dim_products — name, category, brand, cost, SKU
- • dim_date — date, day_of_week, month, quarter, is_holiday
- • dim_channels — source, medium, campaign, paid/organic
Grain
The grain defines what one row in the fact table represents. It must be declared before building any table — every design decision follows from it. Mixing grains in a single fact table silently produces incorrect aggregations.
Slowly Changing Dimension (SCD) Types
When a dimension attribute changes over time (customer moves country, changes plan), you need to decide how to handle historical accuracy.
| Type | Behavior | History | Use case |
|---|---|---|---|
| Type 0 | Never update | None | Static lookups (country codes) |
| Type 1 | Overwrite old value | Lost | Correct typos, non-historical attrs |
| Type 2 | New row + valid_from/to | Full | Plan changes, address history |
| Type 3 | Add "previous_value" col | One prior | Simple before/after comparison |
SCD Type 2 is implemented in dbt using snapshots — run dbt snapshot on a schedule to capture new rows when source data changes.
Common Mistakes
Confusing additive and non-additive facts
Never SUM a ratio or percentage — store numerator and denominator as separate additive columns and compute the ratio at query time. Storing avg_order_value as a fact and then averaging it across periods gives mathematically wrong results.
Putting business logic in dimension tables
Dimension tables should store attributes, not derived metrics. customer_lifetime_value belongs in a mart view or metrics layer, not as a column in dim_customers that silently goes stale.
No surrogate keys — using natural keys as primary keys
Natural business keys can change (customer emails, product codes). Use surrogate keys (hashed or sequential) as dimension primary keys so that SCD Type 2 rows are joinable without duplicating the natural key logic everywhere.
FAQ
- What is dimensional modeling?
- A warehouse design technique by Ralph Kimball. Data is organized into fact tables (measurements) and dimension tables (context) arranged in a star schema — optimized for analytical queries and BI tools.
- What is the difference between a fact table and a dimension table?
- Fact tables store measurements (numeric, additive, one row per event). Dimension tables store context (descriptive, one row per entity). Facts are narrow and tall; dims are wide and short.
- What are the types of slowly changing dimensions?
- Type 1: overwrite (no history). Type 2: new row with valid_from/to (full history — most common). Type 3: add previous_value column (one prior). dbt snapshots implement Type 2 automatically.
- What is a degenerate dimension?
- A dimensional attribute (like order_id or transaction_id) stored directly in the fact table because it has no other descriptive attributes. Used for drill-through queries.