Skip to content

Dimensional Modeling Explained

Dimensional modeling organizes a data warehouse into fact tables (what happened: revenue, quantity, clicks) and dimension tables (context: who, what, when, where). Developed by Ralph Kimball, it produces a star schema optimized for analytical queries — fast aggregations, intuitive joins, and business-friendly column names.

The Core Pattern

-- Every fact table query follows this pattern:
SELECT
    -- slice by dimensions
    c.country,
    c.customer_segment,
    d.month_name,
    p.product_category,
    -- aggregate measures from fact
    SUM(f.net_revenue)    AS total_revenue,
    COUNT(f.order_id)     AS order_count,
    AVG(f.unit_price)     AS avg_price
FROM    fct_orders           f   -- FACT: measurements
JOIN    dim_customers        c USING (customer_key)  -- DIM: who
JOIN    dim_date             d USING (date_key)       -- DIM: when
JOIN    dim_products         p USING (product_key)    -- DIM: what
GROUP BY    1, 2, 3, 4

Core Concepts

Fact Tables

Store measurements — the numeric values you aggregate. Characteristics: narrow (foreign keys + measures), tall (one row per business event), append-mostly, grain declared explicitly.

  • Additive facts — can SUM across all dimensions (revenue, quantity)
  • Semi-additive facts — sum across some dims only (account balance: sum by account, not by date)
  • Non-additive facts — can never SUM (ratios, percentages — store numerator/denominator instead)

Dimension Tables

Store context — descriptive attributes you filter and group by. Characteristics: wide (many columns), short (one row per entity), updated via SCD patterns.

  • dim_customers — name, email, country, segment, tier
  • dim_products — name, category, brand, cost, SKU
  • dim_date — date, day_of_week, month, quarter, is_holiday
  • dim_channels — source, medium, campaign, paid/organic

Grain

The grain defines what one row in the fact table represents. It must be declared before building any table — every design decision follows from it. Mixing grains in a single fact table silently produces incorrect aggregations.

Mixed: one row per order AND one row per line item in the same table
Declared: one row per order line item — primary key is order_item_id

Slowly Changing Dimension (SCD) Types

When a dimension attribute changes over time (customer moves country, changes plan), you need to decide how to handle historical accuracy.

TypeBehaviorHistoryUse case
Type 0Never updateNoneStatic lookups (country codes)
Type 1Overwrite old valueLostCorrect typos, non-historical attrs
Type 2New row + valid_from/toFullPlan changes, address history
Type 3Add "previous_value" colOne priorSimple before/after comparison

SCD Type 2 is implemented in dbt using snapshots — run dbt snapshot on a schedule to capture new rows when source data changes.

Common Mistakes

Confusing additive and non-additive facts

Never SUM a ratio or percentage — store numerator and denominator as separate additive columns and compute the ratio at query time. Storing avg_order_value as a fact and then averaging it across periods gives mathematically wrong results.

Putting business logic in dimension tables

Dimension tables should store attributes, not derived metrics. customer_lifetime_value belongs in a mart view or metrics layer, not as a column in dim_customers that silently goes stale.

No surrogate keys — using natural keys as primary keys

Natural business keys can change (customer emails, product codes). Use surrogate keys (hashed or sequential) as dimension primary keys so that SCD Type 2 rows are joinable without duplicating the natural key logic everywhere.

FAQ

What is dimensional modeling?
A warehouse design technique by Ralph Kimball. Data is organized into fact tables (measurements) and dimension tables (context) arranged in a star schema — optimized for analytical queries and BI tools.
What is the difference between a fact table and a dimension table?
Fact tables store measurements (numeric, additive, one row per event). Dimension tables store context (descriptive, one row per entity). Facts are narrow and tall; dims are wide and short.
What are the types of slowly changing dimensions?
Type 1: overwrite (no history). Type 2: new row with valid_from/to (full history — most common). Type 3: add previous_value column (one prior). dbt snapshots implement Type 2 automatically.
What is a degenerate dimension?
A dimensional attribute (like order_id or transaction_id) stored directly in the fact table because it has no other descriptive attributes. Used for drill-through queries.

Related

Press Cmd+K to open