Skip to content

How to Build a Data Model

A dimensional data model has three layers: staging (cleaned source data), dimensions (who, what, where, when), and facts (what happened, how much). Define the grain first — everything else follows from it.

1

Identify the business process and declare the grain

Start by identifying the business event: “an order was placed”, “a page was viewed”, “a payment was processed”. Then declare the grain explicitly — it goes in the model file header as a comment and in your dbt model description.

-- models/marts/fct_orders.sql
-- Business process: order placed
-- Grain: one row per ORDER LINE ITEM (not per order)
-- Primary key: order_item_id
-- Foreign keys: customer_id, product_id, date_key

-- schema.yml
models:
  - name: fct_orders
    description: >
      One row per order line item. Grain: order_item_id.
      Use for revenue, quantity, and discount analysis.
2

List dimensions and measures

Measures are numeric facts you will aggregate (SUM, AVG, COUNT). Dimensions are attributes you will filter and group by. This determines which tables you need.

Measures (in fct_)

  • • quantity
  • • unit_price
  • • gross_revenue
  • • discount_amount
  • • net_revenue

Dimensions (in dim_)

  • • dim_customers (who)
  • • dim_products (what)
  • • dim_date (when)
  • • dim_channels (how)
3

Build staging models

Staging models (stg_) clean and rename raw source columns with no business logic. One stg_ per source table.

-- models/staging/stg_orders.sql
SELECT
    id              AS order_item_id,
    order_id,
    cust_id         AS customer_id,
    prod_id         AS product_id,
    qty             AS quantity,
    price           AS unit_price,
    disc            AS discount_amount,
    created_at      AS ordered_at
FROM    {{ source('app_db', 'order_items') }}
4

Build dimension tables

Dimension tables are wide, flat, and denormalized — every attribute an analyst might want to filter or group by is in one table. Use dbt_utils.generate_surrogate_key() for stable primary keys.

-- models/marts/dim_customers.sql
SELECT
    {{ dbt_utils.generate_surrogate_key(['customer_id']) }}  AS customer_key,
    customer_id,
    full_name,
    email,
    country,
    customer_segment,
    first_order_date,
    lifetime_revenue
FROM    {{ ref('stg_customers') }}
5

Build the fact table and add dbt tests

-- models/marts/fct_orders.sql
SELECT
    o.order_item_id,
    o.order_id,
    o.ordered_at,
    c.customer_key,
    p.product_key,
    d.date_key,
    o.quantity,
    o.unit_price,
    o.quantity * o.unit_price            AS gross_revenue,
    o.discount_amount,
    o.quantity * o.unit_price
      - o.discount_amount               AS net_revenue
FROM    {{ ref('stg_orders') }}  o
LEFT JOIN  {{ ref('dim_customers') }}  c USING (customer_id)
LEFT JOIN  {{ ref('dim_products') }}   p USING (product_id)
LEFT JOIN  {{ ref('dim_date') }}       d ON d.date_day = o.ordered_at::date

-- schema.yml tests:
-- unique + not_null on order_item_id
-- relationships: customer_key → dim_customers.customer_key
-- relationships: product_key  → dim_products.product_key

When to Use This Pattern

  • • Building an analytics warehouse from scratch with dbt
  • • Replacing ad-hoc analyst queries with reusable, tested models
  • • Creating a single source of truth for key business metrics
  • • Enabling self-serve analytics via BI tools (Looker, Metabase)

Common Issues

Fan-out: joining a fact to a higher-grain dim

If your fact has order-line grain but you join a dim with order grain (one row per order), every line item will duplicate the order-level values. Always match the dimension grain to the fact grain or use aggregation before joining.

Fan-in: joining dims that reduce row count

If a dimension has a many-to-one relationship (multiple orders per customer) and you join incorrectly, rows get dropped. Ensure foreign key cardinality matches expectations with a relationships test.

Forgetting to add NOT NULL tests on fact keys

A NULL customer_key in fct_orders means that order is invisible in customer-level reports. Always add not_null tests on all foreign key columns in fact tables.

FAQ

What is the grain of a fact table?
The grain defines what one row represents — the most atomic level of detail. Declare it explicitly (e.g., "one row per order line item") to prevent mixing detail levels, which causes double-counting.
What is a surrogate key?
A system-generated unique identifier for a dimension row, separate from the natural business key. Created with dbt_utils.generate_surrogate_key(). Enables stable joins even when business keys change.
What is a slowly changing dimension (SCD)?
A dimension that changes over time (e.g., customer moves country). SCD Type 2 adds a new row with valid_from/valid_to timestamps. dbt snapshots implement SCD Type 2 automatically.

Related

Press Cmd+K to open