Skip to content

dbt Models Explained

A dbt model is a single .sql file containing a SELECT statement. dbt compiles it, wraps it in the appropriate DDL (CREATE TABLE or CREATE VIEW), and runs it against your warehouse. Models reference each other using ref(), which builds a dependency graph dbt executes in order.

What a dbt Model Looks Like

-- models/staging/stg_orders.sql
with source as (
    select * from {{ source('ecommerce', 'raw_orders') }}
),
renamed as (
    select
        id          as order_id,
        user_id,
        status,
        created_at
    from source
)
select * from renamed

The 3-Layer Model Structure

Staging

One model per source table. Rename columns, cast types, minimal transformation.

stg_
Intermediate

Business logic, joins between staging models, complex transformations.

int_
Marts

Final analytics-ready tables consumed by BI tools and analysts.

fct_dim_

Materializations

What does dbt do with your SELECT statement?

  • view(default)Runs as a SQL view, no data stored, always up-to-date
  • tableCreates a physical table, faster for BI, re-created on each run
  • incrementalAppends or merges only new rows — efficient for large tables
  • ephemeralNot materialized, inlined as a CTE in downstream models

Incremental config example

{{ config(materialized='incremental', unique_key='order_id') }}

select * from {{ ref('stg_orders') }}
where created_at > (select max(created_at) from {{ this }})

ref() Explained

ref() is dbt's dependency system. When you write {{ ref('stg_orders') }}, dbt knows to run stg_orders before the current model. This builds a DAG (directed acyclic graph) that dbt executes in the correct order every time.

When to Use This

  • Building your first dbt project
  • Deciding how to structure staging vs mart models
  • Choosing the right materialization for performance

Common Mistakes

  • Skipping the staging layer — putting raw source tables directly in marts
  • Over-using table materialization — most models should be view until performance requires otherwise
  • Not using ref() — hardcoding table names breaks lineage and environment isolation
  • Giant monolithic models — break complex logic into intermediate models

Related

Press Cmd+K to open