Data Modeling

What is Data Modeling?

The practice of defining how data is structured, organized, and related in your warehouse — the difference between dashboards that are fast and trustworthy versus slow and wrong.

Quick Answer

Data modeling is the process of defining how data is structured, stored, and related within a database or data warehouse. It specifies tables, columns, data types, relationships, and — most critically — the grain (what one row represents). The most common approach for analytics is dimensional modeling: fact tables (measurements) joined to dimension tables (context), arranged in a star schema. Good data models make queries fast, results trustworthy, and analytics self-serve.

What is Data Modeling?

Every table in your warehouse is a data model. The question is whether it was designed intentionally or just dumped from a source system. A well-designed data model answers three questions:

  • What does one row represent? (the grain)
  • What can you measure? (fact columns: revenue, quantity, duration)
  • How can you slice it? (dimension columns: customer, product, date, region)

Data modeling sits between raw source data (staging) and the analytics layer (marts). In the modern stack it is typically implemented in dbt — SQL transformations that produce clean, tested, documented models in your warehouse.
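In practice the first modeling step is a staging model: a light rename-and-cast pass over a single source table. A minimal sketch in dbt (source, table, and column names here are illustrative, not from any particular schema):

```sql
-- models/staging/stg_orders.sql  (illustrative names)
SELECT
    id                      AS order_id,
    line_item_id            AS order_item_id,
    customer_id,
    product_id,
    created_at              AS ordered_at,
    qty                     AS quantity,
    price_cents / 100.0     AS unit_price,
    discount_cents / 100.0  AS discount_amount
FROM {{ source('shop', 'orders') }}
```

Staging models do no business logic; they just give downstream models consistent names and types to build on.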

Dimensional Modeling

  • Facts + dimensions in a star or snowflake schema
  • Optimized for analytical queries and aggregations
  • Business-friendly column names and grain
  • Best for: BI tools, dashboards, self-serve analytics

Data Vault

  • Hubs, Links, Satellites for full auditability
  • Handles many source systems with different history
  • Load-flexible — add new sources without refactoring
  • Best for: enterprise EDW, compliance, multi-source

Why Data Modeling Matters

Without Data Modeling

  • Analysts write 50-line queries to answer basic questions
  • Finance and marketing report different revenue numbers
  • Dashboards break when source schemas change
  • New analysts need weeks to understand the data
  • Joins produce duplicates — nobody knows why

With Data Modeling

  • Analysts query fct_orders and get answers in 3 lines
  • Single source of truth — one definition of revenue
  • Models tested and documented in dbt
  • New analysts onboard in hours, not weeks
  • Grain is declared — joins are safe by design

What You Can Build with Data Modeling

Revenue Reporting

A fct_orders fact table joined to dim_customers and dim_products — one source of truth for finance and growth.

Product Analytics

fct_events + dim_users enables funnel analysis, retention cohorts, and feature adoption metrics.

Marketing Attribution

fct_sessions and fct_conversions with dim_campaigns for multi-touch attribution across channels.

Inventory & Supply Chain

fct_inventory_snapshots with slowly changing dimensions for stock levels across warehouses.

Customer 360

A wide dim_customers table combining CRM, billing, and support data into a single customer record.

Financial Consolidation

fct_general_ledger with dim_accounts and dim_cost_centers for P&L and balance sheet rollups.

How Data Modeling Works

In the modern data stack, data modeling is a four-stage process: raw source data lands in staging tables, transformation logic produces intermediate models, final fact and dimension tables are materialized, and a semantic layer (metrics definitions) sits on top for BI tools.

  • STAGING → raw source tables
  • INTERMEDIATE → cleaned & joined
  • MARTS → fct_ + dim_ tables
  • SEMANTIC → metrics & measures
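In a dbt project these four stages usually map onto a folder layout like the following (a common convention, not a requirement; file names are illustrative):

```
models/
├── staging/        -- stg_orders.sql, stg_customers.sql
├── intermediate/   -- int_orders_joined.sql
├── marts/          -- fct_orders.sql, dim_customers.sql
└── semantic/       -- metric & measure definitions (YAML)
```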

fct_orders — grain: one row per order line item

-- models/marts/fct_orders.sql
SELECT
    -- grain: one row per order line item
    o.order_id,
    o.order_item_id,
    o.ordered_at,
    -- foreign keys → dimensions
    o.customer_id,
    o.product_id,
    d.date_day       AS order_date_key,
    -- measures (additive facts)
    o.quantity,
    o.unit_price,
    o.quantity * o.unit_price  AS gross_revenue,
    o.discount_amount,
    o.quantity * o.unit_price - o.discount_amount  AS net_revenue
FROM {{ ref('stg_orders') }} AS o
LEFT JOIN {{ ref('dim_date') }} AS d
    ON d.date_day = o.ordered_at::date

dim_customers — built from an SCD Type 2 snapshot (current records)

-- models/marts/dim_customers.sql
SELECT
    customer_id,
    full_name,
    email,
    country,
    customer_segment,
    first_order_date,
    lifetime_orders,
    lifetime_revenue,
    -- SCD Type 2 metadata
    dbt_valid_from,
    dbt_valid_to,
    dbt_scd_id
FROM {{ ref('customers_snapshot') }}  -- dbt snapshots are referenced with ref()
WHERE dbt_valid_to IS NULL  -- current records only

Data Modeling Approaches Compared

Star Schema vs Snowflake Schema

Star Schema

  • Flat, denormalized dimensions
  • Simple joins: fact → dim (one hop)
  • Faster queries, easier to understand
  • Slightly more storage

Snowflake Schema

  • Normalized dimensions in sub-tables
  • Multi-hop joins: fact → dim → sub-dim
  • Less storage, more complex queries
  • Harder to self-serve
Verdict: Star schema for analytics and BI. Storage is cheap in modern cloud warehouses — avoid snowflake schema unless disk cost is a genuine constraint.

Dimensional Modeling vs Data Vault

Dimensional Modeling

  • Analyst-friendly star schema output
  • Designed around business processes
  • Fast to build and query
  • Harder to extend with new sources

Data Vault

  • Hub/Link/Satellite for full auditability
  • Designed for multi-source integration
  • Load-flexible and historically complete
  • Complex to query — needs mart layer on top
Verdict: Dimensional modeling for most teams. Data vault for regulated industries (finance, healthcare) or large enterprises with 10+ source systems requiring full audit trails.

Normalized Models vs One Big Table (OBT)

Normalized (fct + dim)

  • Separate facts and dimensions
  • No redundancy — single definition
  • Joins required for analysis
  • Best for large warehouses

One Big Table (OBT)

  • Pre-joined wide table
  • Zero joins for analysts
  • Redundant data — harder to update
  • Best for small teams / ad-hoc
Verdict: Start with normalized fct + dim for maintainability. Add OBT shortcuts for high-traffic queries where join cost is measurable.
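One way to get both: materialize the OBT on top of the normalized model rather than instead of it. A sketch, assuming the fct_orders grain above plus hypothetical dim_products columns:

```sql
-- models/marts/obt_orders.sql  (illustrative OBT built on fct + dim)
SELECT
    f.order_id,
    f.order_item_id,
    f.ordered_at,
    f.quantity,
    f.net_revenue,
    c.full_name  AS customer_name,
    c.country    AS customer_country,
    p.product_name,
    p.category   AS product_category
FROM {{ ref('fct_orders') }} AS f
LEFT JOIN {{ ref('dim_customers') }} AS c ON c.customer_id = f.customer_id
LEFT JOIN {{ ref('dim_products') }}  AS p ON p.product_id  = f.product_id
```

Because the OBT is derived from the tested fct and dim models, the single definition of revenue is preserved even though the data is duplicated.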
Feature                  | Star Schema       | Data Vault                | OBT
Analyst friendliness     | ✓ high            | ✗ low (needs mart)        | ✓ highest
Query complexity         | Low (1–2 joins)   | High (many joins)         | None
Historical tracking      | ✓ SCD types       | ✓ native via Satellites   | ✗ snapshot only
Multi-source flexibility | ✗ hard to extend  | ✓ purpose-built           | ✗ very hard
Storage efficiency       | Medium            | Low (very redundant)      | Low
Best tooling             | dbt               | dbt + data vault packages | dbt / raw SQL

Common Mistakes

Mixing grains in a fact table

Putting order-level and order-line-level rows in the same fact table produces silent double-counting. Declare the grain explicitly (in a comment or dbt description) and enforce it in code. One fact table = one grain.
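One way to enforce a declared grain is a singular dbt test that returns rows (and therefore fails) whenever the grain key is duplicated. A sketch, assuming the order-line grain and fct_orders model named earlier:

```sql
-- tests/assert_fct_orders_grain.sql
-- Fails if any order line item appears more than once
SELECT
    order_item_id,
    COUNT(*) AS row_count
FROM {{ ref('fct_orders') }}
GROUP BY order_item_id
HAVING COUNT(*) > 1
```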

Modeling source tables, not business processes

Copying raw tables into a warehouse is not data modeling. Dimensional models are built around business processes (what happened: an order was placed) — not source system tables (what the OLTP schema looked like).

Not testing for duplicates and referential integrity

Every fact table should have a dbt unique + not_null test on its primary key. Every foreign key should have a relationships test against its dimension. Without these, silent data quality failures go undetected.
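In dbt these checks are declared in a schema YAML file. A sketch for the models named above (file name and layout follow common dbt conventions):

```yaml
# models/marts/_marts.yml
models:
  - name: fct_orders
    columns:
      - name: order_item_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

These tests run on every `dbt test` invocation, so a duplicated grain key or an orphaned foreign key fails the build instead of silently corrupting dashboards.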

Skipping the semantic layer

If the definition of "active customer" or "net revenue" lives inside a BI tool query, every analyst will define it differently. Define metrics once in a semantic layer (dbt metrics, MetricFlow) and let tools query it.
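A metric defined once in MetricFlow YAML might look like this sketch (it assumes a net_revenue measure has already been declared on a semantic model; names are illustrative):

```yaml
# models/semantic/metrics.yml
metrics:
  - name: net_revenue
    label: Net Revenue
    description: Gross revenue minus discounts, at order line grain
    type: simple
    type_params:
      measure: net_revenue
```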

Who Should Learn Data Modeling?

Analytics Engineer

You write dbt models and want a principled framework for how to structure facts, dims, and marts. This is your core discipline.

Data Engineer

You build pipelines that land raw data in the warehouse. Understanding downstream data models helps you stage data in the right shape.

Data Analyst

You query the warehouse daily. Understanding how fct_ and dim_ tables are designed makes your SQL faster and your metrics more accurate.


FAQs

What is data modeling?
Data modeling is the process of defining how data is structured, organized, and related within a database or data warehouse. A data model specifies the tables (or entities), their columns, data types, relationships (joins), and the grain (what one row represents). Good data models make queries fast, results trustworthy, and downstream logic simple. Bad models create duplicates, ambiguous joins, and slow dashboards.
What is dimensional modeling?
Dimensional modeling is a data warehouse design technique invented by Ralph Kimball. It organizes data into fact tables (measurements like revenue, clicks, orders) and dimension tables (context like customers, products, dates). Fact tables are narrow and tall; dimension tables are wide and short. The result is a star schema optimized for analytical queries — fast aggregations, intuitive joins, and business-friendly column names.
What is the difference between a star schema and a snowflake schema?
A star schema has one layer of dimension tables directly joined to the fact table — simple, fast, and easy to query. A snowflake schema normalizes dimensions further into sub-dimensions (e.g., a country table hanging off a city table which hangs off a customer table) — reducing storage but increasing join complexity. Most modern analytics warehouses (BigQuery, Snowflake, Redshift) prefer star schema because storage is cheap and join performance is excellent.
What is data vault modeling?
Data vault is a modeling approach designed for enterprise data warehouses that need to integrate data from many source systems. It uses three entity types: Hubs (unique business keys), Links (relationships between hubs), and Satellites (descriptive attributes with full history). Data vault is highly auditable and load-flexible but complex to query — most teams add a mart layer (star schema) on top for analyst consumption.
What is grain in data modeling?
The grain of a fact table defines what one row represents — the most atomic level of detail captured. For example: one row per order line item (not per order), or one row per daily user-product impression (not per session). Mixing grains in a single fact table is the most common data modeling mistake — it produces incorrect aggregations and subtle double-counting bugs.

What You'll Build with AI-DE

The Commerce Data Warehouse project walks you through building a production-grade dimensional model for an e-commerce business — staging, intermediate models, fct_orders, dim_customers, and a semantic layer — all in dbt.

  • Star schema with fct_orders, fct_sessions, and 5 dimension tables
  • SCD Type 2 dim_customers tracking plan changes over time
  • dbt tests for grain, uniqueness, and referential integrity
  • Semantic layer with MetricFlow for self-serve metrics
  • Incremental materializations for large fact tables