Data Modeling & Architecture

Name: Data Modeling & Architecture
Author: AI-DE Engineering Team

Dimensional modeling, Kimball methodology, and cloud warehouse design.

Almost every data engineering interview opens with schema design. This is the bridge from SQL fluency to your first real role.

What you’ll be able to do

Design star schemas and dimensional models using Kimball methodology
Implement slowly changing dimensions (SCD Type 1 & 2)
Build production-ready models in dbt with proper layering
Prepare for data modeling interview questions

Curriculum

Phase 1: Design Your First Analytics Schema

Fundamentals, build your first model

Build Your First Star Schema

What a data model actually is, your first star schema (fact + dimensions), your first analytical queries on it, and the moment you see the model drive a real business output.

Modeling Foundations

The mental models behind dimensional design: grain, conformed dimensions, surrogate vs natural keys, and the patterns that separate a junior schema from a maintainable one.

Phase 2: Real-World Modeling Patterns

SCDs, history tracking, dbt-ready pipelines

Dimensional Modeling

Kimball methodology end-to-end: star vs snowflake, fact table types (transaction / periodic / accumulating snapshot), dimension types, and the trade-offs every interviewer probes.

Date Dimensions & SCDs

Building a date dimension, SCD Type 2 history tracking, SCD with dbt snapshots, and the decision framework for when to overwrite vs preserve history.

Phase 3: Production & Scale

Advanced patterns, real-world systems, interview prep

Modeling in Pipelines

Where models live in dbt — staging, intermediate, mart layers — dependency graphs, model lifecycle, and the layering pattern every analytics-engineering team enforces in review.

Advanced Patterns

Many-to-many relationships, bridge tables, shared and role-playing dimensions, and the performance-optimization moves (clustering, partitioning, pre-aggregation) that keep queries fast at scale.

Real-World Systems

Domain-specific models across e-commerce, SaaS, financial, healthcare, and IoT — how the same Kimball patterns adapt to wildly different business questions in production.

AI Data Modeling

Semantic models for LLMs, embedding storage schemas, vector-ready models, knowledge graph patterns, and the RAG data architecture that connects classical models to AI workloads.

Interview & Career Prep

Whiteboard schema-design walk-throughs, trade-off framing under pressure, portfolio packaging, and the 30+ modeling interview questions companies actually ask.

What you’ll build

A working star schema with fact + conformed dimensions you can defend on a whiteboard
SCD Type 2 history tracking implemented with dbt snapshots
A staging → intermediate → mart dbt project laid out the way reviewers expect
An interview-ready portfolio of 5+ real schemas across e-commerce, SaaS, and AI domains

Junior engineers fail data-modeling interviews on the same three questions, every time.

Without a real model under your belt, you risk:

Freezing on the whiteboard prompt "design a schema for X" and flattening everything into one giant table
Not knowing SCD Type 1 vs Type 2 — overwriting history and losing the audit trail interviewers ask about
Skipping the "what is the grain?" question and building a fact table that double-counts revenue
Memorizing the word "star schema" but never having queried one, which the technical round catches in 90 seconds

What is Data Modeling & Architecture?

Data modeling is the practice of designing how data is structured, stored, and related inside databases and warehouses. For data engineers, this usually means dimensional modeling with Kimball methodology: star schemas, fact and dimension tables, slowly changing dimensions, and the staging-to-mart layering used in modern dbt projects. Data modeling is one of the most frequently tested topics in data-engineering interviews and the foundation of most analytics warehouses.
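To make the fact-and-dimension idea concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, column names, and sample rows (fact_sales, dim_customer, dim_date) are illustrative assumptions, not taken from the course material, and SQLite stands in for a cloud warehouse.

```python
import sqlite3

# In-memory database as a stand-in for a warehouse (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_id  TEXT,                  -- natural key from the source system
    segment      TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,      -- e.g. 20260101
    full_date TEXT,
    month     TEXT
);
-- Grain: one row per order line.
CREATE TABLE fact_sales (
    order_line_id INTEGER PRIMARY KEY,
    customer_key  INTEGER REFERENCES dim_customer(customer_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    quantity      INTEGER,
    revenue       REAL
);
INSERT INTO dim_customer VALUES (1, 'C-100', 'consumer'), (2, 'C-200', 'business');
INSERT INTO dim_date VALUES
    (20260101, '2026-01-01', '2026-01'),
    (20260201, '2026-02-01', '2026-02');
INSERT INTO fact_sales VALUES
    (1, 1, 20260101, 2, 40.0),
    (2, 2, 20260101, 1, 100.0),
    (3, 1, 20260201, 1, 20.0);
""")

# A typical analytical query: the fact table joins to each dimension once.
revenue_by_segment = conn.execute("""
    SELECT c.segment, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY c.segment, d.month
    ORDER BY c.segment, d.month
""").fetchall()

for row in revenue_by_segment:
    print(row)
```

The fact table holds measures and foreign keys only; descriptive attributes such as segment and month live in the dimensions, which is what keeps the query shape the same as new business questions arrive.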

Why this matters in production

Almost every data-engineering interview opens with a modeling question: design a schema for an e-commerce app, a ride-share product, a SaaS billing system. The juniors who can talk grain, fact-vs-dimension, and SCD choices land the role. The ones who flatten everything into one table or skip the grain question don't. On the job, the same skills decide whether your dashboards return consistent metrics or quietly double-count revenue every Monday.
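The double-counting risk can be reproduced in a few lines. The sketch below assumes a hypothetical orders table (one row per order) and order_lines table (one row per line item); summing an order-level measure after joining to line-level rows repeats the order total once per line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total REAL);
CREATE TABLE order_lines (
    line_id     INTEGER PRIMARY KEY,
    order_id    INTEGER REFERENCES orders(order_id),
    line_amount REAL
);
INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
INSERT INTO order_lines VALUES (1, 1, 60.0), (2, 1, 40.0), (3, 2, 50.0);
""")

# Mixed grain: order_total (order grain) is repeated for every line of order 1.
double_counted = conn.execute("""
    SELECT SUM(o.order_total)
    FROM orders o
    JOIN order_lines l ON l.order_id = o.order_id
""").fetchone()[0]

# Consistent grain: sum the measure that is stored at the line grain.
correct = conn.execute(
    "SELECT SUM(line_amount) FROM order_lines"
).fetchone()[0]

print(double_counted)  # 250.0
print(correct)         # 150.0
```

Stating the grain up front ("one row per order line") tells you which measures are safe to sum in that table and which belong in a different fact table.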

Common use cases

Designing star schemas and snowflake schemas for analytics warehouses
Implementing slowly changing dimensions (SCD Type 1, 2, and 3)
Building staging-to-mart model layers in dbt for organized transformations
Modeling event data for real-time and batch analytics
Preparing for data modeling interview questions at top companies
Designing schemas that support both historical analysis and real-time dashboards

Data Modeling vs alternatives

Data Modeling vs Normalized (3NF)

Dimensional models optimize for query performance and analyst usability. Normalized models minimize redundancy for transactional systems. The analytics layer of a data warehouse almost always uses dimensional models.

Data Modeling vs Data Vault

Kimball dimensional modeling is simpler and faster to query. Data Vault handles complex source system integration better. Many teams use Data Vault for raw integration and Kimball for analytics layers.

Data Modeling vs One Big Table

Dimensional models provide clear structure, reusable dimensions, and manageable complexity. One Big Table is faster to build but creates maintenance nightmares and inconsistent metrics as the team grows.

Related skills

Data models are queried and built using SQL skills from SQL Mastery.
Dimensional models are implemented and tested in production using dbt & Analytics Engineering.
Model performance depends on warehouse internals covered in Data Warehouse Internals.

Why this skill matters

Data modeling is the bridge from SQL fluency to your first data-engineering role. Interviewers test it at every level — junior, mid, and senior — because it reveals whether you can think in business logic and trade-offs, not just write SELECT statements. This curriculum gets you fluent enough to whiteboard a star schema, defend SCD choices, and pass the modeling rounds at companies that take modeling seriously.

Common questions about Data Modeling

What is data modeling in data engineering?

Data modeling defines how data is organized in warehouses and databases. Data engineers design schemas that optimize query performance, ensure metric consistency, and support evolving business requirements.

What data modeling questions do interviewers actually ask?

Expect three: "design a schema for X" (whiteboard a fact + dimensions for a business domain), "what is the grain of your fact table?" (proves you can avoid double-counting), and "how would you track history on this dimension?" (probes SCD knowledge). All three are rehearsed in Phase 3 of this curriculum, building on the grain material in Phase 1 and the SCD material in Phase 2.

Is Kimball methodology still used in 2026?

Yes. Kimball dimensional modeling remains the foundation of most analytics warehouses. Modern tools such as dbt, and table formats such as Apache Iceberg, are commonly used to implement Kimball-style models, with added flexibility for streaming and AI workloads.

How long does it take to learn data modeling?

Foundational concepts take 2-3 weeks. Mastering production patterns like SCDs, advanced joins, and performance-optimized schemas typically takes 2-3 months of hands-on practice.

Do data engineers need data modeling skills?

Absolutely. Data modeling is tested in interviews and used daily. Engineers who cannot model data correctly create slow, unreliable pipelines that break downstream analytics.

What is a slowly changing dimension?

A slowly changing dimension (SCD) tracks how dimension attributes change over time. Type 1 overwrites old values, Type 2 creates new rows with history, and Type 3 adds columns for previous values.
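The three types are easiest to compare side by side. The following is a minimal pure-Python sketch over lists of dictionaries; the column names (valid_from, valid_to, is_current, previous_city) are assumptions chosen for illustration, and tools such as dbt snapshots use their own column names and change-detection logic.

```python
from datetime import date


def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite in place, so the old value is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and append a new versioned row."""
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # nothing changed, keep the current version
            row["valid_to"] = change_date
            row["is_current"] = False
    rows.append({
        # New surrogate key for the new version of the same customer.
        "customer_key": max((r["customer_key"] for r in rows), default=0) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })


def apply_type3(rows, customer_id, new_city):
    """Type 3: keep only the immediately previous value in an extra column."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city


type1_rows = [{"customer_key": 1, "customer_id": "C-100", "city": "Austin"}]
apply_type1(type1_rows, "C-100", "Denver")

type2_rows = [{
    "customer_key": 1, "customer_id": "C-100", "city": "Austin",
    "valid_from": date(2025, 1, 1), "valid_to": None, "is_current": True,
}]
apply_type2(type2_rows, "C-100", "Denver", date(2026, 3, 1))

type3_rows = [{"customer_key": 1, "customer_id": "C-100", "city": "Austin",
               "previous_city": None}]
apply_type3(type3_rows, "C-100", "Denver")

print(len(type1_rows), type1_rows[0]["city"])   # 1 Denver
print(len(type2_rows))                          # 2
print(type3_rows[0]["previous_city"])           # Austin
```

Only the Type 2 version can answer "where did this customer live when the order was placed?", because facts loaded before the change still point at the older surrogate key.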

Star schema vs snowflake schema?

Star schemas denormalize dimensions for simpler queries. Snowflake schemas normalize dimensions to reduce storage. Most teams on modern warehouses prefer star schemas because storage is cheap and query simplicity matters more.
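The trade-off shows up directly in the join count. This sketch models the same product dimension both ways in SQLite; all table and column names are hypothetical, and the two queries are expected to return identical results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: category is denormalized onto the product dimension.
CREATE TABLE dim_product_star (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT
);
-- Snowflake: category is normalized into its own table.
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (product_key INTEGER, revenue REAL);

INSERT INTO dim_product_star VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
INSERT INTO dim_product_snow VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);
INSERT INTO fact_sales VALUES (1, 900.0), (2, 600.0), (3, 250.0), (1, 100.0);
""")

# Star schema: one join from fact to dimension.
star_result = conn.execute("""
    SELECT p.category_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_key = f.product_key
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

# Snowflake schema: an extra hop through the normalized category table.
snowflake_result = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(star_result)
print(snowflake_result)
```

The star version repeats the string 'Electronics' on every product row, which costs a little storage; the snowflake version stores it once but asks every analyst to remember the extra join.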

Analytics. Included in Free.
Last updated 2026-05-22 by AI-DE Engineering Team
Time: ~60h video + labs

Phase roadmap

Phase 1: Design Your First Analytics Schema (PRO required)
1.1 Build Your First Star Schema
1.2 Modeling Foundations
Used in: P03: Commerce data warehouse (FREE); P19: E-commerce metrics layer (FREE)

Phase 2: Real-World Modeling Patterns (PRO required)
2.1 Dimensional Modeling
2.2 Date Dimensions & SCDs
Used in: P03: Commerce data warehouse (FREE); P21: Modern data stack (PRO)

Phase 3: Production & Scale (PRO required)
3.1 Modeling in Pipelines
3.2 Advanced Patterns
3.3 Real-World Systems
3.4 AI Data Modeling
3.5 Interview & Career Prep
Used in: P03: Commerce data warehouse (FREE); P04: Iceberg lakehouse (PRO)

Build with this skill

Commerce Data Warehouse
E-commerce Analytics Platform

Before you start

Tech stack

Kimball
Star Schema
SCDs
dbt
BigQuery

Prerequisites

SQL proficiency
Basic dbt knowledge helpful
