Build Your First Star Schema
What a data model actually is, your first star schema (fact + dimensions), your first analytical queries on it, and the moment you see the model drive a real business output.
Dimensional modeling, Kimball methodology, and cloud warehouse design.
Most data-engineering interviews include a schema-design question. This is the bridge from SQL fluency to your first real role.
Fundamentals, build your first model
The mental models behind dimensional design: grain, conformed dimensions, surrogate vs natural keys, and the patterns that separate a junior schema from a maintainable one.
SCDs, history tracking, dbt-ready pipelines
Kimball methodology end-to-end: star vs snowflake, fact table types (transaction / periodic / accumulating snapshot), dimension types, and the trade-offs every interviewer probes.
Building a date dimension, SCD Type 2 history tracking, SCD with dbt snapshots, and the decision framework for when to overwrite vs preserve history.
Advanced patterns, real-world systems, interview prep
Where models live in dbt — staging, intermediate, mart layers — dependency graphs, model lifecycle, and the layering pattern every analytics-engineering team enforces in review.
Many-to-many relationships, bridge tables, shared and role-playing dimensions, and the performance-optimization moves (clustering, partitioning, pre-aggregation) that keep queries fast at scale.
Domain-specific models across e-commerce, SaaS, financial, healthcare, and IoT — how the same Kimball patterns adapt to wildly different business questions in production.
Semantic models for LLMs, embedding storage schemas, vector-ready models, knowledge graph patterns, and the RAG data architecture that connects classical models to AI workloads.
Whiteboard schema-design walk-throughs, trade-off framing under pressure, portfolio packaging, and the 30+ modeling interview questions companies actually ask.
Without a real model under your belt, you risk:
Data modeling is the practice of designing how data is structured, stored, and related inside databases and warehouses. For data engineers, this usually means dimensional modeling with the Kimball methodology: star schemas, fact and dimension tables, slowly changing dimensions, and the staging-to-mart layering used in modern dbt projects. Data modeling is among the most frequently tested topics in data-engineering interviews and the foundation of most analytics warehouses.
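To make the star-schema idea concrete, here is a minimal sketch using Python's standard-library sqlite3 module. It creates one fact table and two dimension tables, then runs a simple analytical query. The table names, columns, and sample rows are illustrative assumptions, not part of the curriculum.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key
    customer_id  TEXT,                 -- natural key from the source system
    region       TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key
    product_name TEXT,
    category     TEXT
);
-- Grain (assumed for this sketch): one row per order line.
CREATE TABLE fact_sales (
    order_line_id INTEGER PRIMARY KEY,
    customer_key  INTEGER REFERENCES dim_customer(customer_key),
    product_key   INTEGER REFERENCES dim_product(product_key),
    quantity      INTEGER,
    amount        REAL
);
INSERT INTO dim_customer VALUES (1, 'C-100', 'EMEA'), (2, 'C-200', 'APAC');
INSERT INTO dim_product VALUES (1, 'Keyboard', 'Hardware'), (2, 'Support plan', 'Services');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 2, 100.0),
    (2, 1, 2, 1, 50.0),
    (3, 2, 1, 1, 50.0);
""")

# Analytical query: join the fact to a dimension, then aggregate a measure.
revenue_by_region = conn.execute("""
    SELECT c.region, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(revenue_by_region)  # [('APAC', 50.0), ('EMEA', 150.0)]
```

The fact table holds only keys and numeric measures, while descriptive attributes such as region and category live in the dimensions. That split is what keeps the query a single join plus a GROUP BY.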
Almost every data-engineering interview opens with a modeling question: design a schema for an e-commerce app, a ride-share product, a SaaS billing system. The juniors who can talk grain, fact-vs-dimension, and SCD choices land the role. The ones who flatten everything into one table or skip the grain question don't. On the job, the same skills decide whether your dashboards return consistent metrics or quietly double-count revenue every Monday.
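The double-counting risk mentioned above can be sketched in a few lines. This hypothetical example (table names and figures are assumptions) joins an order-level total to order lines, which silently changes the grain from one row per order to one row per line and inflates the sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total REAL);
CREATE TABLE order_lines (
    line_id     INTEGER PRIMARY KEY,
    order_id    INTEGER REFERENCES orders(order_id),
    line_amount REAL
);
INSERT INTO orders VALUES (1, 150.0), (2, 50.0);
INSERT INTO order_lines VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 50.0);
""")

# Wrong: order_total is an order-grain measure, but the join returns one row
# per order line, so order 1's total is counted twice.
inflated_revenue = conn.execute("""
    SELECT SUM(o.order_total)
    FROM orders AS o
    JOIN order_lines AS l ON l.order_id = o.order_id
""").fetchone()[0]

# Right: aggregate a measure stored at the same grain as the rows being summed.
actual_revenue = conn.execute(
    "SELECT SUM(line_amount) FROM order_lines"
).fetchone()[0]

print(inflated_revenue, actual_revenue)  # 350.0 200.0
```

Stating the grain of each table before writing the join is what catches this kind of error during design rather than in a dashboard.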
Dimensional models optimize for query performance and analyst usability. Normalized models minimize redundancy for transactional systems. Data warehouses almost always use dimensional models.
Kimball dimensional modeling is simpler and faster to query. Data Vault handles complex source system integration better. Many teams use Data Vault for raw integration and Kimball for analytics layers.
Dimensional models provide clear structure, reusable dimensions, and manageable complexity. One Big Table is faster to build but creates maintenance nightmares and inconsistent metrics as the team grows.
Data modeling is the bridge from SQL fluency to your first data-engineering role. Interviewers test it at every level — junior, mid, and senior — because it reveals whether you can think in business logic and trade-offs, not just write SELECT statements. This curriculum gets you fluent enough to whiteboard a star schema, defend SCD choices, and pass the modeling rounds at companies that take modeling seriously.
Data modeling defines how data is organized in warehouses and databases. Data engineers design schemas that optimize query performance, ensure metric consistency, and support evolving business requirements.
Expect three: "design a schema for X" (whiteboard a fact + dimensions for a business domain), "what is the grain of your fact table?" (proves you can avoid double-counting), and "how would you track history on this dimension?" (probes SCD knowledge). All three appear in Phase 3 of this curriculum.
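Yes. Kimball dimensional modeling remains the foundation of most analytics warehouses. Modern tools work alongside it rather than replacing it: dbt is widely used to build Kimball-style models, and open table formats such as Apache Iceberg can store them while adding flexibility for streaming and AI workloads.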
Foundational concepts take 2-3 weeks. Mastering production patterns like SCDs, advanced joins, and performance-optimized schemas typically takes 2-3 months of hands-on practice.
Absolutely. Data modeling is tested in interviews and used daily. Engineers who cannot model data correctly create slow, unreliable pipelines that break downstream analytics.
A slowly changing dimension (SCD) tracks how dimension attributes change over time. Type 1 overwrites the old value and keeps no history, Type 2 adds a new row for each change so the full history is preserved, and Type 3 adds a column that holds the previous value alongside the current one.
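As a rough illustration of Type 2 history tracking, the sketch below closes the current dimension row and inserts a new one when an attribute changes. The column names (valid_from, valid_to, is_current) and the helper apply_scd2_change are assumptions chosen for this example, and real projects often delegate this logic to a tool such as dbt snapshots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
    customer_id  TEXT,                               -- natural key, shared by all versions
    city         TEXT,
    valid_from   TEXT,
    valid_to     TEXT,                               -- NULL while the row is current
    is_current   INTEGER
);
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
VALUES ('C-100', 'Berlin', '2024-01-01', NULL, 1);
""")


def apply_scd2_change(conn, customer_id, new_city, change_date):
    """Hypothetical helper: expire the current row, then add a new current row."""
    with conn:  # run both statements in one transaction
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (change_date, customer_id),
        )
        conn.execute(
            "INSERT INTO dim_customer "
            "(customer_id, city, valid_from, valid_to, is_current) "
            "VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_city, change_date),
        )


apply_scd2_change(conn, "C-100", "Munich", "2024-06-01")

history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current "
    "FROM dim_customer WHERE customer_id = 'C-100' ORDER BY customer_key"
).fetchall()
print(history)
# [('Berlin', '2024-01-01', '2024-06-01', 0), ('Munich', '2024-06-01', None, 1)]
```

A Type 1 change would instead be a single UPDATE of the city column on the existing row, which is simpler but loses the fact that the customer was ever in Berlin.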
Star schemas denormalize dimensions for simpler queries. Snowflake schemas normalize dimensions to reduce storage. Most modern warehouses prefer star schemas because storage is cheap and query simplicity matters more.
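The query-simplicity trade-off can be seen by modeling the same product dimension both ways. In this sketch (all table names and rows are illustrative assumptions), the star version answers a revenue-by-category question with one join, while the snowflake version needs two to reach the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_key INTEGER, amount REAL);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 25.0);

-- Star: the category name is repeated on every product row.
CREATE TABLE dim_product_star (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT
);
INSERT INTO dim_product_star VALUES
    (1, 'Keyboard', 'Hardware'),
    (2, 'Mouse', 'Hardware'),
    (3, 'Support plan', 'Services');

-- Snowflake: the category is normalized into its own table.
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
INSERT INTO dim_category VALUES (10, 'Hardware'), (20, 'Services');
INSERT INTO dim_product_snow VALUES
    (1, 'Keyboard', 10),
    (2, 'Mouse', 10),
    (3, 'Support plan', 20);
""")

# Star schema: one join from fact to dimension.
star_result = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_star AS p ON p.product_key = f.product_key
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

# Snowflake schema: an extra join to reach the category name.
snowflake_result = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_snow AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(star_result)       # [('Hardware', 150.0), ('Services', 25.0)]
print(snowflake_result)  # same rows, one more join
```

The snowflake layout stores 'Hardware' once instead of twice, which is the storage saving, at the cost of a longer join path for every query that groups by category.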