What is dbt?
The complete guide for data engineers — what it does, how it works, and when to use it.
Quick Answer
dbt (data build tool) is an open-source SQL transformation framework used by data engineers and analytics engineers to build modular, tested, and version-controlled data models inside a data warehouse. It does not move data; it transforms data that has already been loaded, following the ELT pattern. dbt is maintained by dbt Labs and has become the de facto standard for analytics engineering.
What is dbt?
dbt is an open-source command-line tool (and optional cloud platform) that lets data teams write SQL transformations as version-controlled code. Instead of writing ad-hoc SQL queries or building fragile stored procedures, you define models — plain SQL files that dbt compiles and runs against your data warehouse.
dbt is SQL-first. Every transformation is a .sql file with optional Jinja templating. dbt generates the DDL/DML, manages dependencies between models, and materializes results as tables or views in your warehouse.
Critically, dbt follows the ELT pattern (Extract, Load, Transform), not ETL. It does not move data — tools like Fivetran, Airbyte, or Stitch handle ingestion. dbt only transforms data that has already landed in your warehouse.
dbt Core: the open-source CLI. Run it locally, in Docker, or in CI/CD. Full feature set with no restrictions. The foundation that dbt Cloud is built on.
dbt Cloud: managed SaaS by dbt Labs. Adds a browser IDE, job scheduler, CI/CD integrations, model observability, and team collaboration. Free developer tier available.
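For dbt Core, a first run looks like this from the terminal. The Postgres adapter and project name below are just examples; install the adapter that matches your warehouse (dbt-snowflake, dbt-bigquery, dbt-redshift, etc.):

```shell
# Install dbt Core plus a warehouse adapter (Postgres chosen as an example)
pip install dbt-postgres

# Scaffold a new project, then build and test it
dbt init my_project        # hypothetical project name
cd my_project
dbt run                    # compile and materialize all models in dependency order
dbt test                   # run the project's data quality tests
```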
Key capabilities:
- Version control for SQL — every model lives in Git, reviewed like code
- Built-in testing — assert uniqueness, nulls, referential integrity on any column
- Auto-generated documentation — lineage graphs, column descriptions, and data catalog
- Dependency management — dbt resolves model dependencies automatically via ref()
- Team collaboration — SQL organized in folders, reviewed in PRs, deployed in CI
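The ref() mechanism from the list above looks like this in practice (fct_orders is a hypothetical downstream model; stg_orders is the staging model shown later in this article):

```sql
-- models/marts/fct_orders.sql (hypothetical mart model)
-- {{ ref('stg_orders') }} declares a dependency on the stg_orders model;
-- dbt compiles it to the fully qualified table name and orders the DAG accordingly
select
    order_id,
    user_id,
    status,
    created_at
from {{ ref('stg_orders') }}
where status = 'completed'
```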
Why dbt matters
Before dbt
- Messy SQL scripts with no structure
- No version control for transformations
- Hard to test data quality
- Duplicated logic across reports
- Tribal knowledge — no documentation
With dbt
- Modular SQL organized in folders
- Git-based version control and code reviews
- Automated data quality tests
- Reusable macros and packages
- Auto-generated docs and lineage graphs
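As one sketch of the "reusable macros" point, a small macro and its call site might look like this (names are illustrative; the pattern follows dbt's Jinja macro syntax):

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- usage inside any model, instead of duplicating the rounding logic per report:
-- select {{ cents_to_dollars('amount_cents') }} as amount_usd
-- from {{ ref('stg_payments') }}
```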
How dbt works
The ELT flow with dbt looks like this: raw data is loaded into your warehouse first (via Fivetran, Airbyte, Stitch, or custom pipelines), then dbt transforms it in-warehouse into clean, analytics-ready tables.
Three core concepts:
1. Models
Each .sql file in your models/ directory is a model. dbt runs them in dependency order and materializes results as tables or views.
2. Tests
Define data quality assertions in YAML. Built-in tests include unique, not_null, accepted_values, and relationships. Custom tests can be written in SQL.
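A custom (singular) test is just a SQL file under tests/ that selects the failing rows; zero rows returned means the test passes. A hypothetical example:

```sql
-- tests/assert_no_future_orders.sql (hypothetical singular test)
-- Fails if any order claims a created_at in the future
select
    order_id,
    created_at
from {{ ref('stg_orders') }}
where created_at > current_timestamp
```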
3. Documentation
dbt auto-generates a documentation site from your YAML descriptions, with lineage DAGs showing how every model relates to its sources and downstream consumers.
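Generating and browsing that site takes two commands with dbt Core:

```shell
dbt docs generate   # compiles the project and writes the catalog and manifest
dbt docs serve      # serves the documentation site on a local port
```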
```sql
-- models/staging/stg_orders.sql
with source as (

    select * from {{ source('ecommerce', 'raw_orders') }}

),

renamed as (

    select
        id as order_id,
        user_id,
        status,
        created_at
    from source

)

select * from renamed
```

```yaml
# models/staging/schema.yml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

Common mistakes with dbt
- Putting business logic in sources instead of models
- Not using ref() — hardcoding table names breaks lineage
- Skipping tests on critical models
- Monolithic models instead of modular staging → intermediate → mart layers
- Using dbt as an ingestion tool (it only transforms, use Airbyte/Fivetran for ingestion)
Who should learn dbt?
Junior Data Engineers
Learn basic transformations, understand data modeling, build your first production pipeline.
Senior Data Engineers
Design scalable dbt projects, implement testing frameworks, govern data contracts.
Staff / Lead Engineers
Define the semantic layer, standardize models across the org, lead dbt adoption.
dbt vs other tools
dbt vs Apache Spark
dbt
- SQL-based, warehouse-native
- Analytics transformation
- Free (Core)
- Runs inside your warehouse
Apache Spark
- Distributed compute
- Python / Scala
- Large-scale processing
- Complex infrastructure
Verdict: Use dbt for analytics transformation inside your warehouse. Use Spark for large-scale data processing that needs distributed compute.
dbt vs Apache Airflow
dbt
- Transformation logic
- What the data becomes
- SQL-first models
- Data quality testing
Apache Airflow
- Orchestration
- When and how pipelines run
- DAG scheduling
- Task dependency management
Verdict: They are complementary. Airflow triggers dbt jobs. dbt does the transformation. Most production stacks use both.
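A minimal sketch of that division of labor, using Airflow's BashOperator to shell out to dbt (the project path and schedule are assumptions, not prescriptions):

```python
# dags/dbt_daily.py -- illustrative Airflow DAG that triggers dbt
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow decides *when* pipelines run
    catchup=False,
) as dag:
    run_models = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",  # assumed project path
    )
    test_models = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt_project && dbt test",
    )

    # dbt decides *what the data becomes*; tests gate the build
    run_models >> test_models
```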
dbt vs raw SQL scripts
Raw SQL scripts
- Manual execution
- Hard to test
- No lineage
- No version control
- Breaks silently
dbt
- Modular and reusable
- Tested automatically
- Versioned in Git
- Documented with lineage
- CI/CD ready
Verdict: dbt is what happens when you apply software engineering practices to SQL.
dbt Core vs dbt Cloud vs alternatives
| Tool | Type | SQL-first | Testing | Lineage | Cost |
|---|---|---|---|---|---|
| dbt Core | OSS CLI | ✓ | ✓ | ✓ | Free |
| dbt Cloud | Managed SaaS | ✓ | ✓ | ✓ | $50+/mo |
| SQLMesh | OSS + Cloud | ✓ | ✓ | ✓ | Free/Paid |
| Dataform | Google Cloud | ✓ | ✓ | ✓ | GCP pricing |
Frequently asked questions
- Is dbt free?
- dbt Core is completely free and open-source. dbt Cloud offers a free developer tier and paid team/enterprise plans starting at $50/month per seat.
- What databases does dbt support?
- dbt supports all major cloud warehouses including Snowflake, BigQuery, Redshift, Databricks, DuckDB, and PostgreSQL via adapters.
- Do I need to know Python to use dbt?
- No. dbt is SQL-first. You write models in SQL with Jinja templating. Python models are supported in dbt Core 1.3+ but are optional.
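For reference, a dbt Python model follows this shape. It only runs on warehouses with Python support (e.g. Snowpark or Databricks), and the model and column names here are assumed for illustration:

```python
# models/marts/orders_enriched.py (hypothetical dbt Python model)
def model(dbt, session):
    # dbt.ref returns the upstream relation as a platform DataFrame
    orders = dbt.ref("stg_orders")

    # DataFrame operations depend on the platform (Snowpark, PySpark, ...);
    # the function must return a DataFrame for dbt to materialize
    return orders.filter(orders["status"] == "completed")
```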
- What's the difference between dbt Core and dbt Cloud?
- dbt Core is the open-source CLI you run locally or in CI. dbt Cloud adds a hosted IDE, job scheduler, observability, and team collaboration features.
- Where does dbt fit in the modern data stack?
- dbt sits in the transformation layer — after data ingestion tools like Fivetran or Airbyte load raw data into your warehouse, dbt transforms it into analytics-ready models.
What you'll build with AI-DE
- Build end-to-end dbt pipelines from staging to mart layer
- Design production-ready dimensional data models
- Implement automated testing and data quality gates
- Create semantic layers for analytics and BI tools
- Deploy dbt with GitHub Actions CI/CD
- Ship a production ecommerce data warehouse with 22 dbt models
Learn dbt on ai-de.net
dbt & Analytics Engineering
Hands-on dbt models, tests, and deployment from zero to production.
Advanced Data Modeling
Kimball, Data Vault, and dimensional modeling for modern warehouses.
Data Observability & Quality
Monitor pipeline health, catch silent failures, and enforce data contracts.