What is dbt?
The complete guide for data engineers — what it does, how it works, and when to use it.
Quick Answer
dbt (data build tool) is an open-source SQL transformation framework used by data engineers and analytics engineers to build modular, tested, and version-controlled data models inside a data warehouse. It does not move data; it transforms data that has already been loaded, following the ELT pattern. dbt is maintained by dbt Labs and has become the de facto standard for analytics engineering.
What is dbt?
dbt is an open-source command-line tool (and optional cloud platform) that lets data teams write SQL transformations as version-controlled code. Instead of writing ad-hoc SQL queries or building fragile stored procedures, you define models — plain SQL files that dbt compiles and runs against your data warehouse.
dbt is SQL-first. Every transformation is a .sql file with optional Jinja templating. dbt generates the DDL/DML, manages dependencies between models, and materializes results as tables or views in your warehouse.
Critically, dbt follows the ELT pattern (Extract, Load, Transform), not ETL. It does not move data — tools like Fivetran, Airbyte, or Stitch handle ingestion. dbt only transforms data that has already landed in your warehouse.
dbt Core: the open-source CLI. Run it locally, in Docker, or in CI/CD. Full feature set with no restrictions. The foundation that dbt Cloud is built on.
dbt Cloud: managed SaaS by dbt Labs. Adds a browser IDE, job scheduler, CI/CD integrations, model observability, and team collaboration. Free developer tier available.
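For dbt Core, a first run looks like this from the terminal. The Postgres adapter and project name below are just examples; install the adapter that matches your warehouse (dbt-snowflake, dbt-bigquery, dbt-redshift, etc.):

```shell
# Install dbt Core plus a warehouse adapter (Postgres chosen as an example)
pip install dbt-postgres

# Scaffold a new project, then build and test it
dbt init my_project        # hypothetical project name
cd my_project
dbt run                    # compile and materialize all models in dependency order
dbt test                   # run the project's data quality tests
```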
Key capabilities:
- Version control for SQL — every model lives in Git, reviewed like code
- Built-in testing — assert uniqueness, nulls, referential integrity on any column
- Auto-generated documentation — lineage graphs, column descriptions, and data catalog
- Dependency management — dbt resolves model dependencies automatically via ref()
- Team collaboration — SQL organized in folders, reviewed in PRs, deployed in CI
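The ref() mechanism from the list above looks like this in practice (fct_orders is a hypothetical downstream model; stg_orders is the staging model shown later in this article):

```sql
-- models/marts/fct_orders.sql (hypothetical mart model)
-- {{ ref('stg_orders') }} declares a dependency on the stg_orders model;
-- dbt compiles it to the fully qualified table name and orders the DAG accordingly
select
    order_id,
    user_id,
    status,
    created_at
from {{ ref('stg_orders') }}
where status = 'completed'
```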
Why dbt matters
Before dbt
- Messy SQL scripts with no structure
- No version control for transformations
- Hard to test data quality
- Duplicated logic across reports
- Tribal knowledge — no documentation
With dbt
- Modular SQL organized in folders
- Git-based version control and code reviews
- Automated data quality tests
- Reusable macros and packages
- Auto-generated docs and lineage graphs
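As one sketch of the "reusable macros" point, a small macro and its call site might look like this (names are illustrative; the pattern follows dbt's Jinja macro syntax):

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- usage inside any model, instead of duplicating the rounding logic per report:
-- select {{ cents_to_dollars('amount_cents') }} as amount_usd
-- from {{ ref('stg_payments') }}
```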
How dbt works
The ELT flow with dbt looks like this: raw data is loaded into your warehouse first (via Fivetran, Airbyte, Stitch, or custom pipelines), then dbt transforms it in-warehouse into clean, analytics-ready tables.
Three core concepts:
1. Models
Each .sql file in your models/ directory is a model. dbt runs them in dependency order and materializes results as tables or views.
2. Tests
Define data quality assertions in YAML. Built-in tests include unique, not_null, accepted_values, and relationships. Custom tests can be written in SQL.
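A custom (singular) test is just a SQL file under tests/ that selects the failing rows; zero rows returned means the test passes. A hypothetical example:

```sql
-- tests/assert_no_future_orders.sql (hypothetical singular test)
-- Fails if any order claims a created_at in the future
select
    order_id,
    created_at
from {{ ref('stg_orders') }}
where created_at > current_timestamp
```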
3. Documentation
dbt auto-generates a documentation site from your YAML descriptions, with lineage DAGs showing how every model relates to its sources and downstream consumers.
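Generating and browsing that site takes two commands with dbt Core:

```shell
dbt docs generate   # compiles the project and writes the catalog and manifest
dbt docs serve      # serves the documentation site on a local port
```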
```sql
-- models/staging/stg_orders.sql
with source as (

    select * from {{ source('ecommerce', 'raw_orders') }}

),

renamed as (

    select
        id as order_id,
        user_id,
        status,
        created_at
    from source

)

select * from renamed
```

```yaml
# models/staging/schema.yml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

Common mistakes with dbt
- Putting business logic in sources instead of models
- Not using ref() — hardcoding table names breaks lineage
- Skipping tests on critical models
- Monolithic models instead of modular staging → intermediate → mart layers
- Using dbt as an ingestion tool (it only transforms, use Airbyte/Fivetran for ingestion)
Who should learn dbt?
Junior Data Engineers
Learn basic transformations, understand data modeling, build your first production pipeline.
Senior Data Engineers
Design scalable dbt projects, implement testing frameworks, govern data contracts.
Staff / Lead Engineers
Define the semantic layer, standardize models across the org, lead dbt adoption.
dbt vs other tools
dbt vs Apache Spark
dbt
- SQL-based, warehouse-native
- Analytics transformation
- Free (Core)
- Runs inside your warehouse
Apache Spark
- Distributed compute
- Python / Scala
- Large-scale processing
- Complex infrastructure
Verdict: Use dbt for analytics transformation inside your warehouse. Use Spark for large-scale data processing that needs distributed compute.
dbt vs Apache Airflow
dbt
- Transformation logic
- What the data becomes
- SQL-first models
- Data quality testing
Apache Airflow
- Orchestration
- When and how pipelines run
- DAG scheduling
- Task dependency management
Verdict: They are complementary. Airflow triggers dbt jobs. dbt does the transformation. Most production stacks use both.
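A minimal sketch of that division of labor, using Airflow's BashOperator to shell out to dbt (the project path and schedule are assumptions, not prescriptions):

```python
# dags/dbt_daily.py -- illustrative Airflow DAG that triggers dbt
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow decides *when* pipelines run
    catchup=False,
) as dag:
    run_models = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt_project && dbt run",  # assumed project path
    )
    test_models = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt_project && dbt test",
    )

    # dbt decides *what the data becomes*; tests gate the build
    run_models >> test_models
```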
dbt vs raw SQL scripts
Raw SQL scripts
- Manual execution
- Hard to test
- No lineage
- No version control
- Breaks silently
dbt
- Modular and reusable
- Tested automatically
- Versioned in Git
- Documented with lineage
- CI/CD ready
Verdict: dbt is what happens when you apply software engineering practices to SQL.
dbt Core vs dbt Cloud vs alternatives
| Tool | Type | SQL-first | Testing | Lineage | Cost |
|---|---|---|---|---|---|
| dbt Core | OSS CLI | ✓ | ✓ | ✓ | Free |
| dbt Cloud | Managed SaaS | ✓ | ✓ | ✓ | $50+/mo |
| SQLMesh | OSS + Cloud | ✓ | ✓ | ✓ | Free/Paid |
| Dataform | Google Cloud | ✓ | ✓ | ✓ | GCP pricing |
Frequently asked questions
- Is dbt free?
- dbt Core is completely free and open-source. dbt Cloud offers a free developer tier and paid team/enterprise plans starting at $50/month per seat.
- What databases does dbt support?
- dbt supports all major cloud warehouses including Snowflake, BigQuery, Redshift, Databricks, DuckDB, and PostgreSQL via adapters.
- Do I need to know Python to use dbt?
- No. dbt is SQL-first. You write models in SQL with Jinja templating. Python models are supported in dbt Core 1.3+ but are optional.
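For reference, a dbt Python model follows this shape. It only runs on warehouses with Python support (e.g. Snowpark or Databricks), and the model and column names here are assumed for illustration:

```python
# models/marts/orders_enriched.py (hypothetical dbt Python model)
def model(dbt, session):
    # dbt.ref returns the upstream relation as a platform DataFrame
    orders = dbt.ref("stg_orders")

    # DataFrame operations depend on the platform (Snowpark, PySpark, ...);
    # the function must return a DataFrame for dbt to materialize
    return orders.filter(orders["status"] == "completed")
```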
- What's the difference between dbt Core and dbt Cloud?
- dbt Core is the open-source CLI you run locally or in CI. dbt Cloud adds a hosted IDE, job scheduler, observability, and team collaboration features.
- Where does dbt fit in the modern data stack?
- dbt sits in the transformation layer — after data ingestion tools like Fivetran or Airbyte load raw data into your warehouse, dbt transforms it into analytics-ready models.
What you'll build with AI-DE
- Build end-to-end dbt pipelines from staging to mart layer
- Design production-ready dimensional data models
- Implement automated testing and data quality gates
- Create semantic layers for analytics and BI tools
- Deploy dbt with GitHub Actions CI/CD
- Ship a production ecommerce data warehouse with 22 dbt models
Learn dbt on ai-de.net
dbt & Analytics Engineering
Hands-on dbt models, tests, and deployment from zero to production.
Advanced Data Modeling
Kimball, Data Vault, and dimensional modeling for modern warehouses.
Data Observability & Quality
Monitor pipeline health, catch silent failures, and enforce data contracts.