dbt Fundamentals
Build modular data models, write Jinja macros, add tests, and deploy with CI/CD.
dbt is how mature analytics orgs ship SQL: modular models, automated tests, version control, lineage, and CI/CD. JetBlue, GitLab, HubSpot, Convoy, and thousands of other companies treat it as the standard transformation tool. Senior analytics engineers are expected to defend their model layering, incremental strategy, test design, and semantic-layer and dbt Mesh choices.
Fundamentals, layered modeling, and testing — go from a single SELECT to a tested staging/intermediate/mart project in one phase.
dbt Core setup, project structure, sources + refs, your first model, the schema.yml contract, and the basic schema tests that turn ad-hoc SQL into version-controlled analytics.
The staging / intermediate / mart layering pattern, dimensional modeling (Kimball facts + dims), star schemas in dbt, surrogate keys with dbt-utils, and the directory conventions that scale to 200+ models.
Generic vs singular tests (formerly called schema and data tests), the four built-in generic tests (unique, not_null, accepted_values, relationships), custom singular tests, the dbt-expectations package, and a testing philosophy that catches issues before stakeholders open the dashboard.
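To make the four built-in generic tests concrete, here is a minimal Python sketch of what each one checks. It is an illustration only, not dbt's implementation: dbt compiles each test to a SQL query against the warehouse, and the test passes when that query returns no rows. The `orders` and `customers` rows and the function names are hypothetical sample data chosen for this example.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def unique(rows: list[Row], column: str) -> list[Any]:
    """Return values that occur more than once. NULLs are ignored."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return [value for value, n in counts.items() if n > 1]


def not_null(rows: list[Row], column: str) -> list[Row]:
    """Return the rows where the column is NULL (None here)."""
    return [r for r in rows if r[column] is None]


def accepted_values(rows: list[Row], column: str, allowed: set[Any]) -> list[Any]:
    """Return non-NULL values that fall outside the allowed set."""
    return [r[column] for r in rows
            if r[column] is not None and r[column] not in allowed]


def relationships(child: list[Row], column: str,
                  parent: list[Row], parent_column: str) -> list[Any]:
    """Return non-NULL child values with no matching parent row."""
    parent_keys = {p[parent_column] for p in parent}
    return [r[column] for r in child
            if r[column] is not None and r[column] not in parent_keys]


# Hypothetical sample data with one failure planted for each test.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "status": "placed"},
    {"order_id": 11, "customer_id": 2, "status": "shipped"},
    {"order_id": 11, "customer_id": 3, "status": "lost"},
    {"order_id": 12, "customer_id": None, "status": "placed"},
]

failures = {
    "unique(order_id)": unique(orders, "order_id"),
    "not_null(customer_id)": not_null(orders, "customer_id"),
    "accepted_values(status)": accepted_values(
        orders, "status", {"placed", "shipped", "returned"}),
    "relationships(customer_id)": relationships(
        orders, "customer_id", customers, "customer_id"),
}
```

Each function returns the failing values or rows, so an empty result means the test passes. In a real project the same four checks are declared per column in a YAML file next to the model rather than written by hand.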
Jinja and incremental models — the code-reuse + performance layer that turns a working dbt project into a maintainable one your team can grow into.
Jinja fundamentals, writing custom macros, the dbt-utils + dbt-expectations + audit-helper packages, advanced templating patterns, and macro testing — the layer that turns 200 ad-hoc models into a maintainable library.
Incremental models deep dive, merge / append / delete+insert strategies, watermarks, partitioning + clustering decisions, query optimization, and the cost-reduction patterns that decide whether your dbt run is 6 minutes or 60.
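The watermark and strategy ideas in this phase can be sketched in plain Python. This is a simplified model under stated assumptions, not how dbt executes: in dbt the filter is typically written in SQL as a comparison against the maximum timestamp already in the target table, and the warehouse performs the write. The row shape (`id`, `status`, `updated_at`) and the function names are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def new_rows_since_watermark(target: list[Row], source: list[Row],
                             updated_at: str = "updated_at") -> list[Row]:
    """Select only source rows newer than the latest row already loaded."""
    watermark = max((r[updated_at] for r in target), default=None)
    if watermark is None:  # empty target: behaves like a full refresh
        return list(source)
    # Strict '>' can miss late rows that share the watermark timestamp.
    return [r for r in source if r[updated_at] > watermark]


def apply_append(target: list[Row], batch: list[Row]) -> list[Row]:
    """Append strategy: cheapest write, but changed keys end up duplicated."""
    return target + batch


def apply_merge(target: list[Row], batch: list[Row], key: str = "id") -> list[Row]:
    """Merge strategy: update rows whose key exists, insert the rest."""
    merged = {r[key]: r for r in target}
    for row in batch:
        merged[row[key]] = row
    return list(merged.values())


# Hypothetical data: order 2 changed status and order 3 is new.
target = [
    {"id": 1, "status": "placed", "updated_at": "2024-01-01"},
    {"id": 2, "status": "placed", "updated_at": "2024-01-02"},
]
source = [
    {"id": 1, "status": "placed", "updated_at": "2024-01-01"},
    {"id": 2, "status": "shipped", "updated_at": "2024-01-03"},
    {"id": 3, "status": "placed", "updated_at": "2024-01-03"},
]

batch = new_rows_since_watermark(target, source)
appended = apply_append(target, batch)
merged = apply_merge(target, batch)
```

With this data the batch contains only orders 2 and 3, so the run touches two rows instead of three. Appending leaves two versions of order 2 in the table, while merging keeps one row per key, which is the usual reason to pay for a merge when source rows can change after they first land.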
CI/CD deployment plus the semantic + governance layer — Slim CI, dbt Cloud vs Core, MetricFlow, exposures, dbt Mesh, and the multi-project governance patterns mature analytics orgs run on.
CI/CD with GitHub Actions, Slim CI (state:modified+), environment management (dev / staging / prod), dbt Cloud vs Core tradeoffs, Airflow orchestration integration, and the production checklist for a launch review.
The dbt Semantic Layer + MetricFlow, exposures + downstream lineage, advanced observability, dbt Mesh + cross-project refs, governance patterns, and a capstone that ships everything end-to-end.
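The Slim CI selector mentioned above, state:modified+, picks every new or changed model plus everything downstream of it. The sketch below shows that selection logic on a toy graph. It assumes each model can be compared by a single checksum string, which is a simplification of how dbt compares the current project against saved production artifacts; the model names, checksums, and function names are hypothetical.

```python
from collections import deque


def modified_nodes(prod: dict[str, str], current: dict[str, str]) -> set[str]:
    """Models that are new or whose checksum differs from production."""
    return {name for name, checksum in current.items()
            if prod.get(name) != checksum}


def with_descendants(selected: set[str],
                     children: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk adding every downstream model (the trailing '+')."""
    result = set(selected)
    queue = deque(selected)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in result:
                result.add(child)
                queue.append(child)
    return result


# Hypothetical lineage: parent model -> models that reference it.
children = {
    "stg_orders": ["int_orders"],
    "stg_customers": ["dim_customers"],
    "int_orders": ["fct_orders"],
    "dim_customers": ["fct_orders"],
}

# Hypothetical checksums: stg_orders was edited, stg_payments is new.
prod = {"stg_orders": "a1", "stg_customers": "b1", "int_orders": "c1",
        "dim_customers": "d1", "fct_orders": "e1"}
current = {"stg_orders": "a2", "stg_customers": "b1", "int_orders": "c1",
           "dim_customers": "d1", "fct_orders": "e1", "stg_payments": "f1"}

selection = with_descendants(modified_nodes(prod, current), children)
```

Here the CI run would build four models and skip stg_customers and dim_customers, which is where the time and warehouse cost savings come from. On the command line the equivalent selection is passed as `--select state:modified+` together with `--state` pointing at a directory that holds the production artifacts.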
dbt (data build tool) is an open-source transformation framework that lets data teams write modular SQL models with built-in testing, documentation, and version control. dbt has become the industry standard for analytics engineering, used by thousands of companies including JetBlue, HubSpot, and GitLab to transform raw data into reliable analytics.
Production data teams use dbt to manage hundreds of SQL models with proper testing and CI/CD deployment. At companies like GitLab, dbt runs thousands of models daily with automated quality checks. Without dbt, SQL transformations become unmaintainable spaghetti that breaks silently.
dbt provides version control, testing, and documentation that stored procedures lack. dbt models are SQL SELECT statements managed like software, while stored procedures are database-specific and hard to test.
dbt has a larger community, more packages, and broader warehouse support. Dataform is Google-owned and tightly integrated with BigQuery. Most teams outside the Google ecosystem choose dbt.
dbt handles SQL transformations with built-in testing and lineage. Custom Python ETL is needed for non-SQL logic, API calls, and orchestration. Most teams use dbt for transformations and Python for everything else.
dbt is among the most-requested analytics-engineering skills in data engineering and analytics engineering job listings. Senior and Staff analytics-engineering roles at data-mature orgs (JetBlue, GitLab, HubSpot, Convoy, Reddit) hire specifically for engineers who can defend incremental strategy, test design, Jinja-macro architecture, and semantic-layer and dbt Mesh decisions. Those are the tradeoffs this path prepares you to defend.
dbt transforms raw data into analytics-ready models inside your data warehouse. It manages SQL transformations with testing, documentation, and version control — the standard workflow for analytics engineering.
dbt is the dominant transformation tool in modern data stacks. With dbt Mesh, semantic layers, and growing LLM integrations, its relevance continues to increase.
Basic dbt takes 1-2 weeks if you know SQL. Production-level dbt with Jinja macros, custom tests, and CI/CD deployment typically takes 4-6 weeks of practice.
Yes. dbt is expected knowledge for both data engineers and analytics engineers. It appears in most job descriptions and is the standard way teams manage SQL transformations.
dbt Core is the free open-source CLI. dbt Cloud adds a web IDE, job scheduling, and managed infrastructure. Most teams start with Core and move to Cloud as they scale.
dbt handles the Transform step but not Extract or Load. You still need ingestion tools like Fivetran and orchestrators like Airflow. dbt is one critical piece of the data stack.
Stay single-project while your org has a small number of analytics-engineering owners and your model graph is reviewable by one team. Adopt dbt Mesh once multiple teams own disjoint subsets of the warehouse, model PRs start blocking each other, or you need to expose certified data products across team boundaries with versioned contracts. The trigger is organizational scale (multiple owning teams) more than model count.