Data Engineering  /  dbt

πŸ”„ dbt β€” Data Build Tool 23 guides Β· updated 2026

Analytics engineering with SQL β€” models, tests, sources, and Jinja macros that turn raw warehouse tables into trustworthy, documented data products.

dbt Models: The Building Blocks of Data Transformation

A dbt model is a SQL file. That is the whole thing at its simplest. You write a SELECT statement, save it as a .sql file inside your models/ directory, and dbt handles the rest β€” compiling the SQL, creating the right object in your warehouse, and tracking where it fits in the broader pipeline.

What makes this powerful is everything built around that simple file: dependency resolution, configurable materializations, Jinja templating, layered architecture conventions, and automatic documentation. This page covers all of it.


What Happens When dbt Runs a Model

The lifecycle of a model looks like this:

.sql file in models/
|
v
dbt compile (resolves Jinja, ref(), macros)
|
v
compiled SQL stored in target/compiled/
|
v
dbt run (sends SQL to warehouse)
|
v
table or view created in your schema

During dbt run, every model in your project goes through this process. The compiled SQL β€” the actual query sent to the warehouse β€” is saved in the target/ directory so you can inspect it if something looks off.


Materializations

Materialization controls what kind of object dbt creates in the warehouse. There are four options:

view β€” dbt creates a SQL view. No data is stored; the query runs each time someone queries the view. Good for lightweight staging models.

table β€” dbt drops and recreates the full table on every run. Simple and predictable, but expensive for large datasets.

incremental β€” dbt only processes new or changed rows and merges them into an existing table. This is the most common choice for fact tables with millions of rows.

ephemeral β€” the model is never materialized directly. Instead, its SQL is inlined as a CTE into any model that references it. Useful for intermediate logic you want to reuse without creating a warehouse object.

You set materialization inline in the model file:

{{ config(materialized='incremental', unique_key='order_id') }}

Or in dbt_project.yml to apply a default across a folder:

models:
my_project:
staging:
+materialized: view
marts:
+materialized: table

A Three-Layer Architecture

Most dbt projects follow a staging β†’ core β†’ mart pattern. Each layer has a clear responsibility.

Raw warehouse tables (source data)
|
v
Staging layer (stg_*)
Clean, rename, cast β€” one model per source table
|
v
Core/intermediate layer (int_*, fct_*, dim_*)
Join, enrich, apply business logic
|
v
Mart layer (mrt_*)
Aggregated, business-ready β€” what BI tools connect to

This separation makes it easy to find where logic lives, test each layer independently, and swap out a source table without touching downstream models.


Staging Models

A staging model reads from a source table and produces a clean, consistent output. It does not join or aggregate β€” that comes later.

-- models/staging/stg_orders.sql
{{ config(materialized='view') }}
select
id as order_id,
customer_id,
cast(created_at as date) as order_date,
lower(status) as status,
amount_cents / 100.0 as amount_usd
from {{ source('raw', 'orders') }}
where status != 'test'

Using source() here instead of a hardcoded table name gives you freshness checks and keeps the source name in one place. If the raw table is renamed, you only change the source definition, not every model that reads from it.


Core Models β€” Where Business Logic Lives

Once staging models exist, core models join and transform them.

-- models/core/fct_customer_orders.sql
{{ config(materialized='table') }}
with
orders as (
select * from {{ ref('stg_orders') }}
),
customers as (
select * from {{ ref('stg_customers') }}
)
select
orders.order_id,
orders.order_date,
orders.amount_usd,
customers.customer_name,
customers.region,
customers.segment
from orders
left join customers using (customer_id)

The ref() calls register stg_orders and stg_customers as upstream dependencies. dbt will not run this model until both staging models have completed successfully.


Incremental Models for Large Tables

For tables that grow continuously β€” events, transactions, logs β€” rebuilding everything on every run is wasteful. Incremental models solve this.

-- models/core/fct_daily_events.sql
{{ config(
materialized='incremental',
unique_key='event_id'
) }}
select
event_id,
user_id,
event_type,
occurred_at
from {{ ref('stg_events') }}
{% if is_incremental() %}
where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}

On the first run, dbt creates the table with all data. On subsequent runs, only rows newer than the current maximum occurred_at are processed and merged. The {{ this }} macro refers to the existing table in the warehouse.

In 2025, dbt added the microbatch strategy as a stable incremental option. It breaks large incremental loads into smaller time-based batches, which makes retries more granular and failures less costly.


Mart Models

Mart models are the final aggregated layer that business users and dashboards connect to. Keep them simple β€” they should mostly just group and aggregate from the core layer.

-- models/marts/mrt_revenue_by_region.sql
{{ config(materialized='table') }}
select
region,
date_trunc('month', order_date) as month,
count(distinct order_id) as order_count,
sum(amount_usd) as total_revenue
from {{ ref('fct_customer_orders') }}
group by 1, 2
order by 1, 2

Environment-Aware Models

One of the more practical dbt features is writing models that behave differently in development versus production. A common pattern limits data volume in dev:

{{ config(materialized='table') }}
select *
from {{ ref('stg_events') }}
{% if target.name == 'dev' %}
where occurred_at >= current_date - 7
{% endif %}

In dev, you only pull the last 7 days of data, which keeps queries fast and warehouse costs low. In prod, the WHERE clause is excluded entirely.


Python Models

Since dbt v1.3, you can write models as Python files. These are useful for transformations that SQL handles poorly β€” complex statistical operations, machine learning feature engineering, or calls to external libraries.

models/core/anomaly_scores.py
def model(dbt, session):
df = dbt.ref('fct_daily_events').to_pandas()
df['z_score'] = (df['event_count'] - df['event_count'].mean()) / df['event_count'].std()
return df

Python models use the same ref() dependency system as SQL models, so they fit naturally into your existing DAG. They run via the warehouse’s native Python runtime (Snowpark for Snowflake, Databricks for Spark, etc.).


Configuration Options Worth Knowing

OptionWhat It Does
materializedview, table, incremental, or ephemeral
unique_keyColumn(s) used for upsert in incremental models
tagsLabel models for selective execution (dbt run --select tag:staging)
pre_hook / post_hookSQL to run before or after the model
on_schema_changeWhat to do when columns are added/removed (fail, ignore, append_new_columns)
partition_byPartition configuration for BigQuery or Snowflake
enabledSet to false to skip a model without deleting the file

Common Mistakes

Putting too much logic in one model β€” if a model is 200+ lines of SQL doing joins, aggregations, and business logic all at once, split it into stages. Each model should do one thing well.

Hardcoding table names β€” select * from prod_schema.orders will break when you deploy to a different environment. Use ref() and source() instead.

Using table materialization everywhere β€” tables rebuild from scratch on every run. For large datasets this is slow and expensive. Use incremental or view where appropriate.

Not checking compiled SQL β€” when you use Jinja conditionals or macros, the generated SQL may not be what you expect. Run dbt compile and check the output in target/compiled/ before running.


Running Specific Models

You do not have to run every model on every execution. dbt’s node selection syntax lets you target specific models:

Terminal window
# Run a single model
dbt run --select stg_orders
# Run a model and everything downstream
dbt run --select stg_orders+
# Run everything in the staging folder
dbt run --select staging
# Run models with a specific tag
dbt run --select tag:mart

The + selector means β€œand all descendants.” The prefix + means β€œand all ancestors.” You can combine them.


Summary

dbt models are SQL files that define transformations. Each model becomes a view, table, incremental table, or ephemeral CTE depending on its configuration. The ref() function connects models into a dependency graph that dbt uses to determine execution order.

The three-layer pattern β€” staging, core, mart β€” is the most widely adopted structure in production dbt projects. It keeps logic organized, makes testing easier, and allows teams to work on different layers independently without stepping on each other.

Understanding materializations, especially incremental models, is where most of the performance and cost work happens in mature dbt projects.