dbt Models: The Building Blocks of Data Transformation
A dbt model is a SQL file. That is the whole thing at its simplest. You write a SELECT statement, save it as a .sql file inside your models/ directory, and dbt handles the rest β compiling the SQL, creating the right object in your warehouse, and tracking where it fits in the broader pipeline.
What makes this powerful is everything built around that simple file: dependency resolution, configurable materializations, Jinja templating, layered architecture conventions, and automatic documentation. This page covers all of it.
What Happens When dbt Runs a Model
The lifecycle of a model looks like this:
.sql file in models/ | vdbt compile (resolves Jinja, ref(), macros) | vcompiled SQL stored in target/compiled/ | vdbt run (sends SQL to warehouse) | vtable or view created in your schemaDuring dbt run, every model in your project goes through this process. The compiled SQL β the actual query sent to the warehouse β is saved in the target/ directory so you can inspect it if something looks off.
Materializations
Materialization controls what kind of object dbt creates in the warehouse. There are four options:
view β dbt creates a SQL view. No data is stored; the query runs each time someone queries the view. Good for lightweight staging models.
table β dbt drops and recreates the full table on every run. Simple and predictable, but expensive for large datasets.
incremental β dbt only processes new or changed rows and merges them into an existing table. This is the most common choice for fact tables with millions of rows.
ephemeral β the model is never materialized directly. Instead, its SQL is inlined as a CTE into any model that references it. Useful for intermediate logic you want to reuse without creating a warehouse object.
You set materialization inline in the model file:
{{ config(materialized='incremental', unique_key='order_id') }}Or in dbt_project.yml to apply a default across a folder:
models: my_project: staging: +materialized: view marts: +materialized: tableA Three-Layer Architecture
Most dbt projects follow a staging β core β mart pattern. Each layer has a clear responsibility.
Raw warehouse tables (source data) | v Staging layer (stg_*) Clean, rename, cast β one model per source table | v Core/intermediate layer (int_*, fct_*, dim_*) Join, enrich, apply business logic | v Mart layer (mrt_*) Aggregated, business-ready β what BI tools connect toThis separation makes it easy to find where logic lives, test each layer independently, and swap out a source table without touching downstream models.
Staging Models
A staging model reads from a source table and produces a clean, consistent output. It does not join or aggregate β that comes later.
-- models/staging/stg_orders.sql{{ config(materialized='view') }}
select id as order_id, customer_id, cast(created_at as date) as order_date, lower(status) as status, amount_cents / 100.0 as amount_usdfrom {{ source('raw', 'orders') }}where status != 'test'Using source() here instead of a hardcoded table name gives you freshness checks and keeps the source name in one place. If the raw table is renamed, you only change the source definition, not every model that reads from it.
Core Models β Where Business Logic Lives
Once staging models exist, core models join and transform them.
-- models/core/fct_customer_orders.sql{{ config(materialized='table') }}
with orders as ( select * from {{ ref('stg_orders') }} ), customers as ( select * from {{ ref('stg_customers') }} )
select orders.order_id, orders.order_date, orders.amount_usd, customers.customer_name, customers.region, customers.segmentfrom ordersleft join customers using (customer_id)The ref() calls register stg_orders and stg_customers as upstream dependencies. dbt will not run this model until both staging models have completed successfully.
Incremental Models for Large Tables
For tables that grow continuously β events, transactions, logs β rebuilding everything on every run is wasteful. Incremental models solve this.
-- models/core/fct_daily_events.sql{{ config( materialized='incremental', unique_key='event_id') }}
select event_id, user_id, event_type, occurred_atfrom {{ ref('stg_events') }}
{% if is_incremental() %} where occurred_at > (select max(occurred_at) from {{ this }}){% endif %}On the first run, dbt creates the table with all data. On subsequent runs, only rows newer than the current maximum occurred_at are processed and merged. The {{ this }} macro refers to the existing table in the warehouse.
In 2025, dbt added the microbatch strategy as a stable incremental option. It breaks large incremental loads into smaller time-based batches, which makes retries more granular and failures less costly.
Mart Models
Mart models are the final aggregated layer that business users and dashboards connect to. Keep them simple β they should mostly just group and aggregate from the core layer.
-- models/marts/mrt_revenue_by_region.sql{{ config(materialized='table') }}
select region, date_trunc('month', order_date) as month, count(distinct order_id) as order_count, sum(amount_usd) as total_revenuefrom {{ ref('fct_customer_orders') }}group by 1, 2order by 1, 2Environment-Aware Models
One of the more practical dbt features is writing models that behave differently in development versus production. A common pattern limits data volume in dev:
{{ config(materialized='table') }}
select *from {{ ref('stg_events') }}
{% if target.name == 'dev' %} where occurred_at >= current_date - 7{% endif %}In dev, you only pull the last 7 days of data, which keeps queries fast and warehouse costs low. In prod, the WHERE clause is excluded entirely.
Python Models
Since dbt v1.3, you can write models as Python files. These are useful for transformations that SQL handles poorly β complex statistical operations, machine learning feature engineering, or calls to external libraries.
def model(dbt, session): df = dbt.ref('fct_daily_events').to_pandas() df['z_score'] = (df['event_count'] - df['event_count'].mean()) / df['event_count'].std() return dfPython models use the same ref() dependency system as SQL models, so they fit naturally into your existing DAG. They run via the warehouseβs native Python runtime (Snowpark for Snowflake, Databricks for Spark, etc.).
Configuration Options Worth Knowing
| Option | What It Does |
|---|---|
materialized | view, table, incremental, or ephemeral |
unique_key | Column(s) used for upsert in incremental models |
tags | Label models for selective execution (dbt run --select tag:staging) |
pre_hook / post_hook | SQL to run before or after the model |
on_schema_change | What to do when columns are added/removed (fail, ignore, append_new_columns) |
partition_by | Partition configuration for BigQuery or Snowflake |
enabled | Set to false to skip a model without deleting the file |
Common Mistakes
Putting too much logic in one model β if a model is 200+ lines of SQL doing joins, aggregations, and business logic all at once, split it into stages. Each model should do one thing well.
Hardcoding table names β select * from prod_schema.orders will break when you deploy to a different environment. Use ref() and source() instead.
Using table materialization everywhere β tables rebuild from scratch on every run. For large datasets this is slow and expensive. Use incremental or view where appropriate.
Not checking compiled SQL β when you use Jinja conditionals or macros, the generated SQL may not be what you expect. Run dbt compile and check the output in target/compiled/ before running.
Running Specific Models
You do not have to run every model on every execution. dbtβs node selection syntax lets you target specific models:
# Run a single modeldbt run --select stg_orders
# Run a model and everything downstreamdbt run --select stg_orders+
# Run everything in the staging folderdbt run --select staging
# Run models with a specific tagdbt run --select tag:martThe + selector means βand all descendants.β The prefix + means βand all ancestors.β You can combine them.
Summary
dbt models are SQL files that define transformations. Each model becomes a view, table, incremental table, or ephemeral CTE depending on its configuration. The ref() function connects models into a dependency graph that dbt uses to determine execution order.
The three-layer pattern β staging, core, mart β is the most widely adopted structure in production dbt projects. It keeps logic organized, makes testing easier, and allows teams to work on different layers independently without stepping on each other.
Understanding materializations, especially incremental models, is where most of the performance and cost work happens in mature dbt projects.