Data Engineering  /  dbt

🔄 dbt — Data Build Tool 23 guides · updated 2026

Analytics engineering with SQL — models, tests, sources, and Jinja macros that turn raw warehouse tables into trustworthy, documented data products.

dbt Model Dependencies and the ref() Function

When you are working with dbt at any meaningful scale, you quickly realize that the order your models run in matters a lot. A summary table that runs before its source staging model is going to fail every time. dbt solves this problem elegantly through the ref() function, which does two things at once: it tells dbt which model to read data from, and it registers a dependency that shapes the entire execution order.

This page walks through how that works in practice, with examples that reflect how teams actually structure their projects in 2025.


The Core Idea

Every SQL model in dbt lives in a .sql file. When one model needs data from another, you write:

select * from {{ ref('stg_orders') }}

That single line does something important. dbt reads it during the compile phase, notes that the current model depends on stg_orders, and adds an edge in the DAG between the two. When you run dbt run, the framework figures out the correct sequence automatically — you never have to think about it manually.

This is fundamentally different from writing raw SQL against a warehouse where you have to know which table to reference and hope it is already populated.


How the DAG Gets Built

DAG stands for Directed Acyclic Graph. “Directed” means dependencies flow in one direction. “Acyclic” means no circular loops — model A cannot depend on model B if B depends on A.

When you call dbt run, the sequence looks like this:

1. dbt reads every .sql file in the models/ directory
2. It scans for ref() and source() calls
3. It builds a graph of nodes (models) and edges (dependencies)
4. It runs models in topological order — parents before children

A simple three-layer project looks like this in the graph:

raw_orders (source)
|
v
stg_orders (staging)
|
v
customer_orders (core)
|
v
sales_summary (mart)

Each arrow represents a ref() call in the downstream model. dbt guarantees that stg_orders finishes before customer_orders starts, and so on down the chain.


Practical Examples

Staging Layer — Cleaning Raw Data

The staging layer is where raw source tables get cleaned up. You reference source tables with source(), not ref(), because they live outside your dbt project.

-- models/staging/stg_orders.sql
select
id as order_id,
customer_id,
cast(created_at as date) as order_date,
amount,
status
from {{ source('raw', 'orders') }}
where status != 'test'

Core Layer — Joining Two Staging Models

Once staging models exist, the core layer can reference them using ref().

-- models/core/customer_orders.sql
select
c.customer_id,
c.customer_name,
count(o.order_id) as total_orders,
sum(o.amount) as total_spent,
max(o.order_date) as last_order_date
from {{ ref('stg_customers') }} as c
left join {{ ref('stg_orders') }} as o
on c.customer_id = o.customer_id
group by 1, 2

dbt sees two ref() calls here. Both stg_customers and stg_orders become upstream dependencies. Either one failing will stop this model from running.

Mart Layer — Business-Ready Aggregations

The mart layer is what analysts and BI tools typically connect to.

-- models/marts/sales_summary.sql
select
region,
date_trunc('month', order_date) as month,
sum(total_spent) as revenue,
count(distinct customer_id) as unique_customers
from {{ ref('customer_orders') }}
group by 1, 2

This creates a clean three-level chain: staging → core → mart.


Using ref() Inside CTEs

Many teams prefer CTEs over subqueries for readability. ref() works naturally inside them.

with
customers as (
select * from {{ ref('stg_customers') }}
),
orders as (
select * from {{ ref('stg_orders') }}
)
select
customers.customer_id,
customers.email,
orders.amount
from customers
join orders using (customer_id)

This keeps the SQL flat and easy to read, while still giving dbt the dependency information it needs.


ref() in Incremental Models

One of the most common patterns in 2025 dbt projects is combining ref() with incremental materializations to handle large tables efficiently.

-- models/core/daily_revenue.sql
{{ config(materialized='incremental', unique_key='order_date') }}
select
order_date,
sum(amount) as daily_revenue
from {{ ref('stg_orders') }}
{% if is_incremental() %}
where order_date > (select max(order_date) from {{ this }})
{% endif %}
group by 1

dbt only processes new rows on subsequent runs. The {{ this }} macro refers to the current model’s existing table in the warehouse, and ref('stg_orders') still registers the upstream dependency correctly.


Cross-Project References (dbt Mesh)

Since dbt Mesh became production-ready in 2024 and matured further in 2025, teams can now reference models across separate dbt projects. The syntax uses a two-part identifier:

select * from {{ ref('finance_project', 'monthly_revenue') }}

This pulls data from a model in the finance_project dbt project. The upstream project must expose the model as a public node in its dbt_project.yml. Cross-project refs enforce the same dependency tracking as regular refs, just across project boundaries.


Visualizing Dependencies

After writing your models, you can generate documentation that includes an interactive DAG:

Terminal window
dbt docs generate
dbt docs serve

Open localhost:8080 in your browser and click the lineage graph icon on any model. You will see its full upstream and downstream chain. This is one of the most practical debugging tools dbt offers — when a model breaks, the lineage view tells you immediately what else will be affected.


Common Mistakes

Hardcoding table names instead of using ref()

-- Wrong
select * from my_schema.stg_orders
-- Right
select * from {{ ref('stg_orders') }}

Hardcoded names break when schemas change across environments. ref() resolves the correct schema automatically based on your target.

Creating circular dependencies

If model A references B and B references A, dbt will throw an error: Found a cycle. This usually means you need to restructure — often by extracting shared logic into a third model that both can reference.

Inconsistent naming

If a model file is named staging_orders.sql but you reference it as ref('stg_orders'), the run will fail with a model-not-found error. The string inside ref() must match the filename exactly (without the .sql extension).


Dependency Execution Summary

ConceptWhat It Does
ref('model_name')Declares a dependency and resolves the correct relation in the warehouse
DAGThe directed graph dbt builds from all ref() calls to determine run order
Topological sortThe algorithm dbt uses to execute models parents-first
dbt docs serveShows the interactive lineage graph for your entire project
Cross-project refReferences a public model from another dbt project (dbt Mesh)

The ref() function is the single most important concept in dbt. Everything from execution order to environment portability to lineage documentation flows through it. Getting comfortable with it early makes every other dbt feature easier to understand.