Analytics engineering with SQL: models, tests, sources, and Jinja macros that turn raw warehouse tables into trustworthy, documented data products.

The dbt DAG: Dependency Graphs and Execution Order

When you run dbt run, dbt does not fire queries at your warehouse in random order. It builds a dependency graph, specifically a directed acyclic graph (DAG), from your models and their ref() calls, then executes everything in the order that graph dictates. Understanding the DAG is what separates someone who runs dbt from someone who understands what is happening.


What Makes Something a DAG

A DAG has three properties that matter here:

Directed: each edge has a direction. stg_orders pointing to fct_revenue means stg_orders is a dependency of fct_revenue, not the other way around.

Acyclic: there are no loops. You cannot have model A depend on model B while model B depends on model A. If you try, dbt throws a compilation error before a single query runs.

Graph: it is a network of nodes (models, sources, tests, snapshots, seeds) connected by edges (ref() and source() calls).

          source: raw_orders
                  |
                  v
             stg_orders
              /      \
             v        v
  stg_customers    stg_products
              \      /
               v    v
          fct_order_items
                  |
                  v
         mrt_revenue_summary

dbt reads this structure at compile time and determines that source: raw_orders must be ready before stg_orders runs, and stg_orders must complete before fct_order_items starts.


How the DAG Gets Built

Every ref() call in a model creates a directed edge in the graph, and every source() call does the same for a source node. Nothing else is required: dbt handles the graph construction automatically.
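
To make the idea of "a ref() call is an edge" concrete, here is a minimal Python sketch that pulls parent names out of a model's SQL. It is an illustration only: the regular expressions, the extract_parents helper, and the schema.table naming for sources are assumptions made for this example, and dbt itself renders the Jinja rather than pattern-matching it.

```python
import re

# Simplified patterns for single-argument ref() and two-argument source().
# Assumption: single-quoted arguments only; real dbt renders the Jinja instead.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")
SOURCE_PATTERN = re.compile(
    r"\{\{\s*source\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}"
)


def extract_parents(sql: str) -> list[str]:
    """Return the upstream node names that one model's SQL points at."""
    refs = REF_PATTERN.findall(sql)
    sources = [f"{name}.{table}" for name, table in SOURCE_PATTERN.findall(sql)]
    return sources + refs


# Hypothetical model bodies, used only to exercise the helper.
fct_orders_sql = """
select o.order_id, c.customer_name
from {{ ref('stg_orders') }} as o
left join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id
"""
stg_orders_sql = "select id as order_id from {{ source('raw', 'orders') }}"

print(extract_parents(fct_orders_sql))  # ['stg_orders', 'stg_customers']
print(extract_parents(stg_orders_sql))  # ['raw.orders']
```

Each name returned becomes one incoming edge for the model being parsed.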

During the parse phase, dbt reads every .sql file in your models/ directory, extracts all ref() and source() calls, and builds an in-memory graph. The algorithm that determines execution order from that graph is called a topological sort: it resolves the full dependency chain and returns an ordered list of nodes where every parent appears before its children.
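
A topological sort can be sketched in a few lines of Python with the standard-library graphlib module (Python 3.9+). This is not dbt's own implementation, only a small illustration of the ordering guarantee. The node names come from the diagram earlier in this guide, and the parents mapping is written by hand for the example.

```python
from graphlib import TopologicalSorter

# Each key is a node; each value is the set of nodes it depends on.
parents = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"stg_orders"},
    "stg_products": {"stg_orders"},
    "fct_order_items": {"stg_customers", "stg_products"},
    "mrt_revenue_summary": {"fct_order_items"},
}

# static_order() yields every node so that parents come before children.
order = list(TopologicalSorter(parents).static_order())

# Check the defining property: every parent sits earlier in the list.
position = {node: index for index, node in enumerate(order)}
for node, deps in parents.items():
    assert all(position[dep] < position[node] for dep in deps)

print(order)
```

More than one valid order usually exists (stg_customers and stg_products can swap places here), which is why the sketch checks the property rather than one exact list.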

You can inspect the graph at any time:

dbt docs generate
dbt docs serve

Open localhost:8080, navigate to any model, and click the lineage icon. You will see the model's full upstream and downstream chain rendered as an interactive graph.


A Working Example

Here is a three-layer DAG with actual SQL:

Layer 1: Staging

-- models/staging/stg_orders.sql
select
    id as order_id,
    customer_id,
    order_date,
    amount_cents / 100.0 as amount_usd
from {{ source('raw', 'orders') }}

-- models/staging/stg_customers.sql
select
    id as customer_id,
    name as customer_name,
    region
from {{ source('raw', 'customers') }}

Layer 2: Core

-- models/core/fct_orders.sql
select
    o.order_id,
    o.order_date,
    o.amount_usd,
    c.customer_name,
    c.region
from {{ ref('stg_orders') }} as o
left join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id

Layer 3: Mart

-- models/marts/mrt_monthly_revenue.sql
select
    date_trunc('month', order_date) as month,
    region,
    sum(amount_usd) as revenue
from {{ ref('fct_orders') }}
group by 1, 2

The DAG for this project:

raw.orders ────→ stg_orders ───────→ fct_orders → mrt_monthly_revenue
raw.customers ─→ stg_customers ──────────┘

dbt can run stg_orders and stg_customers in parallel, because neither depends on the other (how many models actually run at once is capped by the threads setting in your profile). It then runs fct_orders after both complete, and mrt_monthly_revenue last.
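
The grouping into "can run together" and "must wait" can be sketched with Python's graphlib, which reports the nodes whose parents have all finished. This is a simplified model of scheduling, not dbt's scheduler. The execution_waves helper is a hypothetical name, and sources are left out because dbt does not build them.

```python
from graphlib import TopologicalSorter

# Models only; an empty set means "no model parents".
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mrt_monthly_revenue": {"fct_orders"},
}


def execution_waves(parents):
    """Group nodes into waves; every node in a wave has all parents done."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished
    return waves


waves = execution_waves(parents)
print(waves)
# [['stg_customers', 'stg_orders'], ['fct_orders'], ['mrt_monthly_revenue']]
```

Models inside one wave are independent of each other, so a runner with enough threads could start them at the same time.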


Node Selection: Running Parts of the DAG

One of the most practical applications of understanding the DAG is selective execution. Running your entire project for every change is often unnecessary and wasteful.

# Run a single model only
dbt run --select stg_orders
# Run a model plus all its downstream dependents
dbt run --select stg_orders+
# Run a model plus all its upstream parents
dbt run --select +fct_orders
# Run a model plus both upstream and downstream
dbt run --select +fct_orders+
# Run everything in the staging folder
dbt run --select staging
# Run by tag
dbt run --select tag:mart

The + notation is shorthand for "traverse the DAG in this direction." stg_orders+ means "stg_orders and everything that depends on it." +fct_orders means "fct_orders and everything it depends on."
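
The traversal behind the + operator can be sketched as a graph walk in Python. This covers only the plain +model, model+, and +model+ forms described above. The select, reachable, and invert helpers are hypothetical names for this illustration and are not part of dbt.

```python
from collections import deque

# Edges from each model to the models that ref() it (its children).
children = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["fct_orders"],
    "fct_orders": ["mrt_monthly_revenue"],
    "mrt_monthly_revenue": [],
}


def invert(edges):
    """Flip child edges into parent edges."""
    flipped = {node: [] for node in edges}
    for parent, kids in edges.items():
        for kid in kids:
            flipped[kid].append(parent)
    return flipped


def reachable(start, edges):
    """Breadth-first walk: every node reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def select(selector, children):
    """Resolve 'name', '+name', 'name+', or '+name+' to a set of nodes."""
    name = selector.strip("+")
    selected = {name}
    if selector.startswith("+"):  # leading +: include all ancestors
        selected |= reachable(name, invert(children))
    if selector.endswith("+"):  # trailing +: include all descendants
        selected |= reachable(name, children)
    return selected


print(sorted(select("stg_orders+", children)))
# ['fct_orders', 'mrt_monthly_revenue', 'stg_orders']
print(sorted(select("+fct_orders", children)))
# ['fct_orders', 'stg_customers', 'stg_orders']
```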

You can also exclude nodes:

# Run everything except specific models
dbt run --exclude mrt_monthly_revenue
# Run staging but exclude one model
dbt run --select staging --exclude stg_legacy_table

Cycles and Why They Fail

If model A references model B and model B references model A, dbt detects the loop during the parse phase and refuses to compile:

Error: Found a cycle: model.my_project.model_a --> model.my_project.model_b --> model.my_project.model_a

When this happens, the fix is to extract the shared logic into a third model that both can reference without either depending on the other:

shared_base_model ──→ model_a
shared_base_model ──→ model_b

Cycles usually appear when two models are trying to do things that conceptually belong in the same layer but depend on each other. Rethinking the layer boundaries is often the right solution.
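
Cycle detection itself is easy to reproduce outside dbt. The sketch below uses Python's graphlib, which raises CycleError when a graph cannot be sorted. It illustrates the idea and is not dbt's own check. The find_cycle helper is a hypothetical name, and the model names mirror the example above.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(parents):
    """Return the nodes forming a cycle, or None if the graph is acyclic."""
    try:
        TopologicalSorter(parents).prepare()
    except CycleError as exc:
        # graphlib puts the offending cycle in the second element of args.
        return exc.args[1]
    return None


# model_a and model_b reference each other: a cycle.
cyclic = {"model_a": {"model_b"}, "model_b": {"model_a"}}

# After extracting the shared logic, both depend on a third model instead.
fixed = {
    "shared_base_model": set(),
    "model_a": {"shared_base_model"},
    "model_b": {"shared_base_model"},
}

print(find_cycle(cyclic))  # a list that starts and ends on the same model
print(find_cycle(fixed))   # None
```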


DAG in dbt Cloud and dbt Explorer

In dbt Cloud, the DAG is available in the IDE while you are writing models. You can see the lineage graph update as you edit ref() calls without needing to run dbt docs generate.

dbt Explorer (introduced in 2023 and substantially improved through 2025) gives a richer view, including column-level lineage.

Column-level lineage is particularly useful for tracing where a calculated field originates. If a revenue metric looks wrong, you can trace the specific column back through intermediate models to its raw source.


Multi-Project DAGs with dbt Mesh

In 2025, many teams running large dbt deployments have adopted dbt Mesh, which allows separate dbt projects to share nodes across project boundaries. A cross-project ref() looks like:

select * from {{ ref('finance_project', 'fct_revenue') }}

This creates a cross-project DAG edge. The upstream project must designate the model as public:

# In the finance_project dbt_project.yml or model config
models:
  finance_project:
    marts:
      +access: public

In dbt Explorer, cross-project lineage is visible as a unified graph across all connected projects, giving platform teams visibility into the full data flow even when multiple teams own separate dbt projects.


When a DAG Node Fails

If a model fails during dbt run, dbt skips all downstream models. The run result shows the failed model with an error message and marks all dependents as "skipped."

1 of 4 START sql view model staging.stg_orders ..................... [RUN]
1 of 4 OK created sql view model staging.stg_orders ............... [OK in 1.2s]
2 of 4 START sql view model staging.stg_customers .................. [RUN]
2 of 4 ERROR creating sql view model staging.stg_customers ......... [ERROR in 0.8s]
3 of 4 SKIP relation core.fct_orders .............................. [SKIP]
4 of 4 SKIP relation marts.mrt_monthly_revenue .................... [SKIP]

This failure propagation is automatic: you never have to manually figure out which downstream models are affected. The DAG handles that entirely.
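
The skip behaviour follows directly from walking the graph in dependency order. Here is a minimal Python sketch, assuming a run where stg_customers fails as in the log above. The run_with_skips helper and the status strings are hypothetical, chosen to mirror the log rather than taken from dbt's code.

```python
from graphlib import TopologicalSorter

parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mrt_monthly_revenue": {"fct_orders"},
}


def run_with_skips(parents, failing):
    """Simulate a run: a node is skipped if any parent did not succeed."""
    statuses = {}
    # static_order() visits parents first, so their status is already known.
    for node in TopologicalSorter(parents).static_order():
        if any(statuses[parent] != "OK" for parent in parents[node]):
            statuses[node] = "SKIP"
        elif node in failing:
            statuses[node] = "ERROR"
        else:
            statuses[node] = "OK"
    return statuses


statuses = run_with_skips(parents, failing={"stg_customers"})
for node, status in statuses.items():
    print(f"{status:5} {node}")
```

Because a skipped node also counts as "not OK", the skip carries through fct_orders to mrt_monthly_revenue without any extra bookkeeping.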


Command                        What It Does
dbt ls                         Lists all nodes in the project and their types
dbt ls --select +fct_orders    Lists fct_orders and all its upstream nodes
dbt docs generate              Generates documentation including the full DAG
dbt docs serve                 Opens the interactive lineage graph in a browser
dbt compile                    Compiles the graph and renders SQL without running
dbt run --select model+        Runs model and all downstream dependents

The Practical Value of the DAG

Teams that deeply understand their DAG are better at several things:

Impact analysis: before changing a staging model, you can see at a glance which core and mart models will be affected. This prevents surprises when a schema change breaks a dashboard.

Cost control: running a targeted selection such as changed_model+ in CI/CD rather than the full project means you rebuild only the changed model and what depends on it, which matters when you have hundreds of models and a warehouse billing by query.

Debugging: when something in a mart model looks wrong, the lineage view lets you trace the error back through intermediate models to identify exactly where the bad data entered.

Onboarding: a new team member can understand the entire data pipeline by looking at the DAG rather than reading hundreds of SQL files in sequence.

The DAG is not just a visualization feature. It is the core structure that makes dbt's execution reliable, reproducible, and efficient. Every time you write ref(), you are contributing to that structure.