The dbt DAG: Dependency Graphs and Execution Order
When you run dbt run, dbt does not just fire queries at your warehouse in random order. It builds a dependency graph, specifically a directed acyclic graph (DAG), from your models and their ref() calls, then executes everything in the order that graph dictates. Understanding the DAG is what separates someone who runs dbt from someone who actually understands what is happening.
What Makes Something a DAG
A DAG has three properties that matter here:
Directed: each edge has a direction. stg_orders pointing to fct_revenue means stg_orders is a dependency of fct_revenue, not the other way around.
Acyclic: there are no loops. You cannot have model A depend on model B while model B depends on model A. If you try, dbt throws a compilation error before a single query runs.
Graph: it is a network of nodes (models, sources, tests, snapshots, seeds) connected by edges (ref() and source() calls).
    source: raw_orders
            |
            v
        stg_orders
         /      \
        v        v
    stg_customers  stg_products
         \      /
          v    v
      fct_order_items
            |
            v
    mrt_revenue_summary

dbt reads this structure at compile time and determines that source: raw_orders must be ready before stg_orders runs, and stg_orders must complete before fct_order_items starts.
How the DAG Gets Built
Every ref() call in a model creates a directed edge in the graph. Nothing else is required: dbt handles the graph construction automatically.
During the parse phase, dbt reads every .sql file in your models/ directory, extracts all ref() and source() calls, and builds an in-memory graph. The algorithm that determines execution order from that graph is called a topological sort. It resolves the full dependency chain and returns an ordered list of nodes where every parent appears before its children.
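To make the idea of a topological sort concrete, here is a minimal sketch using Python's standard-library graphlib module. It is not dbt's own implementation. The four model names and the parents dictionary are assumptions chosen for illustration, with each model mapped to the set of models it would ref().

```python
from graphlib import TopologicalSorter

# Hypothetical project: each model maps to the set of models it ref()s.
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mrt_monthly_revenue": {"fct_orders"},
}

# static_order() yields nodes so that every parent precedes its children.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

Any order that satisfies the dependencies is a valid result, so the relative position of the two staging models is not significant.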
You can inspect the graph at any time:
    dbt docs generate
    dbt docs serve

Open localhost:8080, navigate to any model, and click the lineage icon. You will see the model's full upstream and downstream chain rendered as an interactive graph.
A Working Example
Here is a three-layer DAG with actual SQL:
Layer 1: Staging
    -- models/staging/stg_orders.sql
    select
        id as order_id,
        customer_id,
        order_date,
        amount_cents / 100.0 as amount_usd
    from {{ source('raw', 'orders') }}

    -- models/staging/stg_customers.sql
    select
        id as customer_id,
        name as customer_name,
        region
    from {{ source('raw', 'customers') }}

Layer 2: Core
    -- models/core/fct_orders.sql
    select
        o.order_id,
        o.order_date,
        o.amount_usd,
        c.customer_name,
        c.region
    from {{ ref('stg_orders') }} as o
    left join {{ ref('stg_customers') }} as c
        on o.customer_id = c.customer_id

Layer 3: Mart
    -- models/marts/mrt_monthly_revenue.sql
    select
        date_trunc('month', order_date) as month,
        region,
        sum(amount_usd) as revenue
    from {{ ref('fct_orders') }}
    group by 1, 2

The DAG for this project:
    raw.orders    --> stg_orders    --+
                                      +--> fct_orders --> mrt_monthly_revenue
    raw.customers --> stg_customers --+

dbt can run stg_orders and stg_customers in parallel, since neither depends on the other (provided the threads setting is greater than 1). It then runs fct_orders after both complete, and runs mrt_monthly_revenue last.
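The parallelism described above can be sketched by grouping nodes into batches whose dependencies are already satisfied. This is a simplified illustration using Python's graphlib, not dbt's scheduler: a real scheduler can start a node as soon as that node's own parents finish rather than waiting for a whole batch. The parents dictionary mirrors the example project, with sources omitted, and the function name execution_batches is hypothetical.

```python
from graphlib import TopologicalSorter

# Each model maps to the models it ref()s (sources omitted for brevity).
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mrt_monthly_revenue": {"fct_orders"},
}

def execution_batches(parents):
    """Group nodes into batches; every node in a batch has all parents done."""
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # get_ready() returns every node whose parents are all marked done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

batches = execution_batches(parents)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Models that share a batch have no dependency on each other, which is what makes them safe to run concurrently.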
Node Selection: Running Parts of the DAG
One of the most practical applications of understanding the DAG is selective execution. Running your entire project for every change is often unnecessary and wasteful.
    # Run a single model only
    dbt run --select stg_orders

    # Run a model plus all its downstream dependents
    dbt run --select stg_orders+

    # Run a model plus all its upstream parents
    dbt run --select +fct_orders

    # Run a model plus both upstream and downstream
    dbt run --select +fct_orders+

    # Run everything in the staging folder
    dbt run --select staging

    # Run by tag
    dbt run --select tag:mart

The + notation is shorthand for "traverse the DAG in this direction." stg_orders+ means "stg_orders and everything that depends on it." +fct_orders means "fct_orders and everything it depends on."
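The traversal behind the + notation can be sketched as a reachability search over the graph. The code below is a deliberately simplified model that handles only a leading or trailing + on a single model name. dbt's real selector grammar supports far more (tags, paths, depth limits, and so on), and the function names reachable and select are hypothetical.

```python
# Each model maps to the models it ref()s (hypothetical project).
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mrt_monthly_revenue": {"fct_orders"},
}

def reachable(node, edges):
    """Return every node reachable from `node` by following `edges`."""
    seen = set()
    stack = [node]
    while stack:
        for neighbour in edges[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen

def select(spec, parents):
    """Resolve 'name', '+name', 'name+' or '+name+' to a set of node names."""
    # Invert the parent map so downstream traversal uses the same search.
    children = {name: set() for name in parents}
    for child, its_parents in parents.items():
        for parent in its_parents:
            children[parent].add(child)

    name = spec.strip("+")
    selected = {name}
    if spec.startswith("+"):
        selected |= reachable(name, parents)   # upstream
    if spec.endswith("+"):
        selected |= reachable(name, children)  # downstream
    return selected

print(sorted(select("stg_orders+", parents)))
print(sorted(select("+fct_orders", parents)))
```

Upstream and downstream selection are the same search run over opposite edge directions, which is why a single + on either side is enough to express both.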
You can also exclude nodes:
    # Run everything except specific models
    dbt run --exclude mrt_monthly_revenue

    # Run staging but exclude one model
    dbt run --select staging --exclude stg_legacy_table

Cycles and Why They Fail
If model A references model B and model B references model A, dbt detects the loop during the parse phase and refuses to compile:
    Error: Found a cycle: model.my_project.model_a --> model.my_project.model_b --> model.my_project.model_a

When this happens, the fix is to extract the shared logic into a third model that both can reference without either depending on the other:
    model_a <-- shared_base_model
    model_b <-- shared_base_model

Cycles usually appear when two models are trying to do things that conceptually belong in the same layer but depend on each other. Rethinking the layer boundaries is often the right solution.
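Cycle detection and the shared-base fix can be illustrated with Python's graphlib, which raises CycleError when a graph cannot be ordered. This is a sketch of the general technique, not dbt's own check. The helper name find_cycle is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def find_cycle(parents):
    """Return the nodes of a detected cycle, or None if the graph is acyclic."""
    try:
        list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        # The second element of args holds the cycle as a list of nodes,
        # with the first and last entries being the same node.
        return err.args[1]
    return None

# model_a and model_b reference each other: no valid order exists.
cyclic = {"model_a": {"model_b"}, "model_b": {"model_a"}}

# Shared logic extracted into a third model that both reference.
fixed = {
    "shared_base_model": set(),
    "model_a": {"shared_base_model"},
    "model_b": {"shared_base_model"},
}

print(find_cycle(cyclic))
print(find_cycle(fixed))
```

Once neither model depends on the other, the graph has a valid order again and the check returns None.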
DAG in dbt Cloud and dbt Explorer
In dbt Cloud, the DAG is available in the IDE while you are writing models. You can see the lineage graph update as you edit ref() calls without needing to run docs generate.
dbt Explorer (introduced in 2023 and substantially improved through 2025) gives a richer view:
- Full lineage across sources, models, exposures, and metrics
- Column-level lineage showing which specific columns flow through the graph
- Model execution history and run status overlaid on the lineage view
- Usage data showing which downstream consumers (BI tools, notebooks) are hitting which models
Column-level lineage is particularly useful for tracing where a calculated field originates. If a revenue metric looks wrong, you can trace the specific column back through intermediate models to its raw source.
Multi-Project DAGs with dbt Mesh
In 2025, many teams running large dbt deployments have adopted dbt Mesh, which allows separate dbt projects to share nodes across project boundaries. A cross-project ref() looks like:
    select * from {{ ref('finance_project', 'fct_revenue') }}

This creates a cross-project DAG edge. The upstream project must designate the model as public:
    # In the finance_project dbt_project.yml or model config
    models:
      finance_project:
        marts:
          +access: public

In dbt Explorer, cross-project lineage is visible as a unified graph across all connected projects, giving platform teams visibility into the full data flow even when multiple teams own separate dbt projects.
When a DAG Node Fails
If a model fails during dbt run, dbt skips all downstream models. The run result shows the failed model with an error message and marks all dependents as "skipped."
    1 of 4 START sql view model staging.stg_orders ..................... [RUN]
    1 of 4 OK created sql view model staging.stg_orders ............... [OK in 1.2s]
    2 of 4 START sql view model staging.stg_customers .................. [RUN]
    2 of 4 ERROR creating sql view model staging.stg_customers ......... [ERROR in 0.8s]
    3 of 4 SKIP relation core.fct_orders .............................. [SKIP]
    4 of 4 SKIP relation marts.mrt_monthly_revenue .................... [SKIP]

This failure propagation is automatic: you never have to manually figure out which downstream models are affected. The DAG handles that entirely.
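The skip behaviour in the log above follows directly from walking the graph in dependency order. The sketch below simulates it under simple assumptions: simulate_run is a hypothetical helper, the failing set stands in for whatever models would error at the warehouse, and nothing is actually executed.

```python
from graphlib import TopologicalSorter

# Each model maps to the models it ref()s (mirrors the example project).
parents = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "mrt_monthly_revenue": {"fct_orders"},
}

def simulate_run(parents, failing):
    """Return a status per node: 'ok', 'error', or 'skip'."""
    status = {}
    # Parents are always visited first, so their status is already known.
    for node in TopologicalSorter(parents).static_order():
        if any(status[p] in ("error", "skip") for p in parents[node]):
            status[node] = "skip"    # an upstream node failed or was skipped
        elif node in failing:
            status[node] = "error"
        else:
            status[node] = "ok"
    return status

result = simulate_run(parents, failing={"stg_customers"})
for node, state in result.items():
    print(node, state)
```

Because a skipped node also causes its own children to be skipped, one failure propagates to every descendant without any extra bookkeeping.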
Useful DAG-Related Commands
| Command | What It Does |
|---|---|
| dbt ls | Lists all nodes in the project and their types |
| dbt ls --select +fct_orders | Lists fct_orders and all its upstream nodes |
| dbt docs generate | Generates documentation including the full DAG |
| dbt docs serve | Opens the interactive lineage graph in a browser |
| dbt compile | Compiles the graph and renders SQL without running |
| dbt run --select model+ | Runs model and all downstream dependents |
The Practical Value of the DAG
Teams that deeply understand their DAG are better at several things:
Impact analysis: before changing a staging model, you can see at a glance which core and mart models will be affected. This prevents surprises when a schema change breaks a dashboard.
Cost control: running changed_model+ (the changed model and its downstream dependents) in CI/CD rather than the full project means you only run what needs to run, which matters when you have hundreds of models and a warehouse billing by query.
Debugging: when something in a mart model looks wrong, the lineage view lets you trace the error back through intermediate models to identify exactly where the bad data entered.
Onboarding: a new team member can understand the entire data pipeline by looking at the DAG rather than reading hundreds of SQL files in sequence.
The DAG is not just a visualization feature. It is the core structure that makes dbt's execution reliable, reproducible, and efficient. Every time you write ref(), you are contributing to that structure.