Analytics engineering with SQL — models, tests, sources, and Jinja macros that turn raw warehouse tables into trustworthy, documented data products.

dbt Project Structure: What Lives Where and Why

Running dbt init my_project creates a directory with a specific layout. Every folder in that layout has a purpose, and understanding them early saves you a lot of confusion later. This page walks through the standard dbt project structure, what each component does, and how real teams organize things at scale.


The Default Layout

my_project/
├── models/
│   ├── staging/
│   │   └── stg_orders.sql
│   ├── intermediate/
│   │   └── int_customer_orders.sql
│   └── marts/
│       └── fct_revenue.sql
├── macros/
│   └── cents_to_dollars.sql
├── seeds/
│   └── country_codes.csv
├── snapshots/
│   └── customers_snapshot.sql
├── analyses/
│   └── ad_hoc_revenue_check.sql
├── tests/
│   └── assert_positive_revenue.sql
├── dbt_project.yml
└── profiles.yml (usually in ~/.dbt/, not the project)

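Each of these serves a different role in the pipeline. Note that dbt init creates only the top-level folders and a starter models/example/ folder; the staging, intermediate, and marts subfolders and the files shown here are conventions you add yourself. Let us go through them one by one.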


models/

The models/ directory is where all your transformation logic lives. Every .sql file here becomes a node in the DAG — either a view, table, incremental table, or ephemeral CTE depending on configuration.

The most common way to organize models is by layer:

models/
├── staging/ ← one model per raw source table, minimal transformation
├── intermediate/ ← joins, enrichment, business logic
└── marts/ ← aggregated, BI-ready outputs

Each layer builds on the one above it. Staging models read from source() declarations. Intermediate and mart models read from other models using ref().
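A source() call only resolves if the source has been declared in a YAML file. A minimal sketch of such a declaration, assuming the raw tables live in a warehouse schema that is also named raw (the file name and schema name are assumptions for illustration):

```yaml
# models/staging/sources.yml  (file name is a convention, not required)
version: 2

sources:
  - name: raw            # first argument to source()
    schema: raw          # assumed warehouse schema holding the raw tables
    tables:
      - name: orders     # second argument to source()
      - name: customers
```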

A simple staging model looks like this:

-- models/staging/stg_orders.sql
select
    id as order_id,
    customer_id,
    cast(created_at as date) as order_date,
    amount_cents / 100.0 as amount_usd,
    lower(status) as status
from {{ source('raw', 'orders') }}

And a schema file in the same folder documents and tests it:

# models/staging/schema.yml
version: 2

models:
  - name: stg_orders
    description: "Cleaned orders from the raw transactional database"
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned', 'cancelled']

Tests defined here run with dbt test. Keeping the schema file next to the models it describes — rather than in one central file — makes it much easier to maintain as projects grow.


macros/

Macros are Jinja functions you write once and reuse across models. They live in the macros/ directory and are available to all models in the project automatically.

A common use case is wrapping a repeated calculation:

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
{{ column_name }} / 100.0
{% endmacro %}

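Then in any model that still has the raw amount_cents column, for example one reading the raw orders table directly:

select
    id as order_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ source('raw', 'orders') }}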

Macros also handle more complex patterns — dynamic SQL generation, environment-specific logic, or generating repetitive clauses across multiple columns. The dbt-utils package (which you install via packages.yml) provides a large library of ready-made macros that most teams use to avoid reinventing common patterns.
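As a sketch, installing dbt-utils usually means adding an entry like the following to packages.yml at the project root and then running dbt deps. The version range shown is an assumption; check the package hub for the range that matches your dbt version.

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]   # assumed range, adjust as needed
```

Installed packages land in dbt_packages/, and their macros are called with the package prefix, for example dbt_utils.star.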


seeds/

Seeds are CSV files that dbt loads into your warehouse as tables. They are ideal for small, static reference data that changes infrequently — country codes, product categories, marketing channel mappings, exchange rates.

seeds/
└── country_codes.csv

Load them with:

dbt seed

dbt creates a table in your warehouse matching the CSV filename. You can reference it in models just like any other dbt object:

select * from {{ ref('country_codes') }}

One important note: seeds are not designed for large datasets. If a CSV file has more than a few thousand rows, loading it via seed will be slow. For larger lookup tables, load them to your warehouse separately and declare them as sources instead.
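Seed behaviour can also be tuned from dbt_project.yml. The fragment below is a sketch that pins column types for the country_codes seed so dbt does not have to infer them from the CSV. The column names are hypothetical, and the type names depend on your warehouse.

```yaml
# dbt_project.yml (fragment)
seeds:
  my_project:
    country_codes:
      +column_types:
        country_code: varchar(2)     # hypothetical column
        country_name: varchar(100)   # hypothetical column
```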


snapshots/

Snapshots capture the historical state of a table over time, a pattern known in data warehousing as type 2 slowly changing dimensions (SCDs). If you need to know what a customer record looked like six months ago, snapshots handle that.

-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

select * from {{ source('raw', 'customers') }}

{% endsnapshot %}

Run with dbt snapshot. Each execution checks for changes and appends new rows with dbt_valid_from and dbt_valid_to timestamps to track when each version was active.

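Two snapshot strategies exist: timestamp, which uses an updated_at column to detect changed rows, and check, which compares the current values of a list of columns (check_cols) against the values stored in the snapshot.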
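The snapshot above uses the timestamp strategy. For a table without a reliable updated_at column, a check-strategy configuration might look like the sketch below, set here in dbt_project.yml. It assumes a hypothetical snapshot named products_snapshot whose SQL file contains only the select statement and no config() block, since an in-file config() would take precedence.

```yaml
# dbt_project.yml (fragment)
snapshots:
  my_project:
    products_snapshot:                 # hypothetical snapshot name
      +target_schema: snapshots
      +unique_key: product_id          # hypothetical column
      +strategy: check
      +check_cols: ["name", "price"]   # hypothetical columns to compare
```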


analyses/

The analyses/ folder is for SQL files you want version-controlled but not materialized as warehouse objects. It is useful for ad-hoc queries such as a one-off revenue check:

-- analyses/q4_revenue_check.sql
select
    date_trunc('month', order_date) as month,
    sum(amount_usd) as revenue
from {{ ref('fct_revenue') }}
where order_date >= '2025-10-01'
group by 1
order by 1

Run dbt compile to resolve the Jinja and generate runnable SQL in target/compiled/. You can then copy that SQL and run it directly in your warehouse client.


tests/

While most tests live in schema YAML files, the tests/ directory holds singular tests — custom SQL queries that return rows when something is wrong.

-- tests/assert_positive_revenue.sql
select
    order_id,
    amount_usd
from {{ ref('fct_revenue') }}
where amount_usd < 0

If this query returns any rows, the test fails. Singular tests are useful for business logic rules that are hard to express with the built-in test types (not_null, unique, accepted_values, relationships).
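Of the built-in types, relationships is the only one that points at a second model. A sketch of how it might be declared, assuming a stg_customers model exists and exposes a customer_id column:

```yaml
# models/staging/schema.yml (fragment)
version: 2

models:
  - name: stg_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_customers')   # assumed parent model
              field: customer_id         # assumed key column in the parent
```

The test fails for every customer_id in stg_orders that has no matching row in stg_customers.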


dbt_project.yml

This is the configuration file that makes a folder a dbt project. It sits at the root and controls global behavior.

name: my_project
version: '1.0.0'
profile: my_profile

model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
analysis-paths: ["analyses"]
test-paths: ["tests"]

models:
  my_project:
    staging:
      +materialized: view
      +tags: ["staging"]
    intermediate:
      +materialized: table
    marts:
      +materialized: table
      +tags: ["mart"]

The models: section applies configuration by folder path. The + prefix marks a key as a configuration rather than a subfolder name, so dbt applies it to every model under that path. This avoids having to set materialization in every individual model file.
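Folder-level settings act as defaults, and a more specific configuration overrides them. As a sketch, assuming you wanted one mart to stay a view while the rest of marts/ are built as tables, a config: block in that model's YAML entry could look like this:

```yaml
# models/marts/schema.yml (hypothetical file)
version: 2

models:
  - name: fct_revenue
    config:
      materialized: view   # overrides the +materialized: table folder default
```

A config() call inside the model's .sql file would in turn take precedence over both.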


How the Pieces Connect at Runtime

When you run dbt run, here is what happens:

dbt_project.yml → sets configuration and paths
|
v
models/ scanned → all .sql files discovered
|
v
ref() and source() → dependency graph built
|
v
topological sort → determines execution order
|
v
Jinja compiled → SQL resolved to final form
|
v
warehouse execution → tables and views created
|
v
target/run/ → executed SQL saved for inspection (compiled SQL goes to target/compiled/)

The target/ directory is generated automatically and should not be committed to version control. Add it to .gitignore.


Multi-Domain Organization

For larger teams, the standard single-folder structure can get unwieldy. A common pattern in 2025 is organizing by domain within the layers:

models/
├── staging/
│   ├── finance/
│   │   └── stg_invoices.sql
│   └── marketing/
│       └── stg_campaigns.sql
└── marts/
    ├── finance/
    │   └── fct_revenue.sql
    └── marketing/
        └── fct_campaign_performance.sql

You can apply dbt_project.yml configuration at any subfolder level:

models:
  my_project:
    marts:
      finance:
        +tags: ["finance", "mart"]
      marketing:
        +tags: ["marketing", "mart"]

With dbt Mesh (stable since 2024), you can take this further and split large projects into separate dbt projects that share models through cross-project references. Each domain team owns their project while exposing specific public models to others.
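A rough sketch of the two YAML pieces usually involved, assuming an upstream project named finance (a hypothetical name) that exposes fct_revenue. Cross-project references depend on your dbt plan and version, so treat this as an outline rather than a complete setup.

```yaml
# Upstream project: models/marts/finance/schema.yml
models:
  - name: fct_revenue
    access: public        # allow other projects to reference this model
---
# Downstream project: dependencies.yml
projects:
  - name: finance         # hypothetical upstream project name
```

A downstream model would then select from {{ ref('finance', 'fct_revenue') }}, using the two-argument form of ref().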


Naming Conventions That Most Teams Follow

Prefix | Layer | Example
stg_ | Staging | stg_orders, stg_customers
int_ | Intermediate | int_customer_orders
fct_ | Fact table (mart) | fct_revenue, fct_sessions
dim_ | Dimension table (mart) | dim_customers, dim_products
mrt_ | General mart | mrt_weekly_summary

These prefixes are convention, not enforced by dbt. But they make the DAG much easier to read at a glance, and most teams adopt them because the clarity is worth the extra characters.


What to Put in .gitignore

target/
dbt_packages/
logs/
.env
profiles.yml

The target/ and dbt_packages/ directories are generated at runtime and should never be committed. The profiles.yml file contains credentials and should never be in source control — use environment variables or dbt Cloud’s managed credentials instead.
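One way to keep secrets out of profiles.yml is dbt's env_var() function. The sketch below assumes a Postgres warehouse; the environment variable names, database name, and schema name are hypothetical, and the profile name matches the profile: my_profile setting shown earlier.

```yaml
# ~/.dbt/profiles.yml
my_profile:
  target: dev
  outputs:
    dev:
      type: postgres                               # assumed adapter
      host: "{{ env_var('DBT_HOST') }}"            # hypothetical variable names
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics                            # hypothetical database
      schema: dbt_dev                              # hypothetical schema
      threads: 4
```

With credentials read from the environment, the file itself holds no secrets, though keeping it out of the repository is still the safer default.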


Quick Reference

Location | Purpose
models/ | SQL transformation files
models/*/schema.yml | Documentation, column descriptions, tests
macros/ | Reusable Jinja/SQL functions
seeds/ | Static CSV reference data
snapshots/ | Historical change tracking (SCD)
analyses/ | Version-controlled ad-hoc queries
tests/ | Custom singular data quality tests
dbt_project.yml | Project configuration and folder-level settings
packages.yml | External dbt package dependencies
target/ | Generated output (do not commit)

Getting comfortable with this layout is the first step to working efficiently in dbt. Once you know where things belong, both writing models and debugging broken runs become significantly faster.