dbt Seeds: Static Reference Data That Belongs in Your DAG

Most data in a warehouse comes from source systems: applications, APIs, event streams. But some data has no source system at all; it exists only in a spreadsheet that someone maintains by hand. Country codes. Fiscal calendar mappings. Product category hierarchies. Internal cost rates. This reference data needs to live in your warehouse too, and dbt seeds are designed to get it there.

What Is a dbt Seed?

A seed is a CSV file stored inside your dbt project that dbt loads into your warehouse as a table. Once loaded, you reference it in models exactly as you would any other model, with {{ ref('seed_name') }}, where seed_name is the CSV filename without the .csv extension.

Seeds are not meant for large datasets. They are for small, slowly changing reference tables where the authoritative source is a flat file rather than a live system.

dbt project structure with seeds
---------------------------------
my_project/
├── models/
│   ├── staging/
│   └── marts/
├── seeds/                    <-- your CSV files go here
│   ├── country_codes.csv
│   ├── product_categories.csv
│   └── fiscal_calendar.csv
├── snapshots/
└── dbt_project.yml

When to Use Seeds (and When Not To)

Seeds are a good fit when:

The data is small (a few hundred to a few thousand rows)
The data changes infrequently
The source of truth is a spreadsheet or manually maintained file
You want the reference data version-controlled alongside your models

Seeds are not a good fit when:

The data is large (tens of thousands of rows or more — use a proper ingestion tool)
The data updates automatically from a source system
The data contains sensitive information (seeds are committed to git)
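To keep seeds within the "small file" range, some teams add a CI or pre-commit check that counts the rows in each seed CSV. Below is a minimal Python sketch. The 5,000-row budget is a hypothetical threshold taken from the guideline above, not a dbt limit. The helper names are also illustrative, and the sketch assumes the default seeds/ directory.

```python
import csv
from pathlib import Path

# Hypothetical budget based on the "a few thousand rows" guideline; not a dbt limit.
MAX_SEED_ROWS = 5_000


def count_data_rows(csv_path: Path) -> int:
    """Count rows in a CSV file, excluding the header line."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if present
        return sum(1 for _ in reader)


def oversized_seeds(seed_dir: Path, limit: int = MAX_SEED_ROWS) -> dict[str, int]:
    """Return {filename: row_count} for every seed CSV above the limit."""
    too_big = {}
    for path in sorted(seed_dir.glob("**/*.csv")):
        rows = count_data_rows(path)
        if rows > limit:
            too_big[path.name] = rows
    return too_big
```

A CI step could call oversized_seeds(Path("seeds")) and fail the build when the result is non-empty. That prompts a conversation about moving the data to an ingestion tool.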

Creating a Seed File

Drop a CSV file into the seeds/ directory. The header row supplies the column names, and dbt infers each column's type from the values. For a country code lookup table, seeds/country_codes.csv might look like:

country_code,country_name,region,currency_code
US,United States,North America,USD
GB,United Kingdom,Europe,GBP
DE,Germany,Europe,EUR
JP,Japan,Asia Pacific,JPY
AU,Australia,Asia Pacific,AUD
CA,Canada,North America,CAD
SG,Singapore,Asia Pacific,SGD
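Type inference is convenient but can surprise you. The Python sketch below uses a deliberately naive inferrer to show the failure mode. It is an assumption for illustration only: dbt's real inference is done by the agate library and differs in detail. A column of codes such as 01234 parses as integers and silently loses its leading zeros.

```python
def infer_and_cast(values: list[str]) -> list[object]:
    """Naive inference: if every value parses as an int, treat the column as integer."""
    try:
        return [int(v) for v in values]
    except ValueError:
        return list(values)  # fall back to text


zip_codes = ["01234", "02139", "10001"]  # hypothetical ZIP-code column
print(infer_and_cast(zip_codes))         # [1234, 2139, 10001]: leading zeros lost

country_codes = ["US", "GB", "DE"]
print(infer_and_cast(country_codes))     # ['US', 'GB', 'DE']: stays text
```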

Load it into the warehouse:

dbt seed

dbt creates a table in your warehouse named after the file (country_codes). It lands in your target schema unless you configure a custom schema in dbt_project.yml.
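Conceptually, a first dbt seed run reads the CSV, creates a table named after the file, and inserts the rows. The Python sketch below mimics that idea with the standard-library sqlite3 module. It is a simplified illustration, not dbt's implementation: every column is stored as TEXT and there is no schema handling. dbt itself infers column types and loads through your warehouse adapter.

```python
import csv
import io
import sqlite3
from pathlib import PurePath


def load_seed(conn: sqlite3.Connection, filename: str, csv_text: str) -> str:
    """First-time load of a CSV into a table named after the file; returns the table name."""
    table = PurePath(filename).stem  # "country_codes.csv" -> "country_codes"
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the column names
    columns = ", ".join(f'"{col}" TEXT' for col in header)  # simplification: all TEXT
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return table


conn = sqlite3.connect(":memory:")
seed_csv = (
    "country_code,country_name,region,currency_code\n"
    "US,United States,North America,USD\n"
    "GB,United Kingdom,Europe,GBP\n"
)
name = load_seed(conn, "country_codes.csv", seed_csv)
print(name, conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0])  # country_codes 2
```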

Configuring Seeds in dbt_project.yml

You can control how seeds behave through dbt_project.yml:

seeds:
  my_project:
    +schema: reference          # custom schema; dbt's default naming yields <target_schema>_reference
    +quote_columns: false
    country_codes:
      +column_types:
        country_code: varchar(2)
        currency_code: varchar(3)
    fiscal_calendar:
      +schema: finance_reference  # overrides the project-level schema for this seed only

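Specify types explicitly with +column_types wherever dbt's type inference might be unreliable. Codes that look like integers are the classic case: ZIP codes or account numbers with leading zeros would otherwise be loaded as integers and lose those zeros.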
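One detail about +schema is easy to miss. With dbt's default generate_schema_name macro, a custom schema is appended to your target schema rather than replacing it. The Python port of that default logic below is for illustration only; projects can override the macro in their macros/ folder to use the literal name. The target schema "analytics" is a hypothetical value that would normally come from profiles.yml.

```python
from typing import Optional


def default_schema_name(target_schema: str, custom_schema: Optional[str]) -> str:
    """Mirror dbt's default generate_schema_name: '<target>' or '<target>_<custom>'."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


# Hypothetical target schema "analytics"
for seed, custom in [("country_codes", "reference"), ("fiscal_calendar", "finance_reference")]:
    print(seed, "->", default_schema_name("analytics", custom))
# country_codes -> analytics_reference
# fiscal_calendar -> analytics_finance_reference
```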

Referencing Seeds in Models

Once loaded, a seed is referenced just like any other model:

-- models/marts/fct_orders_with_region.sql

with orders as (
    select * from {{ ref('stg_orders') }}
),

countries as (
    select * from {{ ref('country_codes') }}
),

enriched as (
    select
        o.order_id,
        o.order_date,
        o.customer_id,
        o.order_amount_usd,
        c.country_name,
        c.region,
        c.currency_code
    from orders o
    left join countries c
        on o.ship_to_country_code = c.country_code
)

select * from enriched

dbt knows that fct_orders_with_region depends on both stg_orders and the country_codes seed, so both are included in the lineage graph.
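Because the model uses a left join, orders whose ship_to_country_code is missing from the seed are kept with NULL country fields instead of being dropped. The sqlite3 sketch below shows that behavior. It uses small, hypothetical in-memory tables standing in for stg_orders and the country_codes seed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country_codes (country_code TEXT, country_name TEXT, region TEXT);
    INSERT INTO country_codes VALUES
        ('US', 'United States', 'North America'),
        ('GB', 'United Kingdom', 'Europe');

    CREATE TABLE stg_orders (order_id INTEGER, ship_to_country_code TEXT);
    INSERT INTO stg_orders VALUES (1, 'US'), (2, 'GB'), (3, 'FR');  -- FR is not in the seed
""")

rows = conn.execute("""
    SELECT o.order_id, c.region
    FROM stg_orders o
    LEFT JOIN country_codes c ON o.ship_to_country_code = c.country_code
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [(1, 'North America'), (2, 'Europe'), (3, None)]

# Orders with no matching seed row point to gaps in the reference data
unmatched = [order_id for order_id, region in rows if region is None]
print(unmatched)  # [3]
```

In dbt itself, a relationships test on the staging column pointing at ref('country_codes') would flag the same gap.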

How Seeds Appear in the DAG

The DAG with seeds included:

[source: raw.orders]             [seed: country_codes]
         |                                |
   [stg_orders]                           |
         |                                |
         +------[fct_orders_with_region]--+
                          |
                [regional_revenue_report]

Seeds show up in dbt docs with the same documentation and lineage tracking as any other node in the project.
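dbt orders its work by sorting this graph so that every node runs after its parents. The standard-library graphlib module can illustrate the idea on the DAG above. The node names come from the diagram; this is an analogy, not dbt's scheduler.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its parents in the diagram).
dag = {
    "stg_orders": {"source:raw.orders"},
    "fct_orders_with_region": {"stg_orders", "seed:country_codes"},
    "regional_revenue_report": {"fct_orders_with_region"},
}

# The seed has no parents, so it can be loaded before anything that refs it.
order = list(TopologicalSorter(dag).static_order())
print(order)
```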

Adding Tests and Documentation to Seeds

You can document and test seeds in a YAML properties file (for example, one placed in the seeds/ directory), just like models:

version: 2

seeds:
  - name: country_codes
    description: "ISO 3166-1 alpha-2 country codes with region and currency mapping"
    columns:
      - name: country_code
        description: "Two-letter ISO country code"
        tests:
          - unique
          - not_null
      - name: country_name
        tests:
          - not_null
      - name: region
        tests:
          - not_null
          - accepted_values:
              values:
                - 'North America'
                - 'Europe'
                - 'Asia Pacific'
                - 'Latin America'
                - 'Middle East & Africa'

Run tests against seeds the same way as models:

dbt test --select country_codes
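dbt compiles each of these generic tests into a SQL query that returns failing rows. As a rough offline analogue, the same three checks can be applied directly to the CSV before dbt ever runs, for example in a pre-commit hook. In the hypothetical check_seed helper below, an empty CSV field counts as NULL. It is a sketch, not dbt's test implementation.

```python
import csv
import io
from collections import Counter


def check_seed(csv_text, unique=(), not_null=(), accepted=None):
    """Return failure messages for unique / not_null / accepted_values-style checks."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    failures = []
    for col in not_null:
        blanks = sum(1 for r in rows if not (r.get(col) or "").strip())
        if blanks:
            failures.append(f"not_null({col}): {blanks} empty value(s)")
    for col in unique:
        counts = Counter((r.get(col) or "").strip() for r in rows)
        dupes = sorted(v for v, n in counts.items() if v and n > 1)  # blanks left to not_null
        if dupes:
            failures.append(f"unique({col}): duplicates {dupes}")
    for col, allowed in (accepted or {}).items():
        values = {(r.get(col) or "").strip() for r in rows}
        bad = sorted(v for v in values if v and v not in allowed)  # blanks left to not_null
        if bad:
            failures.append(f"accepted_values({col}): unexpected {bad}")
    return failures


sample = (
    "country_code,country_name,region\n"
    "US,United States,North America\n"
    "US,Duplicate Row,Europe\n"
    "JP,,Asia\n"
)
for problem in check_seed(
    sample,
    unique=["country_code"],
    not_null=["country_name"],
    accepted={"region": {"North America", "Europe", "Asia Pacific"}},
):
    print(problem)
```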

A Practical Example: Fiscal Calendar Seed

Finance teams often work on non-standard calendars. A fiscal calendar seed solves the problem of mapping dates to fiscal periods without custom code in every model.

seeds/fiscal_calendar.csv:

calendar_date,fiscal_year,fiscal_quarter,fiscal_month,fiscal_week
2025-01-01,FY2025,Q1,M01,W01
2025-01-02,FY2025,Q1,M01,W01
...
2025-03-31,FY2025,Q1,M03,W13
2025-04-01,FY2025,Q2,M04,W14

Load it once, and every model that needs fiscal context can join against it:

with daily_revenue as (
    select
        order_date,
        sum(order_amount_usd) as revenue
    from {{ ref('stg_orders') }}
    group by 1
),

with_fiscal as (
    select
        d.order_date,
        d.revenue,
        fc.fiscal_year,
        fc.fiscal_quarter,
        fc.fiscal_month
    from daily_revenue d
    left join {{ ref('fiscal_calendar') }} fc
        on d.order_date = fc.calendar_date
)

select * from with_fiscal
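Because the join is a left join on calendar_date, a date missing from the seed does not raise an error; it just produces NULL fiscal columns. Below is a minimal Python sketch of a gap check you might run against the CSV. The helper name and the expectation of exactly one row per day are assumptions, and the three-row sample is hypothetical.

```python
import csv
import io
from datetime import date, timedelta


def missing_dates(csv_text: str) -> list[date]:
    """Return calendar days absent between the seed's first and last calendar_date."""
    present = {
        date.fromisoformat(r["calendar_date"])
        for r in csv.DictReader(io.StringIO(csv_text))
    }
    if not present:
        return []
    start, end = min(present), max(present)
    every_day = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in every_day if d not in present]


sample = """calendar_date,fiscal_year,fiscal_quarter,fiscal_month,fiscal_week
2025-01-01,FY2025,Q1,M01,W01
2025-01-02,FY2025,Q1,M01,W01
2025-01-04,FY2025,Q1,M01,W01
"""
print(missing_dates(sample))  # [datetime.date(2025, 1, 3)]
```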

Updating Seeds Over Time

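When your reference data changes, update the CSV file in your repo and rerun:

dbt seed

By default, dbt truncates the existing seed table and reinserts every row from the CSV, so added, edited, and deleted rows are all picked up. If the columns change (added, removed, renamed, or retyped), run:

dbt seed --full-refresh

The --full-refresh flag drops and recreates the table so that it matches the new header and column types.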
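A simplified sqlite3 model of dbt's documented reload behavior may help. It shows the semantics only and is not dbt's adapter code; the seed_table helper is hypothetical. A normal run empties the table and reinserts every CSV row, so deleted rows disappear. A CSV that gains a column cannot be loaded into the old table until the table is dropped and recreated, which is what --full-refresh does.

```python
import sqlite3


def seed_table(conn, table, header, rows, full_refresh=False):
    """Simplified model of `dbt seed`: truncate-and-reload, or drop-and-recreate."""
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    if full_refresh:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    conn.execute(f'DELETE FROM "{table}"')  # sqlite has no TRUNCATE; DELETE empties the table
    names = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" ({names}) VALUES ({marks})', rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
seed_table(conn, "country_codes", ["country_code"], [("US",), ("GB",), ("DE",)])
seed_table(conn, "country_codes", ["country_code"], [("US",), ("GB",)])  # DE removed from CSV
print(conn.execute("SELECT COUNT(*) FROM country_codes").fetchone()[0])  # 2

new_header = ["country_code", "currency_code"]  # a column was added to the CSV
try:
    seed_table(conn, "country_codes", new_header, [("US", "USD")])
except sqlite3.OperationalError as exc:
    print("normal reload failed:", exc)
seed_table(conn, "country_codes", new_header, [("US", "USD")], full_refresh=True)
print(conn.execute("SELECT * FROM country_codes").fetchall())  # [('US', 'USD')]
```

On a real warehouse, the exact error for a changed header depends on the adapter, but the remedy is the same: rerun with --full-refresh.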

For seeds that update frequently, consider whether a proper ingestion tool would be a better fit than maintaining a CSV manually.

Seeds in CI/CD Pipelines

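dbt build already includes seeds. It loads them, runs models and snapshots, and executes tests in DAG order, so every seed is in place before the models that reference it. A typical CI/CD job therefore needs only:

dbt deps          # install packages
dbt build         # seeds, models, snapshots, and tests, in DAG order

A separate dbt seed step before dbt build is redundant but harmless. If your pipeline runs dbt run and dbt test instead of dbt build, add dbt seed before dbt run so that reference tables are current before the models that depend on them run.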

2025-2026 Notes on Seeds

Seeds have remained stable as a feature, but a few patterns have emerged in how teams use them:

Separate schema for seeds — Most teams now configure seeds to land in a dedicated schema (like reference or static) so they are visually distinct from transformed models in the warehouse catalog.

Seeds as a last resort — The dbt community increasingly treats seeds as a last resort rather than a convenience. If data has a live source (even a Google Sheet), tools like Airbyte or custom connectors are preferred because they automate updates. Seeds work best for data that genuinely has no automated source.

dbt packages with shared seeds — Some open-source dbt packages include seed files (like the dbt-date package’s holiday calendars). These install via dbt deps and can be referenced with {{ ref('package_name', 'seed_name') }}.

Seeds are a small feature with a clear use case: getting manually-maintained reference data into your warehouse, versioned alongside your models, with the same testing and documentation infrastructure you use for everything else. Used correctly, they eliminate the category of “data that lives in a spreadsheet and gets joined in by hand.”