dbt Seeds: Static Reference Data That Belongs in Your DAG
Most data in a warehouse comes from source systems: applications, APIs, event streams. But some data does not come from anywhere except a spreadsheet someone maintains manually. Country codes. Fiscal calendar mappings. Product category hierarchies. Internal cost rates. This kind of reference data needs to live in your warehouse too, and dbt seeds are the right way to get it there.
What Is a dbt Seed?
A seed is a CSV file stored inside your dbt project that dbt loads into your warehouse as a table. Once loaded, you can reference it in models exactly the same way you reference any other model, using {{ ref() }}.
Seeds are not meant for large datasets. They are for small, slowly-changing reference tables where the authoritative source is a flat file rather than a live system.
dbt project structure with seeds
---------------------------------

    my_project/
    ├── models/
    │   ├── staging/
    │   └── marts/
    ├── seeds/               <-- your CSV files go here
    │   ├── country_codes.csv
    │   ├── product_categories.csv
    │   └── fiscal_calendar.csv
    ├── snapshots/
    └── dbt_project.yml

When to Use Seeds (and When Not To)
Seeds are a good fit when:
- The data is small (a few hundred to a few thousand rows)
- The data changes infrequently
- The source of truth is a spreadsheet or manually maintained file
- You want the reference data version-controlled alongside your models
Seeds are not a good fit when:
- The data is large (tens of thousands of rows or more; use a proper ingestion tool instead)
- The data updates automatically from a source system
- The data contains sensitive information (seeds are committed to git)
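The criteria above can be turned into a rough pre-commit check. The following is a minimal sketch, not part of dbt: the row threshold, the list of sensitive-looking name fragments, and the function name `seed_warnings` are all hypothetical and should be adapted to your project.

```python
import csv
import io

# Hypothetical threshold and name hints; tune them for your own project.
MAX_SEED_ROWS = 10_000
SENSITIVE_HINTS = ("email", "ssn", "phone", "password", "salary")


def seed_warnings(csv_text: str) -> list[str]:
    """Return reasons a CSV may be a poor fit for a dbt seed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    row_count = sum(1 for _ in reader)

    warnings = []
    if row_count > MAX_SEED_ROWS:
        warnings.append(f"{row_count} rows exceeds {MAX_SEED_ROWS}")
    for column in header:
        if any(hint in column.lower() for hint in SENSITIVE_HINTS):
            warnings.append(f"column '{column}' may hold sensitive data")
    return warnings


sample = "country_code,country_name\nUS,United States\nGB,United Kingdom\n"
print(seed_warnings(sample))  # prints []
```

A check like this only catches obvious cases by column name. It does not replace a review of what actually gets committed to git.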
Creating a Seed File
Drop a CSV file into the seeds/ directory. dbt infers column types from the data. For a country code lookup table, seeds/country_codes.csv might look like:
    country_code,country_name,region,currency_code
    US,United States,North America,USD
    GB,United Kingdom,Europe,GBP
    DE,Germany,Europe,EUR
    JP,Japan,Asia Pacific,JPY
    AU,Australia,Asia Pacific,AUD
    CA,Canada,North America,CAD
    SG,Singapore,Asia Pacific,SGD

Load it into the warehouse:
    dbt seed

dbt creates a table in your warehouse using the filename as the table name (country_codes), in the schema defined in your project config.
Configuring Seeds in dbt_project.yml
You can control how seeds behave through dbt_project.yml:
    seeds:
      my_project:
        # custom schema; with dbt's default schema naming this
        # resolves to <target_schema>_reference
        +schema: reference
        +quote_columns: false
        country_codes:
          +column_types:
            country_code: varchar(2)
            currency_code: varchar(3)
        fiscal_calendar:
          +schema: finance_reference

For columns where dbt's type inference might be unreliable (like codes that look like integers), always specify types explicitly.
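To see why codes that look like integers are risky, here is a toy illustration. The `naive_infer` function is a hypothetical stand-in for type inference in general, not dbt's actual logic, and the postal code data is made up for the example.

```python
import csv
import io


def naive_infer(values):
    """Guess 'integer' when every value is all digits, else 'text'.

    This is a toy stand-in for type inference, not dbt's actual logic.
    """
    return "integer" if all(v.isdigit() for v in values) else "text"


# Hypothetical seed with postal codes that carry leading zeros.
csv_text = "postal_code,city\n02134,Boston\n10001,New York\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

codes = [row["postal_code"] for row in rows]
inferred = naive_infer(codes)
as_loaded = [int(c) for c in codes] if inferred == "integer" else codes

print(inferred)   # integer
print(as_loaded)  # [2134, 10001]: the leading zero in 02134 is gone
```

Declaring the column as a string type in +column_types avoids this kind of silent change, whatever inference rules are in play.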
Referencing Seeds in Models
Once loaded, a seed is referenced just like any other model:
    -- models/marts/fct_orders_with_region.sql

    with orders as (
        select * from {{ ref('stg_orders') }}
    ),

    countries as (
        select * from {{ ref('country_codes') }}
    ),

    enriched as (
        select
            o.order_id,
            o.order_date,
            o.customer_id,
            o.order_amount_usd,
            c.country_name,
            c.region,
            c.currency_code
        from orders o
        left join countries c
            on o.ship_to_country_code = c.country_code
    )

    select * from enriched

dbt knows that fct_orders_with_region depends on both stg_orders and the country_codes seed, so both are included in the lineage graph.
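One consequence of the left join is worth seeing concretely: an order whose country code is missing from the seed is kept, but its lookup columns come back as NULL. The sketch below imitates the join with Python's built-in sqlite3 module instead of dbt and a warehouse. The three orders and the 'BR' code are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table stg_orders (order_id integer, ship_to_country_code text);
    create table country_codes (country_code text, country_name text, region text);

    -- Hypothetical rows: 'BR' is deliberately absent from the seed.
    insert into stg_orders values (1, 'US'), (2, 'DE'), (3, 'BR');
    insert into country_codes values
        ('US', 'United States', 'North America'),
        ('DE', 'Germany', 'Europe');
    """
)

rows = conn.execute(
    """
    select o.order_id, c.country_name, c.region
    from stg_orders o
    left join country_codes c
        on o.ship_to_country_code = c.country_code
    order by o.order_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 'United States', 'North America')
# (2, 'Germany', 'Europe')
# (3, None, None)
```

If NULL regions would distort downstream reports, a test on the joined model can flag codes that the seed does not cover.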
How Seeds Appear in the DAG
The DAG with seeds included:
    [source: raw.orders]          [seed: country_codes]
             |                              |
        [stg_orders]                        |
             |                              |
             +---[fct_orders_with_region]---+
                          |
              [regional_revenue_report]

Seeds show up in dbt docs with the same documentation and lineage tracking as any other node in the project.
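The ordering implied by this graph can be sketched with a topological sort. This is an illustration using Python's standard graphlib module, not dbt's own scheduler. The node names mirror the diagram, and the "source:" and "seed:" prefixes are labels chosen for this example.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (its parents).
dependencies = {
    "stg_orders": {"source:raw.orders"},
    "fct_orders_with_region": {"stg_orders", "seed:country_codes"},
    "regional_revenue_report": {"fct_orders_with_region"},
}

# static_order() yields every node after all of its parents.
build_order = list(TopologicalSorter(dependencies).static_order())
print(build_order)
```

The relative order of independent nodes (the source and the seed, for instance) can vary. The seed always appears before fct_orders_with_region, and the report always comes last.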
Adding Tests and Documentation to Seeds
You can document and test seeds with YAML, just like models:
    version: 2

    seeds:
      - name: country_codes
        description: "ISO 3166-1 alpha-2 country codes with region and currency mapping"
        columns:
          - name: country_code
            description: "Two-letter ISO country code"
            tests:
              - unique
              - not_null
          - name: country_name
            tests:
              - not_null
          - name: region
            tests:
              - not_null
              - accepted_values:
                  values:
                    - 'North America'
                    - 'Europe'
                    - 'Asia Pacific'
                    - 'Latin America'
                    - 'Middle East & Africa'

Run tests against seeds the same way as models:
    dbt test --select country_codes

A Practical Example: Fiscal Calendar Seed
Finance teams often work on non-standard calendars. A fiscal calendar seed solves the problem of mapping dates to fiscal periods without custom code in every model.
seeds/fiscal_calendar.csv:
    calendar_date,fiscal_year,fiscal_quarter,fiscal_month,fiscal_week
    2025-01-01,FY2025,Q1,M01,W01
    2025-01-02,FY2025,Q1,M01,W01
    ...
    2025-03-31,FY2025,Q1,M03,W13
    2025-04-01,FY2025,Q2,M04,W14

Load it once, and every model that needs fiscal context can join against it:
    with daily_revenue as (
        select
            order_date,
            sum(order_amount_usd) as revenue
        from {{ ref('stg_orders') }}
        group by 1
    ),

    with_fiscal as (
        select
            d.order_date,
            d.revenue,
            fc.fiscal_year,
            fc.fiscal_quarter,
            fc.fiscal_month
        from daily_revenue d
        left join {{ ref('fiscal_calendar') }} fc
            on d.order_date = fc.calendar_date
    )

    select * from with_fiscal

Updating Seeds Over Time
When your reference data changes, update the CSV file in your repo and run:
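    dbt seed --full-refresh

The --full-refresh flag drops and recreates the table, which is necessary when the column structure changes (columns added, removed, renamed, or retyped). Without it, a plain dbt seed truncates the existing table and reloads every row from the CSV, so added, edited, and deleted rows are all picked up, but the load can fail if the columns no longer match the existing table.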
For seeds that update frequently, consider whether a proper ingestion tool would be a better fit than maintaining a CSV manually.
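Column changes are the case that most clearly calls for --full-refresh, and they are easy to detect before running dbt. The sketch below compares the header rows of two versions of a seed file. The helper names `header_of` and `columns_changed` are hypothetical, and using a header comparison as a proxy for "the table structure changed" is an assumption of this example (it will not notice a type change made only in +column_types).

```python
import csv
import io


def header_of(csv_text: str) -> list[str]:
    """Return the column names from the first line of a CSV."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def columns_changed(old_csv: str, new_csv: str) -> bool:
    """True when the column names or their order differ between versions."""
    return header_of(old_csv) != header_of(new_csv)


old = "country_code,country_name\nUS,United States\n"
rows_only = "country_code,country_name\nUS,United States\nFR,France\n"
new_column = "country_code,country_name,region\nUS,United States,North America\n"

print(columns_changed(old, rows_only))   # False
print(columns_changed(old, new_column))  # True
```

In a CI job, the two inputs could be the seed file on the main branch and on the pull request branch.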
Seeds in CI/CD Pipelines
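In CI/CD pipelines, seeds must be loaded before the models that depend on them. dbt build handles this on its own: it loads seeds, runs models and snapshots, and executes tests in DAG order:

    dbt deps     # install packages
    dbt build    # seeds, models, snapshots, and tests in DAG order

If your pipeline uses dbt run and dbt test instead of dbt build, add a dbt seed step before dbt run. Either way, reference tables are current before the models that depend on them run.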
2025-2026 Notes on Seeds
Seeds have remained stable as a feature, but a few patterns have emerged in how teams use them:
Separate schema for seeds: Most teams now configure seeds to land in a dedicated schema (like reference or static) so they are visually distinct from transformed models in the warehouse catalog.
Seeds as a last resort: The dbt community increasingly treats seeds as a last resort rather than a convenience. If data has a live source (even a Google Sheet), tools like Airbyte or custom connectors are preferred because they automate updates. Seeds work best for data that genuinely has no automated source.
dbt packages with shared seeds: Some open-source dbt packages include seed files (like the dbt-date package's holiday calendars). These install via dbt deps and can be referenced with {{ ref('package_name', 'seed_name') }}.
Seeds are a small feature with a clear use case: getting manually-maintained reference data into your warehouse, versioned alongside your models, with the same testing and documentation infrastructure you use for everything else. Used correctly, they eliminate the category of "data that lives in a spreadsheet and gets joined in by hand."