ETL vs ELT: What Changed, and Why It Matters
Data engineers spend a lot of time moving and reshaping data. The two dominant patterns for doing that — ETL and ELT — look similar at first glance. They involve the same three steps: extract, load, and transform. The difference is the order, and that order has enormous practical consequences for how you build, maintain, and scale your pipelines.
The Core Difference
ETL (Extract, Transform, Load) transforms data before it enters the warehouse. You pull raw data from sources, run it through cleaning and reshaping logic in a separate tool or server, then load the result into storage.
ELT (Extract, Load, Transform) loads raw data into the warehouse first, then transforms it there using the warehouse’s own compute.
ETL Flow--------[Source Systems] --> [Extract] --> [Transform in External Tool] --> [Load to Warehouse] --> [BI / Analytics]
ELT Flow--------[Source Systems] --> [Extract] --> [Load Raw to Warehouse] --> [Transform Inside Warehouse] --> [BI / Analytics]That shift — from transforming outside the warehouse to transforming inside it — changes almost everything about how data teams work.
Why ETL Made Sense in the 1990s and 2000s
When ETL became the standard approach, data warehouses were expensive and storage was limited. You did not want to load raw, messy data because every byte cost money and query performance was precious. So you cleaned and structured data before it ever touched the warehouse.
The tools built for this world — Informatica, Talend, IBM DataStage, SSIS — were powerful but required specialized skills, ran on dedicated servers, and were difficult to version-control or test. Changes to transformation logic were often made in GUIs rather than code, which made collaboration and debugging painful.
Why ELT Became the Default After 2015
Cloud data warehouses changed the economics completely. Snowflake, BigQuery, and Redshift can scale compute independently of storage. Storing raw data became cheap, and running transformations inside the warehouse became fast and cost-efficient.
This flipped the logic. Instead of pre-processing data before loading it, teams started loading everything raw and running transformations as SQL inside the warehouse — where the compute lives, where the data already is, and where analysts and engineers can both see and query it.
Old world (ETL) New world (ELT)------------------------------------ ------------------------------------Expensive storage -> clean before load Cheap storage -> load everything rawETL server does compute Warehouse does computeGUI-based tools SQL + version controlHard to test or debug dbt tests + git historyETL vs ELT Comparison
| Factor | ETL | ELT |
|---|---|---|
| Transform location | External server or tool | Inside the warehouse |
| Raw data preserved | Often discarded | Yes, always available |
| Tooling | Informatica, Talend, SSIS | dbt, Fivetran, Airbyte |
| Scalability | Limited by ETL server capacity | Scales with warehouse |
| Version control | Difficult | Native (git + SQL files) |
| Testing | Custom scripts or manual | Built into dbt |
| Speed of iteration | Slow (GUI, deployments) | Fast (write SQL, run dbt) |
| Best for | Legacy on-premise systems | Modern cloud warehouses |
ETL Code Example (Python + Pandas)
This is how a traditional ETL pipeline looks: clean the data in Python, then load it to the warehouse.
import pandas as pdfrom sqlalchemy import create_engine
# Extractorders = pd.read_csv('orders.csv')
# Transform (outside the warehouse)orders_clean = orders.dropna(subset=['order_id'])orders_clean = orders_clean.copy()orders_clean['total_price'] = orders_clean['quantity'] * orders_clean['unit_price']orders_clean['order_date'] = pd.to_datetime(orders_clean['order_date'])
# Loadengine = create_engine('postgresql://user:pass@localhost:5432/analytics')orders_clean.to_sql('orders_cleaned', engine, if_exists='replace', index=False)This works, but the transformation logic lives in a Python script that is hard to test, hard to document, and does not show up in the warehouse’s lineage graph.
ELT Code Example (SQL inside the warehouse)
With ELT, raw data is already in the warehouse. You write SQL to clean and reshape it.
-- Raw data is in raw.orders (loaded by Fivetran or Airbyte)-- Transform inside the warehouse:
create or replace table analytics.orders_clean as
select order_id, customer_id, quantity * unit_price as total_price, cast(order_date as date) as order_date, lower(trim(status)) as statusfrom raw.orderswhere order_id is not null and status != 'test';The logic is SQL, which every analyst on the team can read, review, and improve.
ELT with dbt (The Modern Standard)
dbt adds version control, testing, documentation, and dependency management on top of SQL transformations. Instead of running one-off SQL scripts, you write model files that dbt compiles and executes in the right order.
models/staging/stg_orders.sql:
with source as ( select * from {{ source('raw', 'orders') }}),
cleaned as ( select order_id, customer_id, quantity * unit_price as total_price, cast(order_date as date) as order_date, lower(trim(status)) as status from source where order_id is not null)
select * from cleanedRun it:
dbt run --select stg_ordersTest it:
dbt test --select stg_ordersdbt builds a full lineage graph from your model definitions, so you can see exactly what depends on what:
[source: raw.orders] | [stg_orders] | [fct_order_revenue] | [monthly_revenue_report]ELT with Fivetran + dbt + Snowflake
Most modern data stacks look like this:
[Source Systems] [Ingestion] [Warehouse] [Transformation] [BI Layer]----------------- ----------- ----------- ---------------- ----------Salesforce --> Fivetran --> Snowflake --> dbt models --> TableauPostgres DB --> Airbyte --> BigQuery --> tests + docs --> LookerREST APIs --> custom loader --> Redshift --> lineage graph --> Power BIFivetran or Airbyte handles the Extract and Load. dbt handles the Transform. The warehouse handles all the compute. No ETL servers, no GUI tools, no undocumented transformation logic scattered across notebooks.
What About Reverse ETL?
A newer pattern called Reverse ETL completes the loop by sending transformed warehouse data back to operational tools like Salesforce, HubSpot, or Intercom. Tools like Census and Hightouch handle this.
[Sources] --> [ELT: Fivetran + dbt + Snowflake] --> [BI Tools] | v [Reverse ETL: Census / Hightouch] | v [Salesforce, HubSpot, Slack, etc.]This is only possible because ELT keeps all the transformed data in the warehouse, making it available for both analytics and operational use cases.
When ETL Still Makes Sense
ELT is the better choice for most teams today, but there are situations where ETL still applies:
- Sensitive data — If you cannot store raw data in the cloud due to compliance requirements, you may need to anonymize or mask fields before loading
- Legacy systems — On-premise warehouses with limited compute capacity
- Very large binary files — Video, audio, or large documents that need processing before storage
For everything else — especially if you are working with a cloud warehouse — ELT is the simpler, more maintainable approach.
2025-2026 Perspective
The ETL vs ELT debate is largely settled in the industry. Major cloud vendors and tooling providers have all aligned around ELT. A few developments worth noting:
Streaming ELT — Tools like Confluent and dbt’s integration with streaming platforms are making near-real-time ELT viable, closing one of the traditional gaps between ETL and ELT.
dbt Python models — For transformations that cannot be expressed cleanly in SQL (ML feature engineering, API calls, complex string processing), dbt now supports Python model files that run inside warehouse-native Python environments like Snowpark or BigQuery DataFrames.
Data contracts — Teams are increasingly defining explicit contracts on raw source tables, specifying expected schemas, row counts, and freshness. This makes ELT pipelines more robust by catching upstream changes before they break downstream models.
Interview Q&A Quick Reference
| Question | Short answer |
|---|---|
| What is ETL? | Transform before loading into the warehouse |
| What is ELT? | Load raw first, then transform inside the warehouse |
| Why is ELT faster to iterate on? | SQL in the warehouse is easier to change than ETL server jobs |
| Where does dbt fit? | dbt is the T in ELT |
| Common ELT tools? | dbt, Fivetran, Airbyte, Snowflake, BigQuery |
| Common ETL tools? | Informatica, Talend, IBM DataStage, SSIS |
| When would you still use ETL? | Compliance masking, legacy on-prem, binary file processing |
The shift from ETL to ELT reflects a broader change in how data infrastructure works. Storage is cheap, cloud compute scales on demand, and SQL is universal. Keeping raw data in the warehouse and transforming it with version-controlled, tested SQL models is simply better engineering practice for the vast majority of modern data teams.