How to Learn SQL: A Realistic Roadmap

Learning SQL is one of the highest-return investments you can make as someone working with data. It’s straightforward to get started, genuinely useful within days, and the skill compounds — every additional SQL concept you learn opens up more analytical capabilities.

This guide is a learning roadmap, not a tutorial. It’s about sequence: what to learn first, what to skip early, and how to build up to production-level SQL skills.

Why SQL Still Matters in 2025

SQL is 50 years old and as relevant as ever. A few reasons:

Every major data warehouse (Snowflake, BigQuery, Redshift) accepts SQL as its primary interface
dbt — the analytics engineering tool used by thousands of data teams — is built entirely on SQL
Business intelligence tools (Looker, Tableau, Power BI, Metabase) all generate or accept SQL under the hood
Job postings for data analyst, data engineer, data scientist, and business analyst roles list SQL as required more than any other technical skill
AI tools can generate SQL, but someone still needs to verify it’s correct — that’s you

Stage 1: The Fundamentals (Week 1–2)

The goal of this stage is to write queries that actually retrieve useful data from a single table.

What to learn:

SELECT with column lists and SELECT *
WHERE with conditions: =, !=, >, <, >=, <=
Combining conditions with AND, OR, NOT
ORDER BY with ASC and DESC
LIMIT (or TOP in SQL Server) to control result size
DISTINCT to remove duplicate rows
IS NULL and IS NOT NULL for missing values
LIKE with % and _ wildcards for text pattern matching

Practice exercise: Find all orders placed in the last 30 days with a total over $100, sorted by date descending, showing only the top 10.

Stage 2: Working with Multiple Tables (Week 2–3)

Once you can query a single table well, joins unlock exponentially more analytical power.

What to learn:

INNER JOIN — rows that match in both tables
LEFT JOIN — all rows from the left table, nulls for unmatched right rows
RIGHT JOIN — the mirror of LEFT JOIN
FULL OUTER JOIN — all rows from both tables
Table aliases to keep queries readable
Joining on multiple columns when needed

-- Find all customers and their total orders
SELECT
    c.name,
    COUNT(o.order_id) AS order_count,
    COALESCE(SUM(o.amount), 0) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC;

Key insight: Use LEFT JOIN (not INNER JOIN) when you want to keep rows even if there’s no match — for example, customers who have never placed an order.

Stage 3: Aggregation and Grouping (Week 3–4)

This is where SQL gets analytical. Aggregate functions summarize data; GROUP BY creates the groups to summarize.

What to learn:

COUNT(), SUM(), AVG(), MIN(), MAX()
GROUP BY — group rows by one or more columns
HAVING — filter groups (like WHERE, but applied after grouping)
ROLLUP and CUBE for multi-level aggregation
COALESCE() to handle nulls in aggregations

-- Monthly revenue summary
SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(DISTINCT customer_id) AS unique_customers,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY 1
HAVING SUM(amount) > 10000
ORDER BY 1;

Stage 4: Subqueries and CTEs (Week 4–5)

Complex questions require building up answers in steps. Subqueries and CTEs let you do that.

Subqueries are SELECT statements nested inside another query:

-- Customers who spent more than the average customer
SELECT name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > (SELECT AVG(total) FROM customer_totals)
);

CTEs (Common Table Expressions) use WITH to name intermediate results, making complex queries much more readable:

WITH monthly_totals AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', order_date) AS month,
        SUM(amount) AS monthly_spend
    FROM orders
    GROUP BY 1, 2
),
top_customers AS (
    SELECT customer_id
    FROM monthly_totals
    WHERE monthly_spend > 500
    GROUP BY customer_id
    HAVING COUNT(DISTINCT month) >= 3
)
SELECT c.name, c.email
FROM customers c
JOIN top_customers tc ON c.customer_id = tc.customer_id;

CTEs are preferred over nested subqueries in most modern SQL — they’re easier to read, debug, and maintain.

Stage 5: Window Functions (Week 5–7)

Window functions are the most powerful feature most SQL learners skip. They perform calculations across a set of rows related to the current row — without collapsing the result set like GROUP BY does.

-- Running total and rank within each product category
SELECT
    product_name,
    category,
    sale_amount,
    SUM(sale_amount) OVER (
        PARTITION BY category
        ORDER BY sale_date
    ) AS running_total_in_category,
    RANK() OVER (
        PARTITION BY category
        ORDER BY sale_amount DESC
    ) AS rank_in_category,
    LAG(sale_amount) OVER (
        PARTITION BY product_name
        ORDER BY sale_date
    ) AS previous_sale
FROM sales;

Key window functions to know:

ROW_NUMBER() — unique sequential number per partition
RANK() / DENSE_RANK() — ranking with tie-handling options
LAG() / LEAD() — access previous/next row values
SUM() OVER / AVG() OVER — running totals and moving averages
FIRST_VALUE() / LAST_VALUE() — first/last value in a window

Window functions are expected in senior data roles and are frequently tested in interviews.

Stage 6: Performance and Optimization (Week 7–10)

Writing queries that work is one skill; writing queries that are fast is another.

What to learn:

How indexes work and when to use them
Reading EXPLAIN / EXPLAIN ANALYZE output
Avoiding common performance pitfalls:
- SELECT * when only a few columns are needed
- Non-sargable WHERE conditions (functions on indexed columns)
- N+1 query patterns
- Unnecessary DISTINCT when data is already unique
Materialized views and when they help
Partitioning for large tables

-- Check what PostgreSQL does with your query
EXPLAIN ANALYZE
SELECT c.name, SUM(o.amount)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date > '2025-01-01'
GROUP BY c.customer_id, c.name;

Stage 7: Dialect-Specific Skills

Once you’re solid on standard SQL, learn the specifics of the platform you’re using:

PostgreSQL: jsonb operations, lateral joins, extensions (pg_trgm, PostGIS)
BigQuery: nested/repeated fields, partitioned tables, slots, BI Engine
Snowflake: micro-partitions, time travel, zero-copy cloning, Cortex AI functions
SQL Server: T-SQL procedural code, CTEs with recursion, columnstore indexes
DuckDB: parquet/CSV/JSON queries, httpfs extension for S3

What to Practice On

Free options:

SQLZoo — interactive exercises, increasing difficulty
Mode SQL Tutorial — real datasets, good explanations
LeetCode Database section — interview-style problems
DuckDB locally — run SQL against any CSV or Parquet file on your laptop

Build a project: Download a public dataset (Kaggle, data.gov, NYC Open Data), load it into DuckDB or PostgreSQL, and write queries to answer questions you find interesting. Nothing teaches SQL faster than having a real question you actually want answered.

Honest Advice

Don’t skip the basics because they seem obvious. NULL handling, join types, and aggregate behavior have subtle edge cases that trip up experienced developers. Understand them properly from the start.

Learn one dialect well before jumping between them. PostgreSQL or SQLite for local learning; whatever your employer uses for work.

Read slow queries. Looking at execution plans and figuring out why a query is slow teaches you more about how databases work than writing 100 fast queries does.

Write it before you generate it. AI tools (ChatGPT, GitHub Copilot, Claude) can generate SQL — but if you can’t read and verify what they produce, you’ll ship bugs to production. Learn to write SQL first, then use AI to accelerate.