How to Learn SQL: A Realistic Roadmap
Learning SQL is one of the highest-return investments you can make as someone working with data. It’s straightforward to get started, genuinely useful within days, and the skill compounds — every additional SQL concept you learn opens up more analytical capabilities.
This guide is a learning roadmap, not a tutorial. It’s about sequence: what to learn first, what to skip early, and how to build up to production-level SQL skills.
Why SQL Still Matters in 2025
SQL is 50 years old and as relevant as ever. A few reasons:
- Every major data warehouse (Snowflake, BigQuery, Redshift) accepts SQL as its primary interface
- dbt — the analytics engineering tool used by thousands of data teams — is built entirely on SQL
- Business intelligence tools (Looker, Tableau, Power BI, Metabase) all generate or accept SQL under the hood
- Job postings for data analyst, data engineer, data scientist, and business analyst roles list SQL as required more than any other technical skill
- AI tools can generate SQL, but someone still needs to verify it’s correct — that’s you
Stage 1: The Fundamentals (Week 1–2)
The goal of this stage is to write queries that actually retrieve useful data from a single table.
What to learn:
SELECTwith column lists andSELECT *WHEREwith conditions:=,!=,>,<,>=,<=- Combining conditions with
AND,OR,NOT ORDER BYwithASCandDESCLIMIT(orTOPin SQL Server) to control result sizeDISTINCTto remove duplicate rowsIS NULLandIS NOT NULLfor missing valuesLIKEwith%and_wildcards for text pattern matching
Practice exercise: Find all orders placed in the last 30 days with a total over $100, sorted by date descending, showing only the top 10.
Stage 2: Working with Multiple Tables (Week 2–3)
Once you can query a single table well, joins unlock exponentially more analytical power.
What to learn:
INNER JOIN— rows that match in both tablesLEFT JOIN— all rows from the left table, nulls for unmatched right rowsRIGHT JOIN— the mirror of LEFT JOINFULL OUTER JOIN— all rows from both tables- Table aliases to keep queries readable
- Joining on multiple columns when needed
-- Find all customers and their total ordersSELECT c.name, COUNT(o.order_id) AS order_count, COALESCE(SUM(o.amount), 0) AS total_spentFROM customers cLEFT JOIN orders o ON c.customer_id = o.customer_idGROUP BY c.customer_id, c.nameORDER BY total_spent DESC;Key insight: Use LEFT JOIN (not INNER JOIN) when you want to keep rows even if there’s no match — for example, customers who have never placed an order.
Stage 3: Aggregation and Grouping (Week 3–4)
This is where SQL gets analytical. Aggregate functions summarize data; GROUP BY creates the groups to summarize.
What to learn:
COUNT(),SUM(),AVG(),MIN(),MAX()GROUP BY— group rows by one or more columnsHAVING— filter groups (like WHERE, but applied after grouping)ROLLUPandCUBEfor multi-level aggregationCOALESCE()to handle nulls in aggregations
-- Monthly revenue summarySELECT DATE_TRUNC('month', order_date) AS month, COUNT(DISTINCT customer_id) AS unique_customers, SUM(amount) AS total_revenue, AVG(amount) AS avg_order_valueFROM ordersWHERE order_date >= '2025-01-01'GROUP BY 1HAVING SUM(amount) > 10000ORDER BY 1;Stage 4: Subqueries and CTEs (Week 4–5)
Complex questions require building up answers in steps. Subqueries and CTEs let you do that.
Subqueries are SELECT statements nested inside another query:
-- Customers who spent more than the average customerSELECT nameFROM customersWHERE customer_id IN ( SELECT customer_id FROM orders GROUP BY customer_id HAVING SUM(amount) > (SELECT AVG(total) FROM customer_totals));CTEs (Common Table Expressions) use WITH to name intermediate results, making complex queries much more readable:
WITH monthly_totals AS ( SELECT customer_id, DATE_TRUNC('month', order_date) AS month, SUM(amount) AS monthly_spend FROM orders GROUP BY 1, 2),top_customers AS ( SELECT customer_id FROM monthly_totals WHERE monthly_spend > 500 GROUP BY customer_id HAVING COUNT(DISTINCT month) >= 3)SELECT c.name, c.emailFROM customers cJOIN top_customers tc ON c.customer_id = tc.customer_id;CTEs are preferred over nested subqueries in most modern SQL — they’re easier to read, debug, and maintain.
Stage 5: Window Functions (Week 5–7)
Window functions are the most powerful feature most SQL learners skip. They perform calculations across a set of rows related to the current row — without collapsing the result set like GROUP BY does.
-- Running total and rank within each product categorySELECT product_name, category, sale_amount, SUM(sale_amount) OVER ( PARTITION BY category ORDER BY sale_date ) AS running_total_in_category, RANK() OVER ( PARTITION BY category ORDER BY sale_amount DESC ) AS rank_in_category, LAG(sale_amount) OVER ( PARTITION BY product_name ORDER BY sale_date ) AS previous_saleFROM sales;Key window functions to know:
ROW_NUMBER()— unique sequential number per partitionRANK()/DENSE_RANK()— ranking with tie-handling optionsLAG()/LEAD()— access previous/next row valuesSUM() OVER/AVG() OVER— running totals and moving averagesFIRST_VALUE()/LAST_VALUE()— first/last value in a window
Window functions are expected in senior data roles and are frequently tested in interviews.
Stage 6: Performance and Optimization (Week 7–10)
Writing queries that work is one skill; writing queries that are fast is another.
What to learn:
- How indexes work and when to use them
- Reading
EXPLAIN/EXPLAIN ANALYZEoutput - Avoiding common performance pitfalls:
SELECT *when only a few columns are needed- Non-sargable WHERE conditions (functions on indexed columns)
- N+1 query patterns
- Unnecessary DISTINCT when data is already unique
- Materialized views and when they help
- Partitioning for large tables
-- Check what PostgreSQL does with your queryEXPLAIN ANALYZESELECT c.name, SUM(o.amount)FROM customers cJOIN orders o ON c.customer_id = o.customer_idWHERE o.order_date > '2025-01-01'GROUP BY c.customer_id, c.name;Stage 7: Dialect-Specific Skills
Once you’re solid on standard SQL, learn the specifics of the platform you’re using:
- PostgreSQL:
jsonboperations, lateral joins, extensions (pg_trgm, PostGIS) - BigQuery: nested/repeated fields, partitioned tables, slots, BI Engine
- Snowflake: micro-partitions, time travel, zero-copy cloning, Cortex AI functions
- SQL Server: T-SQL procedural code, CTEs with recursion, columnstore indexes
- DuckDB: parquet/CSV/JSON queries,
httpfsextension for S3
What to Practice On
Free options:
- SQLZoo — interactive exercises, increasing difficulty
- Mode SQL Tutorial — real datasets, good explanations
- LeetCode Database section — interview-style problems
- DuckDB locally — run SQL against any CSV or Parquet file on your laptop
Build a project: Download a public dataset (Kaggle, data.gov, NYC Open Data), load it into DuckDB or PostgreSQL, and write queries to answer questions you find interesting. Nothing teaches SQL faster than having a real question you actually want answered.
Honest Advice
Don’t skip the basics because they seem obvious. NULL handling, join types, and aggregate behavior have subtle edge cases that trip up experienced developers. Understand them properly from the start.
Learn one dialect well before jumping between them. PostgreSQL or SQLite for local learning; whatever your employer uses for work.
Read slow queries. Looking at execution plans and figuring out why a query is slow teaches you more about how databases work than writing 100 fast queries does.
Write it before you generate it. AI tools (ChatGPT, GitHub Copilot, Claude) can generate SQL — but if you can’t read and verify what they produce, you’ll ship bugs to production. Learn to write SQL first, then use AI to accelerate.