SQL Subqueries and Nested Queries
A subquery is a SELECT statement nested inside another SQL statement. Subqueries let you answer questions that require multiple steps — “which customers spent more than the average?”, “which products have never been ordered?”, “what was each employee’s rank in their department?”
Types of Subqueries
Subqueries are categorized by what they return and where they appear:
By return type: Scalar subquery: Returns a single value (one row, one column) Row subquery: Returns a single row with multiple columns Table subquery: Returns multiple rows and columns
By location: In WHERE: Filter rows based on subquery result In FROM: Treat the subquery as a derived table In SELECT: Compute a value for each row In HAVING: Filter groups based on subquery result
By correlated vs non-correlated: Non-correlated: Subquery runs once, independent of outer query Correlated: Subquery references outer query columns, runs per rowScalar Subquery (Returns a Single Value)
Used anywhere a single value is expected:
-- Products priced above the overall averageSELECT name, priceFROM productsWHERE price > (SELECT AVG(price) FROM products)ORDER BY price DESC;
-- Each product's price compared to the category average (in SELECT)SELECT name, category, price, (SELECT AVG(price) FROM products p2 WHERE p2.category = p.category) AS category_avg, price - (SELECT AVG(price) FROM products p2 WHERE p2.category = p.category) AS diff_from_avgFROM products p;If a scalar subquery returns more than one row, it causes an error. Use aggregate functions to ensure it returns exactly one value.
Subquery with IN (Table Subquery)
Returns a list of values for the outer IN or NOT IN:
-- Orders from UK customersSELECT order_id, amount, order_dateFROM ordersWHERE customer_id IN ( SELECT customer_id FROM customers WHERE country = 'UK')ORDER BY order_date DESC;
-- Products that have NEVER been orderedSELECT product_id, name, priceFROM productsWHERE product_id NOT IN ( SELECT DISTINCT product_id FROM order_items);Warning with NOT IN and NULLs: If the subquery returns any NULL values, NOT IN returns no rows for all outer rows. This is a common bug:
-- Dangerous if order_items.product_id can be NULLWHERE product_id NOT IN (SELECT product_id FROM order_items)
-- Safe alternative using NOT EXISTS:WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = products.product_id)EXISTS vs IN
EXISTS checks whether the subquery returns at least one row. It stops as soon as it finds a match — often more efficient than IN for large datasets.
-- Customers who have placed at least one completed orderSELECT customer_id, nameFROM customers cWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.status = 'completed');
-- NOT EXISTS: customers who have never orderedSELECT customer_id, nameFROM customers cWHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);The SELECT 1 inside EXISTS is conventional — the actual value doesn’t matter, only whether a row is found.
IN vs EXISTS:
- Use
INwhen the subquery is small and non-correlated - Use
EXISTSwhen the subquery references the outer query or the list could be large with NULLs
Correlated Subquery
A correlated subquery references columns from the outer query. It executes once for each row of the outer query:
-- Find employees who earn more than the average in their own departmentSELECT e.name, e.department, e.salaryFROM employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department = e.department -- references outer query's department)ORDER BY e.department, e.salary DESC;Correlated subqueries are powerful but can be slow for large tables — the inner query runs once per outer row. Window functions often replace them more efficiently:
-- Same result, faster executionSELECT name, department, salaryFROM ( SELECT name, department, salary, AVG(salary) OVER (PARTITION BY department) AS dept_avg FROM employees) tWHERE salary > dept_avg;Subquery in FROM (Derived Table)
Treat a subquery as a table you can then query from:
-- Revenue per customer, then find customers above averageSELECT customer_id, name, total_revenueFROM ( SELECT c.customer_id, c.name, SUM(o.amount) AS total_revenue FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.status = 'completed' GROUP BY c.customer_id, c.name) customer_revenueWHERE total_revenue > 1000ORDER BY total_revenue DESC;Derived tables must be given an alias. In modern SQL, CTEs (WITH clause) are usually preferred over derived tables for readability.
Subquery in HAVING
Filter aggregated groups based on a subquery result:
-- Categories with above-average product countSELECT category, COUNT(*) AS product_countFROM productsGROUP BY categoryHAVING COUNT(*) > ( SELECT AVG(cat_count) FROM ( SELECT category, COUNT(*) AS cat_count FROM products GROUP BY category ) t);Subquery vs CTE vs JOIN
Often, the same logic can be written multiple ways. Choosing between them is a readability and performance trade-off:
-- As a subquerySELECT name, priceFROM productsWHERE category_id IN (SELECT category_id FROM categories WHERE is_featured = TRUE);
-- As a JOIN (often faster)SELECT p.name, p.priceFROM products pJOIN categories c ON p.category_id = c.category_idWHERE c.is_featured = TRUE;
-- As a CTE (most readable for complex logic)WITH featured_categories AS ( SELECT category_id FROM categories WHERE is_featured = TRUE)SELECT p.name, p.priceFROM products pWHERE p.category_id IN (SELECT category_id FROM featured_categories);General guidance:
- Use a JOIN when combining two sets of data with matching keys
- Use a subquery in WHERE when filtering with a lookup list
- Use a CTE when the intermediate result set is complex or reused
- Use EXISTS when checking for the presence of related rows
Common Subquery Patterns
Top product per customer:
SELECT customer_id, product_id, amountFROM orders o1WHERE amount = ( SELECT MAX(amount) FROM orders o2 WHERE o2.customer_id = o1.customer_id);Second highest salary:
SELECT MAX(salary) AS second_highestFROM employeesWHERE salary < (SELECT MAX(salary) FROM employees);Running total via subquery (older pattern, use window functions instead):
-- Modern approachSELECT order_date, amount, SUM(amount) OVER (ORDER BY order_date) AS running_totalFROM orders;