SQL Subqueries and Nested Queries

A subquery is a SELECT statement nested inside another SQL statement. Subqueries let you answer questions that require multiple steps — “which customers spent more than the average?”, “which products have never been ordered?”, “what was each employee’s rank in their department?”

Types of Subqueries

Subqueries are categorized by what they return and where they appear:

By return type:
  Scalar subquery:  Returns a single value (one row, one column)
  Row subquery:     Returns a single row with multiple columns
  Table subquery:   Returns multiple rows and columns

By location:
  In WHERE:  Filter rows based on subquery result
  In FROM:   Treat the subquery as a derived table
  In SELECT: Compute a value for each row
  In HAVING: Filter groups based on subquery result

By correlated vs non-correlated:
  Non-correlated: Subquery runs once, independent of outer query
  Correlated:     Subquery references outer query columns, runs per row

Scalar Subquery (Returns a Single Value)

Used anywhere a single value is expected:

-- Products priced above the overall average
SELECT name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products)
ORDER BY price DESC;

-- Each product's price compared to the category average (in SELECT)
SELECT
    name,
    category,
    price,
    (SELECT AVG(price) FROM products p2 WHERE p2.category = p.category) AS category_avg,
    price - (SELECT AVG(price) FROM products p2 WHERE p2.category = p.category) AS diff_from_avg
FROM products p;

If a scalar subquery returns more than one row, it causes an error. Use aggregate functions to ensure it returns exactly one value.

Subquery with IN (Table Subquery)

Returns a list of values for the outer IN or NOT IN:

-- Orders from UK customers
SELECT order_id, amount, order_date
FROM orders
WHERE customer_id IN (
    SELECT customer_id
    FROM customers
    WHERE country = 'UK'
)
ORDER BY order_date DESC;

-- Products that have NEVER been ordered
SELECT product_id, name, price
FROM products
WHERE product_id NOT IN (
    SELECT DISTINCT product_id FROM order_items
);

Warning with NOT IN and NULLs: If the subquery returns any NULL values, NOT IN returns no rows for all outer rows. This is a common bug:

-- Dangerous if order_items.product_id can be NULL
WHERE product_id NOT IN (SELECT product_id FROM order_items)

-- Safe alternative using NOT EXISTS:
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.product_id = products.product_id
)

EXISTS vs IN

EXISTS checks whether the subquery returns at least one row. It stops as soon as it finds a match — often more efficient than IN for large datasets.

-- Customers who have placed at least one completed order
SELECT customer_id, name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.status = 'completed'
);

-- NOT EXISTS: customers who have never ordered
SELECT customer_id, name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

The SELECT 1 inside EXISTS is conventional — the actual value doesn’t matter, only whether a row is found.

IN vs EXISTS:

Use IN when the subquery is small and non-correlated
Use EXISTS when the subquery references the outer query or the list could be large with NULLs

Correlated Subquery

A correlated subquery references columns from the outer query. It executes once for each row of the outer query:

-- Find employees who earn more than the average in their own department
SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department = e.department   -- references outer query's department
)
ORDER BY e.department, e.salary DESC;

Correlated subqueries are powerful but can be slow for large tables — the inner query runs once per outer row. Window functions often replace them more efficiently:

-- Same result, faster execution
SELECT name, department, salary
FROM (
    SELECT
        name,
        department,
        salary,
        AVG(salary) OVER (PARTITION BY department) AS dept_avg
    FROM employees
) t
WHERE salary > dept_avg;

Subquery in FROM (Derived Table)

Treat a subquery as a table you can then query from:

-- Revenue per customer, then find customers above average
SELECT customer_id, name, total_revenue
FROM (
    SELECT
        c.customer_id,
        c.name,
        SUM(o.amount) AS total_revenue
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.status = 'completed'
    GROUP BY c.customer_id, c.name
) customer_revenue
WHERE total_revenue > 1000
ORDER BY total_revenue DESC;

Derived tables must be given an alias. In modern SQL, CTEs (WITH clause) are usually preferred over derived tables for readability.

Subquery in HAVING

Filter aggregated groups based on a subquery result:

-- Categories with above-average product count
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > (
    SELECT AVG(cat_count)
    FROM (
        SELECT category, COUNT(*) AS cat_count
        FROM products
        GROUP BY category
    ) t
);

Subquery vs CTE vs JOIN

Often, the same logic can be written multiple ways. Choosing between them is a readability and performance trade-off:

-- As a subquery
SELECT name, price
FROM products
WHERE category_id IN (SELECT category_id FROM categories WHERE is_featured = TRUE);

-- As a JOIN (often faster)
SELECT p.name, p.price
FROM products p
JOIN categories c ON p.category_id = c.category_id
WHERE c.is_featured = TRUE;

-- As a CTE (most readable for complex logic)
WITH featured_categories AS (
    SELECT category_id FROM categories WHERE is_featured = TRUE
)
SELECT p.name, p.price
FROM products p
WHERE p.category_id IN (SELECT category_id FROM featured_categories);

General guidance:

Use a JOIN when combining two sets of data with matching keys
Use a subquery in WHERE when filtering with a lookup list
Use a CTE when the intermediate result set is complex or reused
Use EXISTS when checking for the presence of related rows

Common Subquery Patterns

Top product per customer:

SELECT customer_id, product_id, amount
FROM orders o1
WHERE amount = (
    SELECT MAX(amount)
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id
);

Second highest salary:

SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Running total via subquery (older pattern, use window functions instead):

-- Modern approach
SELECT
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;