SQL Data Aggregation

Aggregation is how SQL transforms rows into summaries. Instead of returning individual records, aggregate queries collapse sets of rows into computed values — totals, counts, averages, minimums, maximums. Combined with GROUP BY, aggregation becomes one of the most analytically powerful features in SQL.

The Core Aggregate Functions

COUNT(*)           -- count all rows, including NULLs
COUNT(column)      -- count rows where column is NOT NULL
COUNT(DISTINCT column)  -- count unique non-null values

SUM(column)        -- total of all values (ignores NULLs)
AVG(column)        -- arithmetic mean (ignores NULLs)
MIN(column)        -- smallest value
MAX(column)        -- largest value

STRING_AGG(column, ', ')  -- concatenate strings (PostgreSQL)
GROUP_CONCAT(column)      -- MySQL equivalent
ARRAY_AGG(column)         -- collect values into an array (PostgreSQL)

All aggregate functions except COUNT(*) ignore NULL values.

GROUP BY: Aggregating by Category

GROUP BY divides rows into groups based on distinct values in one or more columns, then applies aggregate functions to each group:

-- Total sales by product category
SELECT
    category,
    COUNT(*) AS total_products,
    SUM(sales_count) AS total_units_sold,
    ROUND(AVG(price), 2) AS avg_price,
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM products
GROUP BY category
ORDER BY total_units_sold DESC;

Rules for GROUP BY:

Every column in the SELECT list must be either in the GROUP BY clause or wrapped in an aggregate function
You cannot reference a column alias defined in SELECT inside GROUP BY (in most databases — use the ordinal position instead)

-- GROUP BY ordinal position (works in most databases)
SELECT
    DATE_TRUNC('month', order_date) AS month,
    status,
    COUNT(*) AS order_count
FROM orders
GROUP BY 1, 2  -- references month (position 1) and status (position 2)
ORDER BY 1, 2;

HAVING: Filtering Groups

WHERE filters rows before grouping. HAVING filters groups after aggregation:

-- Only show categories with more than 100 orders
SELECT
    category,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue
FROM orders
JOIN products USING (product_id)
GROUP BY category
HAVING COUNT(*) > 100
ORDER BY total_revenue DESC;

-- Customers who spent more than $1,000 in 2025
SELECT
    customer_id,
    SUM(amount) AS total_spent
FROM orders
WHERE order_date >= '2025-01-01'    -- WHERE filters before GROUP BY
GROUP BY customer_id
HAVING SUM(amount) > 1000           -- HAVING filters after GROUP BY
ORDER BY total_spent DESC;

A common mistake is trying to use an aggregate function in a WHERE clause — that’s what HAVING is for.

COUNT Nuances

-- Three different results from one table:
SELECT
    COUNT(*) AS all_rows,           -- counts every row including NULLs
    COUNT(phone) AS with_phone,     -- counts rows where phone is NOT NULL
    COUNT(DISTINCT city) AS unique_cities  -- counts unique city values
FROM customers;

COUNT(*) is the fastest because it doesn’t need to check individual column values.

Grouping by Multiple Columns

-- Sales breakdown by year, quarter, and product category
SELECT
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(QUARTER FROM order_date) AS quarter,
    category,
    COUNT(*) AS orders,
    SUM(amount) AS revenue
FROM orders
JOIN products USING (product_id)
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;

ROLLUP: Subtotals and Grand Totals

ROLLUP generates subtotals and a grand total automatically. It creates result rows for every combination of the GROUP BY columns from right to left, plus one row for the overall total:

SELECT
    COALESCE(category, 'ALL CATEGORIES') AS category,
    COALESCE(status, 'ALL STATUSES') AS status,
    SUM(amount) AS total
FROM orders
JOIN products USING (product_id)
GROUP BY ROLLUP (category, status)
ORDER BY category NULLS LAST, status NULLS LAST;

Result includes:

Rows for each (category, status) combination
Rows for each category subtotal (status is NULL)
One grand total row (both category and status are NULL)

Use COALESCE to replace NULLs in ROLLUP output with descriptive labels.

CUBE generates subtotals for all combinations:

-- All combinations of year × category × region subtotals
GROUP BY CUBE (year, category, region)

GROUPING SETS for precise control:

-- Only the subtotals you actually want
GROUP BY GROUPING SETS (
    (category, status),    -- by category + status
    (category),            -- by category only
    ()                     -- grand total
)

FILTER Clause (PostgreSQL)

The FILTER clause on aggregate functions is cleaner than CASE WHEN for conditional aggregation:

SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS total_orders,
    COUNT(*) FILTER (WHERE status = 'completed') AS completed,
    COUNT(*) FILTER (WHERE status = 'refunded') AS refunded,
    SUM(amount) FILTER (WHERE status = 'completed') AS completed_revenue,
    AVG(amount) FILTER (WHERE amount > 100) AS avg_large_order
FROM orders
GROUP BY 1
ORDER BY 1;

Equivalent using CASE WHEN (works in all databases):

COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed,
SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenue

STRING_AGG / GROUP_CONCAT

Aggregate text values from multiple rows into a single string:

-- PostgreSQL: list all products per category in one row
SELECT
    category,
    STRING_AGG(name, ', ' ORDER BY name) AS product_list
FROM products
GROUP BY category;

-- MySQL
SELECT
    category,
    GROUP_CONCAT(name ORDER BY name SEPARATOR ', ') AS product_list
FROM products
GROUP BY category;

Aggregation with Window Functions

Window functions let you compute aggregates without collapsing rows — you keep all the original rows AND add a summary column alongside them:

-- Show each order alongside the customer's total spent
SELECT
    order_id,
    customer_id,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id) AS customer_lifetime_value,
    amount / SUM(amount) OVER (PARTITION BY customer_id) AS pct_of_customer_total
FROM orders;

This is the key difference: GROUP BY gives you one row per group; OVER (PARTITION BY ...) gives you one row per original row with the aggregate computed alongside.

Real-World Aggregation Examples

Cohort analysis: revenue by acquisition month:

SELECT
    DATE_TRUNC('month', c.created_at) AS cohort_month,
    DATE_TRUNC('month', o.order_date) AS order_month,
    COUNT(DISTINCT o.customer_id) AS active_customers,
    SUM(o.amount) AS revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY 1, 2
ORDER BY 1, 2;

Top 10 products by revenue:

SELECT
    p.name,
    COUNT(*) AS times_ordered,
    SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.product_id, p.name
ORDER BY total_revenue DESC
LIMIT 10;

Customer churn risk: days since last order:

SELECT
    customer_id,
    MAX(order_date) AS last_order,
    CURRENT_DATE - MAX(order_date) AS days_since_last_order,
    COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id
HAVING CURRENT_DATE - MAX(order_date) > 90  -- no orders in 90 days
ORDER BY days_since_last_order DESC;