SQL Data Aggregation
Aggregation is how SQL transforms rows into summaries. Instead of returning individual records, aggregate queries collapse sets of rows into computed values — totals, counts, averages, minimums, maximums. Combined with GROUP BY, aggregation becomes one of the most analytically powerful features in SQL.
The Core Aggregate Functions
COUNT(*) -- count all rows, including NULLsCOUNT(column) -- count rows where column is NOT NULLCOUNT(DISTINCT column) -- count unique non-null values
SUM(column) -- total of all values (ignores NULLs)AVG(column) -- arithmetic mean (ignores NULLs)MIN(column) -- smallest valueMAX(column) -- largest value
STRING_AGG(column, ', ') -- concatenate strings (PostgreSQL)GROUP_CONCAT(column) -- MySQL equivalentARRAY_AGG(column) -- collect values into an array (PostgreSQL)All aggregate functions except COUNT(*) ignore NULL values.
GROUP BY: Aggregating by Category
GROUP BY divides rows into groups based on distinct values in one or more columns, then applies aggregate functions to each group:
-- Total sales by product categorySELECT category, COUNT(*) AS total_products, SUM(sales_count) AS total_units_sold, ROUND(AVG(price), 2) AS avg_price, MIN(price) AS cheapest, MAX(price) AS most_expensiveFROM productsGROUP BY categoryORDER BY total_units_sold DESC;Rules for GROUP BY:
- Every column in the SELECT list must be either in the GROUP BY clause or wrapped in an aggregate function
- You cannot reference a column alias defined in SELECT inside GROUP BY (in most databases — use the ordinal position instead)
-- GROUP BY ordinal position (works in most databases)SELECT DATE_TRUNC('month', order_date) AS month, status, COUNT(*) AS order_countFROM ordersGROUP BY 1, 2 -- references month (position 1) and status (position 2)ORDER BY 1, 2;HAVING: Filtering Groups
WHERE filters rows before grouping. HAVING filters groups after aggregation:
-- Only show categories with more than 100 ordersSELECT category, COUNT(*) AS order_count, SUM(amount) AS total_revenueFROM ordersJOIN products USING (product_id)GROUP BY categoryHAVING COUNT(*) > 100ORDER BY total_revenue DESC;-- Customers who spent more than $1,000 in 2025SELECT customer_id, SUM(amount) AS total_spentFROM ordersWHERE order_date >= '2025-01-01' -- WHERE filters before GROUP BYGROUP BY customer_idHAVING SUM(amount) > 1000 -- HAVING filters after GROUP BYORDER BY total_spent DESC;A common mistake is trying to use an aggregate function in a WHERE clause — that’s what HAVING is for.
COUNT Nuances
-- Three different results from one table:SELECT COUNT(*) AS all_rows, -- counts every row including NULLs COUNT(phone) AS with_phone, -- counts rows where phone is NOT NULL COUNT(DISTINCT city) AS unique_cities -- counts unique city valuesFROM customers;COUNT(*) is the fastest because it doesn’t need to check individual column values.
Grouping by Multiple Columns
-- Sales breakdown by year, quarter, and product categorySELECT EXTRACT(YEAR FROM order_date) AS year, EXTRACT(QUARTER FROM order_date) AS quarter, category, COUNT(*) AS orders, SUM(amount) AS revenueFROM ordersJOIN products USING (product_id)GROUP BY 1, 2, 3ORDER BY 1, 2, 3;ROLLUP: Subtotals and Grand Totals
ROLLUP generates subtotals and a grand total automatically. It creates result rows for every combination of the GROUP BY columns from right to left, plus one row for the overall total:
SELECT COALESCE(category, 'ALL CATEGORIES') AS category, COALESCE(status, 'ALL STATUSES') AS status, SUM(amount) AS totalFROM ordersJOIN products USING (product_id)GROUP BY ROLLUP (category, status)ORDER BY category NULLS LAST, status NULLS LAST;Result includes:
- Rows for each (category, status) combination
- Rows for each category subtotal (status is NULL)
- One grand total row (both category and status are NULL)
Use COALESCE to replace NULLs in ROLLUP output with descriptive labels.
CUBE generates subtotals for all combinations:
-- All combinations of year × category × region subtotalsGROUP BY CUBE (year, category, region)GROUPING SETS for precise control:
-- Only the subtotals you actually wantGROUP BY GROUPING SETS ( (category, status), -- by category + status (category), -- by category only () -- grand total)FILTER Clause (PostgreSQL)
The FILTER clause on aggregate functions is cleaner than CASE WHEN for conditional aggregation:
SELECT DATE_TRUNC('month', order_date) AS month, COUNT(*) AS total_orders, COUNT(*) FILTER (WHERE status = 'completed') AS completed, COUNT(*) FILTER (WHERE status = 'refunded') AS refunded, SUM(amount) FILTER (WHERE status = 'completed') AS completed_revenue, AVG(amount) FILTER (WHERE amount > 100) AS avg_large_orderFROM ordersGROUP BY 1ORDER BY 1;Equivalent using CASE WHEN (works in all databases):
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed,SUM(CASE WHEN status = 'completed' THEN amount ELSE 0 END) AS completed_revenueSTRING_AGG / GROUP_CONCAT
Aggregate text values from multiple rows into a single string:
-- PostgreSQL: list all products per category in one rowSELECT category, STRING_AGG(name, ', ' ORDER BY name) AS product_listFROM productsGROUP BY category;
-- MySQLSELECT category, GROUP_CONCAT(name ORDER BY name SEPARATOR ', ') AS product_listFROM productsGROUP BY category;Aggregation with Window Functions
Window functions let you compute aggregates without collapsing rows — you keep all the original rows AND add a summary column alongside them:
-- Show each order alongside the customer's total spentSELECT order_id, customer_id, amount, SUM(amount) OVER (PARTITION BY customer_id) AS customer_lifetime_value, amount / SUM(amount) OVER (PARTITION BY customer_id) AS pct_of_customer_totalFROM orders;This is the key difference: GROUP BY gives you one row per group; OVER (PARTITION BY ...) gives you one row per original row with the aggregate computed alongside.
Real-World Aggregation Examples
Cohort analysis: revenue by acquisition month:
SELECT DATE_TRUNC('month', c.created_at) AS cohort_month, DATE_TRUNC('month', o.order_date) AS order_month, COUNT(DISTINCT o.customer_id) AS active_customers, SUM(o.amount) AS revenueFROM customers cJOIN orders o ON c.customer_id = o.customer_idGROUP BY 1, 2ORDER BY 1, 2;Top 10 products by revenue:
SELECT p.name, COUNT(*) AS times_ordered, SUM(oi.quantity * oi.unit_price) AS total_revenueFROM order_items oiJOIN products p ON oi.product_id = p.product_idGROUP BY p.product_id, p.nameORDER BY total_revenue DESCLIMIT 10;Customer churn risk: days since last order:
SELECT customer_id, MAX(order_date) AS last_order, CURRENT_DATE - MAX(order_date) AS days_since_last_order, COUNT(*) AS total_ordersFROM ordersGROUP BY customer_idHAVING CURRENT_DATE - MAX(order_date) > 90 -- no orders in 90 daysORDER BY days_since_last_order DESC;