SQL Indexes: Speed Up Queries Without Changing Your Data
An index is a data structure the database maintains alongside a table to speed up row lookups. Without an index, the database performs a sequential scan — checking every row in the table. With an index on the right column, it can jump directly to matching rows.
The trade-off: indexes consume storage and slow down INSERT, UPDATE, and DELETE (because the index must be updated each time). More indexes isn’t always better.
How a B-Tree Index Works
Most SQL indexes use a B-tree (balanced tree) structure. The database builds a sorted tree of the indexed column values, with each leaf node pointing to the physical location of the matching row.
B-tree index on orders.customer_id
[500] / \ [250] [750] / \ / \[100][300][600][900] | | | | rows rows rows rows (physical row pointers)When you query WHERE customer_id = 300, the database:
- Starts at the root (500) — 300 < 500, go left
- Reaches 250 — 300 > 250, go right
- Reaches 300 — found it; follow pointer to data rows
This takes O(log n) time instead of O(n) for a full table scan. On a table with 10 million rows, that’s the difference between checking ~24 nodes vs. 10 million rows.
Creating Indexes
-- Basic index on a single columnCREATE INDEX idx_orders_customer ON orders(customer_id);
-- Unique index (also enforces uniqueness constraint)CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Composite index (order matters)CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Descending index (useful for ORDER BY DESC queries)CREATE INDEX idx_orders_date_desc ON orders(order_date DESC);
-- Index on expressionCREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Partial index (only index rows matching a condition)CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';
-- PostgreSQL: concurrent index build (doesn't lock the table)CREATE INDEX CONCURRENTLY idx_orders_amount ON orders(amount);Index Types
B-tree (default) — Best for equality (=) and range comparisons (>, <, BETWEEN). Supports ORDER BY optimization. Used in the vast majority of cases.
Hash — Only supports equality (=). Slightly faster than B-tree for pure equality lookups, but can’t support range queries or sorting. PostgreSQL supports hash indexes (and they’re crash-safe since PostgreSQL 10).
GiST — Generalized Search Tree. Supports geometric types, full-text search, and range types in PostgreSQL.
GIN — Generalized Inverted Index. Optimal for indexing array values or JSONB columns where you need to search inside composite values.
BRIN — Block Range INdex. Very small, best for naturally ordered data like timestamps on append-only tables.
-- GIN index for JSONB queriesCREATE INDEX idx_events_data ON events USING GIN(event_data);
-- GIN index for full-text searchCREATE INDEX idx_articles_search ON articles USING GIN(to_tsvector('english', content));
-- BRIN for time-series dataCREATE INDEX idx_logs_created ON logs USING BRIN(created_at);Composite Index Column Order
For a composite index like (customer_id, order_date), the order of columns determines which queries benefit:
-- Index on (customer_id, order_date)
-- USES the index fully:WHERE customer_id = 101 AND order_date >= '2025-01-01'WHERE customer_id = 101
-- USES the index partially (customer_id only):WHERE customer_id = 101ORDER BY order_date -- can still use index for this
-- DOES NOT use the index:WHERE order_date >= '2025-01-01' -- skipping the first columnThe leftmost prefix rule: a composite index can be used for any query that filters on a prefix of the indexed columns. Always put the most selective column (the one with the most distinct values) first when you have a choice.
When Queries Don’t Use Indexes
Several patterns prevent index usage:
-- Functions on indexed column (prevents index use)WHERE YEAR(order_date) = 2025 -- badWHERE order_date >= '2025-01-01' -- good
WHERE LOWER(email) = 'user@example.com' -- bad (unless expression index exists)WHERE email = 'user@example.com' -- good (assumes email stored as lowercase)
-- Type mismatch (implicit cast can block index use)WHERE customer_id = '12345' -- customer_id is INTEGER; string literal may not match
-- Leading wildcard (can't use B-tree index)WHERE name LIKE '%smith%' -- badWHERE name LIKE 'smith%' -- good (leading literal)
-- OR conditions (complex; may or may not use index)WHERE status = 'active' OR customer_id = 101-- Better as UNION:SELECT * FROM orders WHERE status = 'active'UNIONSELECT * FROM orders WHERE customer_id = 101;Identifying Missing Indexes
EXPLAIN ANALYZE shows whether an index is being used:
EXPLAIN ANALYZESELECT * FROM orders WHERE customer_id = 101 AND status = 'completed';Look for:
Seq Scanon large tables → likely missing an indexIndex Scan→ good, using the indexIndex Only Scan→ best case, data retrieved from index without touching the tableBitmap Index Scan→ medium case, many rows match
Find tables with sequential scans (PostgreSQL):
SELECT relname AS table_name, seq_scan, idx_scan, n_live_tup AS rowsFROM pg_stat_user_tablesWHERE seq_scan > idx_scan AND n_live_tup > 10000ORDER BY seq_scan DESC;Too Many Indexes: The Cost Side
Every index:
- Uses disk space (B-tree indexes are typically 5–20% of the table’s size)
- Slows down writes (each
INSERT,UPDATE,DELETEmust also update all indexes) - Is maintained by the database’s autovacuum/statistics processes
For read-heavy tables (analytics, data warehouses), many indexes are fine. For write-heavy tables (transactional systems, event logs), too many indexes degrade write throughput significantly.
Remove unused indexes (PostgreSQL):
SELECT schemaname, tablename, indexname, idx_scan AS times_usedFROM pg_stat_user_indexesWHERE idx_scan = 0 AND indexname NOT LIKE 'pg_%'ORDER BY tablename;Zero scans since the last statistics reset means the index probably isn’t helping.
Covering Indexes
A covering index includes all columns the query needs — the database can satisfy the query entirely from the index without reading the main table.
-- Query only needs these three columnsSELECT customer_id, order_date, amount FROM orders WHERE status = 'completed';
-- Covering index that includes all threeCREATE INDEX idx_orders_covering ON orders(status) INCLUDE (customer_id, order_date, amount);-- PostgreSQL 11+ INCLUDE syntax adds non-key columns to the leaf pagesCovering indexes are powerful for frequently-run reporting queries — they eliminate the “heap fetch” step entirely.
Index Maintenance
In PostgreSQL, rows aren’t deleted from indexes immediately — they’re marked as dead and cleaned up by autovacuum. Heavily updated tables can develop bloated, inefficient indexes.
-- Rebuild an index (locks the table)REINDEX INDEX idx_orders_customer;
-- Rebuild without locking (PostgreSQL 12+)REINDEX INDEX CONCURRENTLY idx_orders_customer;
-- Check index bloatSELECT indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_sizeFROM pg_stat_user_indexesORDER BY pg_relation_size(indexrelid) DESC;Indexes are one of the highest-impact tuning levers in SQL. A well-placed index on a frequently-queried column can turn a 30-second query into a millisecond response. The key is placing them thoughtfully — adding indexes where queries actually filter, join, or sort — and removing them where they’re just adding write overhead without helping reads.