Technology  /  SQL

🗄️ SQL 40 guides · updated 2026

The language of data — from SELECT and JOINs to window functions, query plans, and the performance tuning that separates juniors from seniors.

SQL Indexes: Speed Up Queries Without Changing Your Data

An index is a data structure the database maintains alongside a table to speed up row lookups. Without an index, the database performs a sequential scan — checking every row in the table. With an index on the right column, it can jump directly to matching rows.

The trade-off: indexes consume storage and slow down INSERT, UPDATE, and DELETE (because the index must be updated each time). More indexes isn’t always better.


How a B-Tree Index Works

Most SQL indexes use a B-tree (balanced tree) structure. The database builds a sorted tree of the indexed column values, with each leaf node pointing to the physical location of the matching row.

B-tree index on orders.customer_id
[500]
/ \
[250] [750]
/ \ / \
[100][300][600][900]
| | | |
rows rows rows rows (physical row pointers)

When you query WHERE customer_id = 300, the database:

  1. Starts at the root (500) — 300 < 500, go left
  2. Reaches 250 — 300 > 250, go right
  3. Reaches 300 — found it; follow pointer to data rows

This takes O(log n) time instead of O(n) for a full table scan. On a table with 10 million rows, that’s the difference between checking ~24 nodes vs. 10 million rows.


Creating Indexes

-- Basic index on a single column
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Unique index (also enforces uniqueness constraint)
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Composite index (order matters)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Descending index (useful for ORDER BY DESC queries)
CREATE INDEX idx_orders_date_desc ON orders(order_date DESC);
-- Index on expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Partial index (only index rows matching a condition)
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';
-- PostgreSQL: concurrent index build (doesn't lock the table)
CREATE INDEX CONCURRENTLY idx_orders_amount ON orders(amount);

Index Types

B-tree (default) — Best for equality (=) and range comparisons (>, <, BETWEEN). Supports ORDER BY optimization. Used in the vast majority of cases.

Hash — Only supports equality (=). Slightly faster than B-tree for pure equality lookups, but can’t support range queries or sorting. PostgreSQL supports hash indexes (and they’re crash-safe since PostgreSQL 10).

GiST — Generalized Search Tree. Supports geometric types, full-text search, and range types in PostgreSQL.

GIN — Generalized Inverted Index. Optimal for indexing array values or JSONB columns where you need to search inside composite values.

BRIN — Block Range INdex. Very small, best for naturally ordered data like timestamps on append-only tables.

-- GIN index for JSONB queries
CREATE INDEX idx_events_data ON events USING GIN(event_data);
-- GIN index for full-text search
CREATE INDEX idx_articles_search ON articles USING GIN(to_tsvector('english', content));
-- BRIN for time-series data
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);

Composite Index Column Order

For a composite index like (customer_id, order_date), the order of columns determines which queries benefit:

-- Index on (customer_id, order_date)
-- USES the index fully:
WHERE customer_id = 101 AND order_date >= '2025-01-01'
WHERE customer_id = 101
-- USES the index partially (customer_id only):
WHERE customer_id = 101
ORDER BY order_date -- can still use index for this
-- DOES NOT use the index:
WHERE order_date >= '2025-01-01' -- skipping the first column

The leftmost prefix rule: a composite index can be used for any query that filters on a prefix of the indexed columns. Always put the most selective column (the one with the most distinct values) first when you have a choice.


When Queries Don’t Use Indexes

Several patterns prevent index usage:

-- Functions on indexed column (prevents index use)
WHERE YEAR(order_date) = 2025 -- bad
WHERE order_date >= '2025-01-01' -- good
WHERE LOWER(email) = 'user@example.com' -- bad (unless expression index exists)
WHERE email = 'user@example.com' -- good (assumes email stored as lowercase)
-- Type mismatch (implicit cast can block index use)
WHERE customer_id = '12345' -- customer_id is INTEGER; string literal may not match
-- Leading wildcard (can't use B-tree index)
WHERE name LIKE '%smith%' -- bad
WHERE name LIKE 'smith%' -- good (leading literal)
-- OR conditions (complex; may or may not use index)
WHERE status = 'active' OR customer_id = 101
-- Better as UNION:
SELECT * FROM orders WHERE status = 'active'
UNION
SELECT * FROM orders WHERE customer_id = 101;

Identifying Missing Indexes

EXPLAIN ANALYZE shows whether an index is being used:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101 AND status = 'completed';

Look for:

Find tables with sequential scans (PostgreSQL):

SELECT
relname AS table_name,
seq_scan,
idx_scan,
n_live_tup AS rows
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
AND n_live_tup > 10000
ORDER BY seq_scan DESC;

Too Many Indexes: The Cost Side

Every index:

For read-heavy tables (analytics, data warehouses), many indexes are fine. For write-heavy tables (transactional systems, event logs), too many indexes degrade write throughput significantly.

Remove unused indexes (PostgreSQL):

SELECT
schemaname,
tablename,
indexname,
idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexname NOT LIKE 'pg_%'
ORDER BY tablename;

Zero scans since the last statistics reset means the index probably isn’t helping.


Covering Indexes

A covering index includes all columns the query needs — the database can satisfy the query entirely from the index without reading the main table.

-- Query only needs these three columns
SELECT customer_id, order_date, amount FROM orders WHERE status = 'completed';
-- Covering index that includes all three
CREATE INDEX idx_orders_covering
ON orders(status) INCLUDE (customer_id, order_date, amount);
-- PostgreSQL 11+ INCLUDE syntax adds non-key columns to the leaf pages

Covering indexes are powerful for frequently-run reporting queries — they eliminate the “heap fetch” step entirely.


Index Maintenance

In PostgreSQL, rows aren’t deleted from indexes immediately — they’re marked as dead and cleaned up by autovacuum. Heavily updated tables can develop bloated, inefficient indexes.

-- Rebuild an index (locks the table)
REINDEX INDEX idx_orders_customer;
-- Rebuild without locking (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_customer;
-- Check index bloat
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Indexes are one of the highest-impact tuning levers in SQL. A well-placed index on a frequently-queried column can turn a 30-second query into a millisecond response. The key is placing them thoughtfully — adding indexes where queries actually filter, join, or sort — and removing them where they’re just adding write overhead without helping reads.