SQL Indexes: Speed Up Queries Without Changing Your Data

An index is a data structure the database maintains alongside a table to speed up row lookups. Without an index, the database performs a sequential scan — checking every row in the table. With an index on the right column, it can jump directly to matching rows.

The trade-off: indexes consume storage and slow down INSERT, UPDATE, and DELETE (because the index must be updated each time). More indexes isn’t always better.

How a B-Tree Index Works

Most SQL indexes use a B-tree (balanced tree) structure. The database builds a sorted tree of the indexed column values, with each leaf node pointing to the physical location of the matching row.

B-tree index on orders.customer_id

         [500]
        /     \
    [250]     [750]
   /    \    /    \
[100][300][600][900]
  |    |    |    |
 rows rows rows rows (physical row pointers)

When you query WHERE customer_id = 300, the database:

Starts at the root (500) — 300 < 500, go left
Reaches 250 — 300 > 250, go right
Reaches 300 — found it; follow pointer to data rows

This takes O(log n) time instead of O(n) for a full table scan. On a table with 10 million rows, that’s the difference between checking ~24 nodes vs. 10 million rows.

Creating Indexes

-- Basic index on a single column
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Unique index (also enforces uniqueness constraint)
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Composite index (order matters)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Descending index (useful for ORDER BY DESC queries)
CREATE INDEX idx_orders_date_desc ON orders(order_date DESC);

-- Index on expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Partial index (only index rows matching a condition)
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';

-- PostgreSQL: concurrent index build (doesn't lock the table)
CREATE INDEX CONCURRENTLY idx_orders_amount ON orders(amount);

Index Types

B-tree (default) — Best for equality (=) and range comparisons (>, <, BETWEEN). Supports ORDER BY optimization. Used in the vast majority of cases.

Hash — Only supports equality (=). Slightly faster than B-tree for pure equality lookups, but can’t support range queries or sorting. PostgreSQL supports hash indexes (and they’re crash-safe since PostgreSQL 10).

GiST — Generalized Search Tree. Supports geometric types, full-text search, and range types in PostgreSQL.

GIN — Generalized Inverted Index. Optimal for indexing array values or JSONB columns where you need to search inside composite values.

BRIN — Block Range INdex. Very small, best for naturally ordered data like timestamps on append-only tables.

-- GIN index for JSONB queries
CREATE INDEX idx_events_data ON events USING GIN(event_data);

-- GIN index for full-text search
CREATE INDEX idx_articles_search ON articles USING GIN(to_tsvector('english', content));

-- BRIN for time-series data
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);

Composite Index Column Order

For a composite index like (customer_id, order_date), the order of columns determines which queries benefit:

-- Index on (customer_id, order_date)

-- USES the index fully:
WHERE customer_id = 101 AND order_date >= '2025-01-01'
WHERE customer_id = 101

-- USES the index partially (customer_id only):
WHERE customer_id = 101
ORDER BY order_date  -- can still use index for this

-- DOES NOT use the index:
WHERE order_date >= '2025-01-01'  -- skipping the first column

The leftmost prefix rule: a composite index can be used for any query that filters on a prefix of the indexed columns. Always put the most selective column (the one with the most distinct values) first when you have a choice.

When Queries Don’t Use Indexes

Several patterns prevent index usage:

-- Functions on indexed column (prevents index use)
WHERE YEAR(order_date) = 2025           -- bad
WHERE order_date >= '2025-01-01'         -- good

WHERE LOWER(email) = 'user@example.com' -- bad (unless expression index exists)
WHERE email = 'user@example.com'         -- good (assumes email stored as lowercase)

-- Type mismatch (implicit cast can block index use)
WHERE customer_id = '12345'  -- customer_id is INTEGER; string literal may not match

-- Leading wildcard (can't use B-tree index)
WHERE name LIKE '%smith%'    -- bad
WHERE name LIKE 'smith%'     -- good (leading literal)

-- OR conditions (complex; may or may not use index)
WHERE status = 'active' OR customer_id = 101
-- Better as UNION:
SELECT * FROM orders WHERE status = 'active'
UNION
SELECT * FROM orders WHERE customer_id = 101;

Identifying Missing Indexes

EXPLAIN ANALYZE shows whether an index is being used:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 101 AND status = 'completed';

Look for:

Seq Scan on large tables → likely missing an index
Index Scan → good, using the index
Index Only Scan → best case, data retrieved from index without touching the table
Bitmap Index Scan → medium case, many rows match

Find tables with sequential scans (PostgreSQL):

SELECT
    relname AS table_name,
    seq_scan,
    idx_scan,
    n_live_tup AS rows
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
    AND n_live_tup > 10000
ORDER BY seq_scan DESC;

Too Many Indexes: The Cost Side

Every index:

Uses disk space (B-tree indexes are typically 5–20% of the table’s size)
Slows down writes (each INSERT, UPDATE, DELETE must also update all indexes)
Is maintained by the database’s autovacuum/statistics processes

For read-heavy tables (analytics, data warehouses), many indexes are fine. For write-heavy tables (transactional systems, event logs), too many indexes degrade write throughput significantly.

Remove unused indexes (PostgreSQL):

SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
    AND indexname NOT LIKE 'pg_%'
ORDER BY tablename;

Zero scans since the last statistics reset means the index probably isn’t helping.

Covering Indexes

A covering index includes all columns the query needs — the database can satisfy the query entirely from the index without reading the main table.

-- Query only needs these three columns
SELECT customer_id, order_date, amount FROM orders WHERE status = 'completed';

-- Covering index that includes all three
CREATE INDEX idx_orders_covering
    ON orders(status) INCLUDE (customer_id, order_date, amount);
-- PostgreSQL 11+ INCLUDE syntax adds non-key columns to the leaf pages

Covering indexes are powerful for frequently-run reporting queries — they eliminate the “heap fetch” step entirely.

Index Maintenance

In PostgreSQL, rows aren’t deleted from indexes immediately — they’re marked as dead and cleaned up by autovacuum. Heavily updated tables can develop bloated, inefficient indexes.

-- Rebuild an index (locks the table)
REINDEX INDEX idx_orders_customer;

-- Rebuild without locking (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_customer;

-- Check index bloat
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Indexes are one of the highest-impact tuning levers in SQL. A well-placed index on a frequently-queried column can turn a 30-second query into a millisecond response. The key is placing them thoughtfully — adding indexes where queries actually filter, join, or sort — and removing them where they’re just adding write overhead without helping reads.