Database Indexes Explained: How They Work and When to Add Them
How B-tree indexes speed up queries, why too many indexes slow down writes, how composite indexes work, and the EXPLAIN output patterns that reveal when you need an index.
What an index actually is
A database index is a separate data structure maintained alongside your table that allows the database to find rows matching a condition without reading every row in the table. Without an index, a query like WHERE email = 'user@example.com' forces the database to read every row and check each email value — a sequential scan. With an index on the email column, the database can jump directly to the matching rows.
The cost is that indexes must be kept in sync with the table. Every INSERT, UPDATE, and DELETE that modifies an indexed column must also update the index. Indexes consume disk space and slow down writes in proportion to the number of indexes on a table. The art of indexing is identifying which queries run frequently enough and on large enough tables to justify this cost.
How a B-tree index works
The default index type in PostgreSQL, MySQL, and most SQL databases is a B-tree (balanced tree). A B-tree index organises the indexed column values in a sorted tree structure where each node contains a range of values and pointers to child nodes or to the actual table rows.
To find a row with email = 'user@example.com', the database starts at the root of the B-tree and follows pointers down through the tree — at each level, it can eliminate half the remaining candidates by comparing the target value to the midpoint of the current node's range. On a table with 10 million rows, a B-tree search takes roughly 23 comparisons (log₂ of 10 million) instead of 10 million. This is the fundamental O(log n) vs O(n) speedup.
B-tree indexes support equality lookups (=), range queries (>,<, BETWEEN), and prefix lookups (LIKE 'prefix%'). They cannot efficiently support suffix lookups (LIKE '%suffix') or arbitrary substring searches — those require full-text indexes or other approaches.
Reading EXPLAIN output
The EXPLAIN command (or EXPLAIN ANALYZE to actually run the query) shows the query plan the database chose. Learning to read it is the most direct path to understanding indexing performance.
-- Without index on email EXPLAIN SELECT * FROM users WHERE email = 'user@example.com'; Seq Scan on users (cost=0.00..18334.00 rows=1 width=120) Filter: (email = 'user@example.com') -- After: CREATE INDEX idx_users_email ON users(email); EXPLAIN SELECT * FROM users WHERE email = 'user@example.com'; Index Scan using idx_users_email on users (cost=0.43..8.45 rows=1 width=120) Index Cond: (email = 'user@example.com') -- Key terms to understand: -- Seq Scan = reading every row — usually bad on large tables -- Index Scan = using the index, following row pointers — good -- Index Only Scan = all needed data is in the index itself — best -- Bitmap Index Scan = used for returning many rows — efficient for ranges -- cost=X..Y: estimated cost units (X=startup, Y=total) -- rows=N: estimated number of rows returned -- EXPLAIN ANALYZE shows actual rows and actual time
Composite indexes: column order matters
A composite index covers multiple columns. The order of columns in the index definition is critical — the index is sorted first by the first column, then by the second within each first-column value, and so on. This means a composite index can be used by queries that filter on the leading columns, but not by queries that filter only on non-leading columns.
-- Index on (status, created_at) CREATE INDEX idx_orders_status_created ON orders(status, created_at); -- Can use the index (filters on leading column) SELECT * FROM orders WHERE status = 'pending'; SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01'; SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at; -- Cannot use the index (created_at is not the leading column) SELECT * FROM orders WHERE created_at > '2026-01-01'; -- This requires a separate index on (created_at) if needed -- The rule: put the most selective, most frequently filtered column first, -- then columns used for ordering or range filtering. -- For a query like: -- WHERE user_id = ? AND status = ? ORDER BY created_at -- The ideal index is (user_id, status, created_at)
Covering indexes: eliminating table lookups
An Index Scan finds matching rows in the index, then follows a pointer to the actual table row to retrieve the full row data. This pointer-following (called a "heap fetch") has a cost. A covering index includes all the columns needed by the query, so the database can return the result directly from the index without ever touching the table. PostgreSQL calls this an Index Only Scan.
-- Query: count active users in a specific plan SELECT COUNT(*) FROM users WHERE plan = 'pro' AND status = 'active'; -- Regular index — must fetch table rows to check status CREATE INDEX idx_users_plan ON users(plan); -- Covering index — both columns are in the index, no table fetch needed CREATE INDEX idx_users_plan_status ON users(plan, status); -- INCLUDE clause (PostgreSQL 11+) — add columns to index without using them for sorting -- Useful when you need to include a column in the result without filtering by it CREATE INDEX idx_users_email_cover ON users(email) INCLUDE (name, plan); -- Allows: SELECT name, plan FROM users WHERE email = ? -- to be served entirely from the index
When NOT to add an index
Small tables. For tables under a few thousand rows, a sequential scan is often faster than an index scan because the entire table fits in memory and sequential reads are cache-friendly. The database's query planner will typically choose the sequential scan anyway.
Low-selectivity columns. An index on a boolean column (true/false) or a status column with only 3-4 values is often useless — if half the rows are status = 'active', the database may prefer a sequential scan even with an index, because it would need to fetch so many rows that the index lookup overhead is not worth it.
Write-heavy tables with many existing indexes. Each additional index slows down INSERT, UPDATE, and DELETE operations. Tables that receive thousands of writes per second need their index count carefully controlled. Profile before adding; remove unused indexes (most databases can report which indexes are never scanned).
Columns used only in middle-of-string searches. LIKE '%substring%'cannot use a standard B-tree index. It requires a full-text search index (GIN in PostgreSQL, FULLTEXT in MySQL) or a generated column with a trigram index (pg_trgm extension in PostgreSQL).
Index maintenance: what to check periodically
Indexes can become bloated over time due to deletes and updates leaving dead rows. PostgreSQL's VACUUM process handles this automatically, but on high-churn tables, indexes may need periodic REINDEX to reclaim space and maintain performance. Check for unused indexes — every database tracks index scan counts, and indexes that are never used should be dropped.
-- PostgreSQL: find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan AS scans_since_restart
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname NOT IN ('pg_catalog')
ORDER BY tablename;
-- PostgreSQL: find large indexes
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexname::regclass) DESC;
-- MySQL: find unused indexes
SELECT * FROM sys.schema_unused_indexes
WHERE object_schema NOT IN ('mysql', 'performance_schema');