Rishi · 15 min read

Database Indexing and Query Optimization: A Deep Dive

A slow query in production can bring your entire application to its knees. I have seen a single missing index turn a 2ms query into a 45-second full table scan that cascaded into connection pool exhaustion and a complete outage. Understanding how indexes work is not optional knowledge for backend engineers — it is essential infrastructure literacy.

This post covers indexing from the ground up: how they are physically stored, the data structures behind them, composite index design, query plan analysis, and the trade-offs you need to navigate in production systems.

What Is an Index, Really?

An index is a separate data structure that the database maintains alongside your table data. Think of it like the index at the back of a textbook: instead of reading every page to find a topic, you look up the topic in the index and jump directly to the right page.

Without an index, the database must perform a sequential scan (also called a full table scan) — reading every single row in the table to find matches. With an index, it can navigate a much smaller data structure to locate the relevant rows directly.

-- Without an index: scans all 10 million rows
SELECT * FROM orders WHERE customer_id = 42;

-- After creating an index: navigates a tree structure, touches ~3-4 pages
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

The key insight is that an index trades write performance and storage space for read performance. Every INSERT, UPDATE, and DELETE now needs to maintain the index in addition to the table data.
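The planner's switch from a full scan to an index lookup can be observed directly. Here is a minimal sketch using Python's built-in sqlite3 module, where EXPLAIN QUERY PLAN plays the role of PostgreSQL's EXPLAIN (the table and index names mirror the example above):

```python
import sqlite3

# Minimal sketch: watch the planner switch from a full scan to an index lookup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, i * 1.5) for i in range(1000)])

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the plan detail text
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM orders WHERE customer_id = 42")
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = plan("SELECT * FROM orders WHERE customer_id = 42")

print(before)  # a full scan of orders
print(after)   # a search using idx_orders_customer_id
```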

B-Tree vs B+ Tree

B-Tree Basics

A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, insertions, and deletions in O(log n) time. Every node can hold multiple keys and have multiple children, keeping the tree shallow even for millions of entries.

Key properties of a B-tree:

  • Every node (except the root) holds between t-1 and 2t-1 keys (where t is the minimum degree)
  • All leaves are at the same depth
  • Internal nodes store both keys and data pointers
  • The tree stays balanced through splits and merges during insertions and deletions

B+ Tree: The Database Standard

Almost every production relational database (PostgreSQL, MySQL/InnoDB, SQL Server, Oracle) uses B+ trees, not plain B-trees. The critical differences are:

  1. Data only in leaf nodes: Internal nodes contain only keys and child pointers, making them smaller and allowing more keys per page. This means the tree is shallower.
  2. Leaf nodes are linked: Leaf nodes form a doubly linked list, making range scans extremely efficient — once you find the start of a range, you just follow the pointers.
  3. Higher fan-out: Because internal nodes are smaller (no data pointers), each page can hold more keys, reducing the tree height.

B+ Tree Structure:
                    [30 | 70]                 <- Root (keys only)
                   /    |    \
          [10|20]    [45|55]    [80|90]       <- Internal (keys only)
         /  |  \    /  |  \    /  |  \
        [5,8] [10,15] [20,25] [35,40] ...    <- Leaves (keys + data pointers)
          |------>|------->|------->|          <- Linked list for range scans
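The linked leaf level is what makes range scans cheap: after locating the first leaf in the range, the scan just follows next-pointers instead of re-descending the tree. A toy sketch (leaf contents mirror the diagram, not any real engine's layout):

```python
# Toy model of B+ tree linked leaves: keys live only in the leaf level,
# and leaves chain together so a range scan never re-descends the tree.
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None

leaves = [Leaf([5, 8]), Leaf([10, 15]), Leaf([20, 25]), Leaf([35, 40])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

def range_scan(first_leaf, lo, hi):
    out, leaf = [], first_leaf
    # Stop as soon as a leaf starts past the upper bound
    while leaf is not None and (not leaf.keys or leaf.keys[0] <= hi):
        out.extend(k for k in leaf.keys if lo <= k <= hi)
        leaf = leaf.next
    return out

print(range_scan(leaves[0], 8, 25))  # [8, 10, 15, 20, 25]
```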

Why B+ Trees Win for Databases

Consider a database page size of 8KB. With a B-tree, each internal node stores the key plus a pointer to the full row data (say 100 bytes per entry). That gives about 80 keys per page. With a B+ tree, internal nodes only store the key and a child pointer (say 12 bytes per entry), giving about 680 keys per page.

For 100 million rows:

  • B-tree height: log₈₀(100M) ≈ 4.2 levels
  • B+ tree height: log₆₈₀(100M) ≈ 2.8 levels

Fewer levels means fewer disk reads. One fewer level of tree traversal can save milliseconds on every single query, and those milliseconds compound rapidly under load.
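The height arithmetic is easy to sanity-check (80 and 680 keys per page are the illustrative figures from the page-size example above):

```python
import math

# Height ≈ log base fanout of n_rows, using the change-of-base identity.
def tree_height(n_rows, fanout):
    return math.log(n_rows) / math.log(fanout)

print(round(tree_height(100_000_000, 80), 1))   # B-tree case: ~4.2 levels
print(round(tree_height(100_000_000, 680), 1))  # B+ tree case: ~2.8 levels
```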

Hash Indexes

Hash indexes use a hash function to map keys directly to locations. They are conceptually simpler than B+ trees:

hash('customer_42') → bucket 7 → row pointer

When hash indexes work:

  • Exact equality lookups (WHERE id = 42)
  • Point queries on high-cardinality columns

When hash indexes fail:

  • Range queries (WHERE created_at > '2025-01-01') — hash values have no ordering
  • Sorting (ORDER BY) — same reason
  • Prefix matching (WHERE name LIKE 'John%')
  • Any multi-column ordered access

In PostgreSQL, hash indexes have been crash-safe since version 10, but B+ tree indexes are still the default and generally preferred. In MySQL/InnoDB, hash indexes are used internally for the adaptive hash index (a cache of frequently accessed B+ tree pages), but you cannot explicitly create one.

Rule of thumb: Unless you have a specific, measured performance problem with equality lookups on a high-cardinality column, stick with B+ tree indexes.
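The equality-vs-range asymmetry is easy to see with Python's own hash table. A dict models a hash index and a sorted key list stands in for a B+ tree's leaf level; the 'row@N' pointers are made up for illustration:

```python
from bisect import bisect_left, bisect_right

# A dict models a hash index; the row pointers are purely illustrative.
hash_index = {17: "row@3", 42: "row@7", 99: "row@12"}

# Equality lookup: one hash computation, O(1) on average.
print(hash_index[42])  # row@7

# A range query over the hash index has no better option than checking
# every bucket. A sorted structure answers it with two binary searches:
sorted_keys = sorted(hash_index)
lo, hi = bisect_left(sorted_keys, 20), bisect_right(sorted_keys, 100)
print(sorted_keys[lo:hi])  # [42, 99]
```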

Composite Indexes and Column Order

Composite (multi-column) indexes are where index design gets interesting and where most mistakes happen. The column order in a composite index is critical.

The Leftmost Prefix Rule

A composite index on (a, b, c) can satisfy queries that filter on:

  • a alone
  • a and b
  • a, b, and c

But it cannot efficiently satisfy queries that filter on:

  • b alone (must scan the entire index)
  • c alone
  • b and c

-- Given this composite index:
CREATE INDEX idx_orders_composite ON orders (status, customer_id, created_at);

-- These queries USE the index efficiently:
SELECT * FROM orders WHERE status = 'shipped';
SELECT * FROM orders WHERE status = 'shipped' AND customer_id = 42;
SELECT * FROM orders WHERE status = 'shipped' AND customer_id = 42
  AND created_at > '2025-01-01';

-- This query CANNOT use the index efficiently:
SELECT * FROM orders WHERE customer_id = 42;
-- The database would need to scan all statuses to find customer_id = 42

-- This query can use the index for status, but not for created_at:
SELECT * FROM orders WHERE status = 'shipped' AND created_at > '2025-01-01';
-- It skips customer_id, breaking the leftmost prefix for created_at
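SQLite's planner exhibits the same leftmost-prefix behavior, which makes it a convenient way to verify the claim (a sketch with the same illustrative schema):

```python
import sqlite3

# Reproduce the leftmost-prefix rule with SQLite's planner.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT,
                customer_id INTEGER, created_at TEXT, total REAL)""")
conn.execute("CREATE INDEX idx_orders_composite ON orders (status, customer_id, created_at)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column present: the composite index is usable.
with_prefix = plan("SELECT * FROM orders WHERE status = 'shipped' AND customer_id = 42")
# Leading column absent: the planner falls back to a full scan.
without_prefix = plan("SELECT * FROM orders WHERE customer_id = 42")

print(with_prefix)
print(without_prefix)
```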

Column Order Strategy

How should you order columns in a composite index? Follow these principles:

  1. Equality columns first: Columns used with = should come before columns used with ranges (>, <, BETWEEN, LIKE)
  2. High selectivity for equality: Among equality columns, place the one that filters out more rows first (though this matters less than the equality-before-range rule)
  3. Range column last: Only one column in a composite index can use a range scan efficiently — it should be the last one

-- Query pattern:
SELECT * FROM orders
WHERE status = 'shipped'
  AND region = 'us-east'
  AND created_at BETWEEN '2025-01-01' AND '2025-03-01';

-- Optimal index (equalities first, range last):
CREATE INDEX idx_orders_optimal ON orders (status, region, created_at);

-- Sub-optimal (range column in the middle breaks it for subsequent columns):
CREATE INDEX idx_orders_bad ON orders (status, created_at, region);

Covering Indexes

A covering index is an index that contains all the columns a query needs, so the database never has to access the actual table data. This eliminates the "table lookup" step entirely.

-- Query that needs only status, customer_id, and total:
SELECT customer_id, total FROM orders WHERE status = 'shipped';

-- Covering index (INCLUDE adds non-key columns to leaf nodes):
CREATE INDEX idx_orders_covering ON orders (status)
  INCLUDE (customer_id, total);

PostgreSQL's INCLUDE clause (available since version 11) adds columns to the leaf nodes without including them in the tree's sort order. This is more space-efficient than putting them in the index key.

In MySQL/InnoDB, all secondary indexes implicitly include the primary key, so the primary key columns are always "covered" without explicitly adding them.

When to use covering indexes:

  • High-frequency queries that select a small, consistent set of columns
  • Queries where the table lookup dominates the query cost
  • Reporting queries that read many rows but few columns
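SQLite has no INCLUDE clause, but putting the extra columns in the index key demonstrates the same effect: the planner reports a covering index and never touches the table (a sketch, same illustrative schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT,
                customer_id INTEGER, total REAL)""")
# No INCLUDE in SQLite: the extra columns simply join the index key.
conn.execute("CREATE INDEX idx_orders_covering ON orders (status, customer_id, total)")

p = " ".join(row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer_id, total FROM orders WHERE status = 'shipped'"))
print(p)  # reports a COVERING INDEX: the table itself is never accessed
```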

Partial Indexes

A partial index only indexes rows that match a given predicate. This is powerful for tables where queries consistently filter on a specific subset.

-- Only 2% of orders are 'pending', but queries hit them constantly
CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';

-- This index is much smaller than indexing all orders
-- and only needs updating when rows enter or leave 'pending' status

Partial indexes are available in PostgreSQL but not in MySQL. They shine in several scenarios:

  • Soft deletes: Index only non-deleted rows (WHERE deleted_at IS NULL)
  • Active records: Index only active/pending items in a status workflow
  • Non-null filtering: Index only rows where an optional column has a value

-- Only index users who have verified their email
CREATE INDEX idx_users_verified ON users (email)
  WHERE email_verified = true;

-- Only index non-deleted products
CREATE INDEX idx_products_active ON products (category_id, price)
  WHERE deleted_at IS NULL;
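SQLite also supports partial indexes, so the soft-delete pattern above can be checked directly: the index applies only when the query's predicate implies the index's predicate (sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER,
                price REAL, deleted_at TEXT)""")
conn.execute("""CREATE INDEX idx_products_active ON products (category_id, price)
                WHERE deleted_at IS NULL""")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Query predicate implies the index predicate: the partial index applies.
matching = plan("SELECT * FROM products WHERE category_id = 3 AND deleted_at IS NULL")
# Without that predicate, the partial index cannot be used.
non_matching = plan("SELECT * FROM products WHERE category_id = 3")

print(matching)
print(non_matching)
```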

Index-Only Scans

An index-only scan (called a "covering index scan" in some databases) is the best-case scenario: the database reads everything it needs from the index alone without touching the table.

In PostgreSQL, there is a subtlety: index-only scans require that the visibility map indicates the relevant pages are "all-visible." If a page has recently been modified and not yet vacuumed, PostgreSQL must check the table to verify row visibility, which degrades the index-only scan to a regular index scan.

EXPLAIN (ANALYZE, BUFFERS) SELECT customer_id, total
FROM orders WHERE status = 'shipped';

-- Look for "Index Only Scan" in the output:
-- Index Only Scan using idx_orders_covering on orders
--   Heap Fetches: 0          <- This is the goal
--   Buffers: shared hit=142  <- All reads from index

Practical tip: Run VACUUM regularly (or configure autovacuum properly) to keep the visibility map up to date and ensure index-only scans can actually skip the heap.

Reading EXPLAIN Plans

The EXPLAIN command is your primary tool for understanding query performance. Here is how to read it effectively.

Basic EXPLAIN Output

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

Index Scan using idx_orders_customer_id on orders
  (cost=0.43..8.45 rows=1 width=128)
  (actual time=0.024..0.026 rows=1 loops=1)
  Index Cond: (customer_id = 42)
Planning Time: 0.085 ms
Execution Time: 0.048 ms

Key fields to understand:

  • cost: Estimated startup cost..total cost (in arbitrary units proportional to seq_page_cost)
  • rows: Estimated number of rows returned
  • actual time: Real wall-clock time in milliseconds (only with ANALYZE)
  • loops: How many times this node was executed
  • Buffers: (with BUFFERS option) How many pages were read from cache vs. disk

Common Scan Types

Scan Type          | What It Means                      | Performance
Seq Scan           | Full table scan, every row read    | Slow for large tables
Index Scan         | Tree traversal + table lookup      | Fast for selective queries
Index Only Scan    | Tree traversal, no table access    | Fastest
Bitmap Index Scan  | Builds a bitmap of matching pages  | Good for medium selectivity
Bitmap Heap Scan   | Reads pages from bitmap            | Often paired with bitmap index scan

Red Flags in EXPLAIN Plans

Watch for these warning signs:

-- 1. Seq Scan on a large table with a WHERE clause
Seq Scan on orders (cost=0.00..458832.00 rows=50 width=128)
  Filter: (customer_id = 42)
  Rows Removed by Filter: 9999950
-- ^ 10 million rows scanned to find 50. Needs an index.

-- 2. Huge gap between estimated and actual rows
Index Scan using idx_orders_status on orders
  (cost=0.43..8.45 rows=1 width=128)
  (actual time=0.024..4502.026 rows=890000 loops=1)
-- ^ Estimated 1 row, got 890,000. Statistics are stale. Run ANALYZE.

-- 3. Nested Loop with high row counts on both sides
Nested Loop (actual time=0.052..15234.571 rows=5000000 loops=1)
-- ^ Consider a Hash Join or Merge Join instead. May need index changes.

Common Anti-Patterns

1. Over-Indexing

Every index costs storage and slows down writes. A table with 15 indexes will have significantly slower INSERT performance because each insert must update all 15 index structures.

-- Audit your unused indexes in PostgreSQL:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Guideline: If an index has zero scans over a multi-week period, it is a candidate for removal.

2. Wrong Column Order in Composite Indexes

This is the most common indexing mistake. Developers create an index on (a, b) but their most frequent query filters only on b.

-- Index: (status, customer_id)
-- Frequent query: WHERE customer_id = 42
-- Result: Full index scan (or table scan), index is useless for this query

-- Fix: Either reorder to (customer_id, status) or add a separate index on (customer_id)

3. Indexing Low-Cardinality Columns

An index on a boolean column or a status column with 3 possible values is rarely useful by itself. The database often determines that a sequential scan is faster because the index would match too many rows.

-- Almost useless on its own:
CREATE INDEX idx_orders_is_active ON orders (is_active);
-- Better: Use it in a composite index or as a partial index condition

-- Much more useful:
CREATE INDEX idx_orders_active_customer ON orders (customer_id)
  WHERE is_active = true;

4. Functions on Indexed Columns

Applying a function to an indexed column prevents the index from being used:

-- This CANNOT use an index on created_at:
SELECT * FROM orders WHERE YEAR(created_at) = 2025;

-- Rewrite to use the index:
SELECT * FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';

-- Or create an expression index (PostgreSQL):
CREATE INDEX idx_orders_year ON orders (EXTRACT(YEAR FROM created_at));
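The same sargability effect shows up in SQLite (using strftime in place of YEAR, since SQLite stores dates as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_created_at ON orders (created_at)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the column in a function hides it from the index:
wrapped = plan("SELECT * FROM orders WHERE strftime('%Y', created_at) = '2025'")
# The equivalent range predicate is sargable and uses the index:
ranged = plan("SELECT * FROM orders WHERE created_at >= '2025-01-01' "
              "AND created_at < '2026-01-01'")

print(wrapped)  # full scan
print(ranged)   # index range search
```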

5. Implicit Type Conversions

If the column is VARCHAR but you compare it with an integer, the database may cast the column, making the index unusable:

-- Column: phone VARCHAR(20), indexed
-- This may not use the index:
SELECT * FROM users WHERE phone = 5551234567;
-- The database casts phone to integer for comparison

-- Fix: Use the correct type:
SELECT * FROM users WHERE phone = '5551234567';

Write Amplification Trade-Off

Every index you add increases write amplification — the ratio of actual disk writes to logical writes. Consider a table with 5 indexes:

  • A single INSERT of one row requires:
    1. Write the row to the table heap
    2. Update index 1
    3. Update index 2
    4. Update index 3
    5. Update index 4
    6. Update index 5
    7. Write WAL (write-ahead log) entries for all of the above

That single logical write becomes 7+ physical writes. For write-heavy workloads, this overhead is significant.
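A toy accounting of that fan-out, counting one WAL record alongside the heap and index writes as in the list above (real WAL volume grows with the number of pages changed, hence the "+"):

```python
# Toy model of write amplification: one logical insert becomes one heap
# write, one write per index, plus at least one WAL record.
def physical_writes(n_indexes):
    heap = 1
    index_writes = n_indexes
    wal = 1  # lower bound; WAL volume grows with the number of changes
    return heap + index_writes + wal

print(physical_writes(5))  # 7 physical writes for one logical insert
print(physical_writes(0))  # 2: even an unindexed table pays heap + WAL
```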

Measuring the Impact

-- Track index maintenance overhead in PostgreSQL:
SELECT relname,
       n_tup_ins AS inserts,
       n_tup_upd AS updates,
       n_tup_del AS deletes,
       idx_scan AS index_scans
FROM pg_stat_user_tables
WHERE relname = 'orders';

Practical trade-off framework:

  • Read-heavy tables (analytics, reporting): More indexes are acceptable
  • Write-heavy tables (logging, events, metrics): Minimize indexes aggressively
  • Mixed workload: Profile carefully, consider read replicas with different index sets

Practical Guidelines for Index Design

Here is a systematic approach to index design that works in production:

Step 1: Start with Query Patterns

List your top 10-20 queries by frequency and latency. These drive your index design, not theoretical coverage.

Step 2: Design Indexes for Queries, Not Tables

Do not index every column "just in case." Design each index to serve one or more specific query patterns.

Step 3: Follow the ESR Rule

For composite indexes, order columns by:

  1. Equality first (columns in WHERE with =)
  2. Sort next (columns in ORDER BY)
  3. Range last (columns with >, <, BETWEEN, IN with many values)

-- Query: WHERE status = 'active' AND amount > 100
--        ORDER BY created_at DESC
-- ESR index:
CREATE INDEX idx_orders_esr ON orders (status, created_at DESC, amount);
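Whether an index actually absorbs the sort is visible in the plan. In SQLite, a sort the index cannot satisfy appears as USE TEMP B-TREE FOR ORDER BY; with an ESR-shaped index, that step disappears (sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT,
                created_at TEXT, amount REAL)""")
conn.execute("CREATE INDEX idx_orders_esr ON orders (status, created_at DESC, amount)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on status + ORDER BY created_at DESC matches the index order,
# so no separate sort step is needed.
p = plan("SELECT * FROM orders WHERE status = 'active' ORDER BY created_at DESC")
print(p)
```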

Step 4: Validate with EXPLAIN ANALYZE

Never assume an index will be used. Always verify with EXPLAIN ANALYZE using realistic data volumes. The query planner may choose a different strategy.

Step 5: Monitor and Iterate

Set up monitoring for:

  • Slow query logs (queries above a threshold, e.g., 100ms)
  • Index usage statistics (identify unused indexes)
  • Table bloat (triggers reindexing needs)
  • Cache hit ratios (indicates if your working set fits in memory)
-- PostgreSQL: Check cache hit ratio
SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit) as heap_hit,
  sum(heap_blks_hit)::float / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0)
    as cache_hit_ratio
FROM pg_statio_user_tables;
-- Target: > 0.99 (99% cache hit ratio)

Summary

Indexing is not about adding indexes everywhere — it is about understanding the trade-offs and designing indexes that match your actual query patterns. The key takeaways:

  • B+ trees are the default and handle both point queries and range scans
  • Column order matters in composite indexes — follow the equality, sort, range rule
  • Covering indexes eliminate table lookups for significant performance gains
  • Partial indexes reduce index size and maintenance cost for filtered workloads
  • Every index costs writes — measure the impact before adding more
  • EXPLAIN ANALYZE is your best friend — use it before and after every index change
  • Monitor continuously — query patterns change as your application evolves

The best index strategy is one that is continuously validated against real production workload. Do not set it and forget it.
