Creating and Managing Indexes — Strategies and Maintenance

Knowing that indexes help is not enough — the choice of which columns to index, in what order for composite indexes, and with what conditions for partial indexes determines whether your queries are fast or still slow. Composite indexes follow strict ordering rules that control which queries benefit from them. Monitoring index usage helps you eliminate dead weight. And running VACUUM and REINDEX regularly keeps indexes healthy and accurate. This lesson covers the practical strategies for a FastAPI application’s index management.

Composite (Multi-Column) Indexes

-- A composite index covers multiple columns in a specific order
-- The order of columns in the index critically affects which queries it helps

-- Index: (status, created_at)
CREATE INDEX idx_posts_status_created ON posts(status, created_at DESC);

-- Queries that CAN use this index:
-- WHERE status = 'published'                         ✓ uses first column
-- WHERE status = 'published' ORDER BY created_at     ✓ uses both columns
-- WHERE status = 'published' AND created_at > '...' ✓ uses both columns

-- Queries that CANNOT use this index:
-- WHERE created_at > '2025-01-01'                   ✗ skips first column
-- ORDER BY created_at DESC                           ✗ first column missing
-- The "leftmost prefix rule": must include columns from the left in order

-- Choose column order: put equality-filtered columns first, range columns last
-- Example: status = 'published' AND created_at > X
-- Index should be: (status, created_at) — equality first, then range
Note: The leftmost prefix rule is the most important rule for composite indexes: a composite index on (A, B, C) can be used for queries on (A), (A, B), or (A, B, C) — but NOT for queries on (B), (C), or (B, C) alone. Think of it like a phone book sorted by last name then first name — you can look up by last name, or by last name + first name, but you cannot efficiently look up by first name alone.
Tip: A composite index can replace multiple single-column indexes. (status, author_id, created_at) serves queries filtering by status alone, status+author_id, or all three. Rather than creating three separate indexes, one composite index may cover all your query patterns. The tradeoff: one larger index vs multiple smaller ones. Use EXPLAIN (covered in the next lesson) to verify which index the planner actually uses for each query.
Warning: Do not create composite indexes speculatively — only create them for query patterns you have actually observed in your application. Every index you add increases write overhead. A good workflow: start with single-column indexes on the most-queried columns, identify slow queries with EXPLAIN ANALYZE, then add composite indexes specifically for those slow query patterns. Premature optimisation with dozens of composite indexes often makes a write-heavy application significantly slower overall.

Partial Indexes

-- Partial index: only index rows matching a condition
-- Smaller index (only indexed rows), faster build and lookup

-- Only index published posts (most queries only care about published)
CREATE INDEX idx_posts_published ON posts(created_at DESC)
    WHERE status = 'published';

-- This index is used for:
SELECT * FROM posts WHERE status = 'published' ORDER BY created_at DESC;

-- But NOT for:
SELECT * FROM posts WHERE status = 'draft' ORDER BY created_at DESC;
-- (Falls back to sequential scan or a different index)

-- Other partial index patterns:

-- Index active users only (soft delete pattern)
CREATE INDEX idx_users_email_active ON users(email)
    WHERE deleted_at IS NULL;

-- Index unprocessed jobs (keeps index small as queue drains)
CREATE INDEX idx_jobs_pending ON jobs(created_at)
    WHERE status = 'pending';

-- Index on a rare condition (avoid indexing the common case)
CREATE INDEX idx_posts_featured ON posts(published_at DESC)
    WHERE is_featured = TRUE;

Index Maintenance

-- ── VACUUM — reclaim dead tuple space ────────────────────────────────────────
-- PostgreSQL never overwrites rows — updates create new versions (MVCC)
-- Old versions (dead tuples) accumulate until VACUUM reclaims them

-- Manual vacuum (normally handled by autovacuum)
VACUUM posts;            -- reclaim dead tuples (non-locking)
VACUUM ANALYZE posts;    -- reclaim + update statistics
VACUUM FULL posts;       -- full rewrite (locks table!) — use rarely

-- ── REINDEX — rebuild corrupted or bloated indexes ────────────────────────────
REINDEX INDEX idx_posts_status;          -- rebuild specific index
REINDEX TABLE posts;                     -- rebuild all indexes on table
REINDEX INDEX CONCURRENTLY idx_posts_status;  -- non-blocking rebuild (PG12+)

-- ── Monitor index usage ───────────────────────────────────────────────────────
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan       AS index_scans,
    idx_tup_read   AS tuples_read,
    idx_tup_fetch  AS tuples_fetched,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;   -- unused indexes (0 scans) are candidates for removal

-- ── Monitor table bloat ───────────────────────────────────────────────────────
SELECT
    relname  AS table,
    n_dead_tup,
    n_live_tup,
    ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1)
        AS dead_pct
FROM pg_stat_user_tables
ORDER BY dead_pct DESC NULLS LAST;

Covering Indexes (Index-Only Scans)

-- An index-only scan reads data entirely from the index, never touching the table
-- Possible when all needed columns are in the index

-- Query needs: id, title, status, created_at
-- If index covers all four columns, PostgreSQL reads only the index:
CREATE INDEX idx_posts_list ON posts(status, created_at DESC)
    INCLUDE (id, title);    -- INCLUDE adds extra columns to the index leaf pages

-- Now this query is an index-only scan (no table access):
SELECT id, title, status, created_at
FROM posts
WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 10;

-- EXPLAIN will show "Index Only Scan" — typically 3-5x faster than regular index scan
-- Tradeoff: larger index size due to extra columns

Common Mistakes

Mistake 1 — Wrong column order in composite index

❌ Wrong — range column first, blocks use for equality queries:

CREATE INDEX idx_wrong ON posts(created_at, status);
-- WHERE status = 'published' — cannot use this index efficiently!

✅ Correct — equality column first:

CREATE INDEX idx_right ON posts(status, created_at);   -- ✓

Mistake 2 — Never running VACUUM ANALYZE after bulk loads

❌ Wrong — planner uses stale statistics after bulk import:

-- After loading 1M rows, planner still thinks table has 1000 rows!
-- Makes terrible query plan choices (wrong join order, wrong index)

✅ Correct — update statistics after major data changes:

VACUUM ANALYZE posts;   -- ✓ updates row count and column statistics

Mistake 3 — Keeping unused indexes

❌ Wrong — 15 indexes on a table, 8 never used:

-- Unused indexes slow down every INSERT/UPDATE/DELETE on the table

✅ Correct — audit and drop unused indexes:

SELECT indexname FROM pg_stat_user_indexes
WHERE relname = 'posts' AND idx_scan = 0
  AND indexname NOT LIKE '%_pkey';   -- skip primary key
-- Drop the unused ones ✓

Quick Reference

Pattern SQL
Composite index CREATE INDEX ON t(col1, col2) — equality cols first
Partial index CREATE INDEX ON t(col) WHERE condition
Covering index CREATE INDEX ON t(col1) INCLUDE (col2, col3)
Non-blocking build CREATE INDEX CONCURRENTLY ...
Update statistics VACUUM ANALYZE tablename
Find unused indexes SELECT ... FROM pg_stat_user_indexes WHERE idx_scan = 0
Rebuild index REINDEX INDEX CONCURRENTLY idx_name

🧠 Test Yourself

You have a composite index (author_id, status, created_at). Which of the following queries can use this index?