How Indexes Work — B-tree, Hash and Index Types

An index is a separate data structure that PostgreSQL maintains alongside a table, organised to make specific queries faster. Without an index, finding rows that match a WHERE condition requires reading every row in the table — a sequential scan. With an index on the right column, PostgreSQL can jump directly to the matching rows — an index scan. The tradeoff: indexes make reads faster but writes slower (every INSERT, UPDATE, and DELETE must also update all relevant indexes). Understanding how different index types work helps you choose the right index for each query pattern in your FastAPI application.

B-tree Indexes — The Default

-- B-tree (balanced tree) is the default index type
-- Supports: =, <, >, <=, >=, BETWEEN, IN, IS NULL, IS NOT NULL, LIKE 'prefix%'

-- Create a B-tree index (default)
CREATE INDEX idx_posts_status ON posts(status);
CREATE INDEX idx_posts_created_at ON posts(created_at DESC);
CREATE INDEX idx_users_email ON users(email);

-- Unique index (enforces uniqueness AND adds index for fast lookups)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- What queries can use this index:
-- WHERE status = 'published'           ✓ equality
-- WHERE created_at > '2025-01-01'     ✓ range
-- WHERE status IN ('draft', 'published') ✓ in-list
-- WHERE email LIKE 'alice%'           ✓ prefix LIKE
-- WHERE email LIKE '%alice%'          ✗ leading wildcard — index not used!
-- WHERE LOWER(email) = 'alice@...'    ✗ function applied — index not used!
-- ORDER BY created_at DESC             ✓ covers sort, avoids filesort
Note: B-tree indexes store data in a sorted, balanced binary tree structure where every lookup takes O(log n) time — for a table with 1 million rows, a B-tree index finds matching rows in about 20 comparisons instead of 1 million. The tree is kept balanced after every insert and delete, which is why writes are slightly slower with indexes. The practical cost: a table with 5 indexes requires updating 5 data structures per INSERT or UPDATE — typically negligible for read-heavy web applications but significant for bulk import operations.
Tip: Create indexes on columns you filter by in WHERE clauses, sort by in ORDER BY, and join on in JOIN conditions. Foreign key columns are the most commonly forgotten — PostgreSQL automatically indexes primary keys but NOT foreign keys. A query like SELECT * FROM posts WHERE author_id = 5 on a 1M-row posts table will do a full sequential scan without an index on author_id. Always add CREATE INDEX idx_posts_author_id ON posts(author_id); immediately after adding a foreign key column.
Warning: Indexes are not free — every index adds overhead to INSERT, UPDATE, and DELETE operations. A table with 10 indexes on a 100M-row table will have noticeably slower writes than the same table with 3 indexes. Audit your indexes periodically with SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0; — unused indexes (zero scans) are pure overhead. Drop them unless they enforce a constraint.

Index Types Overview

Type Best For Operators Supported
B-tree (default) Equality, range, sorting =, <, >, BETWEEN, IN, LIKE ‘prefix%’
Hash Equality only (fast, compact) = only
GIN Arrays, JSONB, full-text search @>, @<, &&, @@, ?
GiST Geometric data, full-text, ranges &&, @>, <<, |>>
SP-GiST Non-balanced trees (phone numbers, IPs) Same as GiST but for non-uniform data
BRIN Very large tables, naturally ordered data Range operators (compact, fast for inserts)

GIN Indexes — Arrays, JSONB and Full-Text

-- GIN (Generalised Inverted Index) — for composite values
-- Each element is indexed separately

-- Array containment queries
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags @> ARRAY['python'];    -- ✓ uses GIN index

-- JSONB queries
CREATE INDEX idx_posts_metadata ON posts USING GIN (metadata);
SELECT * FROM posts WHERE metadata @> '{"difficulty": "beginner"}';   -- ✓ uses GIN

-- Full-text search
CREATE INDEX idx_posts_fts ON posts USING GIN (
    to_tsvector('english', title || ' ' || body)
);
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'fastapi & tutorial');

Expression Indexes

-- Index on a function result — useful for case-insensitive queries
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now this query uses the index:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Without this index, LOWER(email) = '...' would do a sequential scan
-- because the index on email(col) is built on the original values,
-- not the lowercased values

-- Index for slug generation pattern
CREATE INDEX idx_posts_title_slug ON posts(LOWER(REGEXP_REPLACE(title, '\W+', '-', 'g')));

-- Index for date-based queries
CREATE INDEX idx_posts_published_month ON posts(DATE_TRUNC('month', published_at));
SELECT * FROM posts
WHERE DATE_TRUNC('month', published_at) = '2025-08-01';

Common Mistakes

Mistake 1 — Function call on indexed column prevents index use

❌ Wrong — LOWER() prevents the index on email from being used:

SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- index on email not used — sequential scan!

✅ Correct — create an expression index OR use ILIKE:

CREATE INDEX idx_users_email_lower ON users(LOWER(email));   -- expression index
-- OR store email as lowercase always and query with =

Mistake 2 — Missing index on foreign key column

❌ Wrong — FK without index causes sequential scan on large table:

-- posts.author_id REFERENCES users(id) — no index on author_id!
SELECT * FROM posts WHERE author_id = 5;   -- scans all 1M posts!

✅ Correct — always index FK columns:

CREATE INDEX idx_posts_author_id ON posts(author_id);   -- ✓

Mistake 3 — Over-indexing write-heavy tables

❌ Wrong — 12 indexes on a high-write events table:

-- Every INSERT into events must update 12 index structures
-- Bulk imports that should take 5 seconds take 60 seconds!

✅ Correct — audit and drop unused indexes regularly:

SELECT indexname, idx_scan FROM pg_stat_user_indexes
WHERE relname = 'events' ORDER BY idx_scan;
-- Drop indexes with idx_scan = 0 (never used)

Quick Reference

Task SQL
B-tree index CREATE INDEX idx_name ON table(col);
Non-blocking index CREATE INDEX CONCURRENTLY ...
Unique index CREATE UNIQUE INDEX ...
GIN index CREATE INDEX ... USING GIN (col)
Expression index CREATE INDEX ... ON table(LOWER(col))
Drop index DROP INDEX idx_name;
List indexes \d tablename in psql
Unused indexes SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;

🧠 Test Yourself

Your FastAPI endpoint queries WHERE LOWER(email) = LOWER(:email). You have a B-tree index on email. Does this query use the index? If not, what is the fix?