Indexes — Single Field, Compound, Text, and Geospatial

Without indexes, every query MongoDB executes performs a collection scan — it reads every single document in the collection to find matches. On a collection of one million tasks, a query like { userId: '...' } reads all one million documents even if only ten match. An index on the userId field lets MongoDB jump directly to the matching documents in milliseconds. Indexes are the single most impactful performance optimisation you can apply to a MongoDB database — more than hardware upgrades, more than query rewrites. Understanding when to create them, what types exist, and how to verify they are being used is a core backend skill.

Index Types

Index Type Created With Supports
Single Field { userId: 1 } Equality, range, sort on one field
Compound { userId: 1, status: 1, createdAt: -1 } Multi-field queries, covered queries
Multikey { tags: 1 } on array field Array element queries — one index entry per element
Text { title: 'text', description: 'text' } Full-text search with $text and relevance scoring
Geospatial (2dsphere) { location: '2dsphere' } Near, within, intersects queries on GeoJSON
Hashed { _id: 'hashed' } Equality only — used for sharding
Wildcard { '$**': 1 } All fields — flexible schemas
Partial { status: 1 }, { partialFilterExpression: { active: true } } Index only a subset of documents
Sparse { phone: 1 }, { sparse: true } Only documents where field exists
TTL { expiresAt: 1 }, { expireAfterSeconds: 0 } Automatic document expiration
Unique { email: 1 }, { unique: true } Enforces uniqueness — like a DB constraint

The ESR Rule — Compound Index Field Order

Position Letter Field Type Example
1st E — Equality Fields used with exact match userId: 1
2nd S — Sort Fields used in .sort() createdAt: -1
3rd R — Range Fields used with $gt, $lt, $in status: 1
Note: The order of fields in a compound index is critical. An index on { userId: 1, status: 1, createdAt: -1 } can support queries that filter by userId, or by userId + status, or by userId + status + createdAt. But it cannot support a query that filters only by status without also filtering by userId. MongoDB uses the index from left to right — it can use a prefix of the index but not skip fields. The ESR rule (Equality, Sort, Range) gives you the optimal field ordering for most compound indexes.
Tip: A covered query is one where MongoDB can answer the query entirely from the index without reading any documents. This is the fastest possible query — no document I/O at all. Create a covered query by including all projected fields in the index: if you query { userId: 1 } and project { title: 1, status: 1, _id: 0 }, an index on { userId: 1, title: 1, status: 1 } covers the query completely. Look for covered queries in the Explain Plan output: totalDocsExamined: 0.
Warning: Every index consumes disk space and slows down writes — MongoDB must update every index on every insert, update, and delete. Do not create indexes speculatively. Create them for queries you can actually measure as slow using explain('executionStats'). A collection with 20 indexes will have significantly slower write throughput than one with 5 well-chosen indexes. Audit and drop unused indexes regularly using db.collection.aggregate([{ $indexStats: {} }]) to see which indexes have zero usage.

Creating and Managing Indexes

// ── Single field index ────────────────────────────────────────────────────
taskSchema.index({ userId: 1 });                     // ascending
taskSchema.index({ createdAt: -1 });                  // descending — newest first

// ── Unique index ──────────────────────────────────────────────────────────
userSchema.index({ email: 1 }, { unique: true });    // no duplicate emails

// ── Compound index — ESR rule: Equality, Sort, Range ──────────────────────
// Query: { userId, status } sorted by createdAt descending
taskSchema.index({ userId: 1, createdAt: -1 });       // equality + sort
taskSchema.index({ userId: 1, status: 1, createdAt: -1 }); // equality + equality + sort

// ── Partial index — only index active, non-deleted tasks ──────────────────
taskSchema.index(
    { userId: 1, dueDate: 1 },
    { partialFilterExpression: { status: { $ne: 'completed' }, deletedAt: { $exists: false } } }
);
// Much smaller index — only covers the documents that matter

// ── TTL index — auto-delete expired sessions ──────────────────────────────
sessionSchema.index(
    { expiresAt: 1 },
    { expireAfterSeconds: 0 }   // delete when expiresAt date is reached
);

// ── Text index — full-text search ─────────────────────────────────────────
taskSchema.index(
    { title: 'text', description: 'text' },
    { weights: { title: 10, description: 3 }, name: 'task_text_search' }
);

// Using text search:
const tasks = await Task.find(
    { $text: { $search: 'client meeting deadline' } },
    { score: { $meta: 'textScore' } }
).sort({ score: { $meta: 'textScore' } });

// ── Sparse index — only for documents that have the field ─────────────────
userSchema.index({ githubId: 1 }, { sparse: true, unique: true });
// Only indexes users with a githubId — doesn't require all users to have one

// ── Multikey index — array field ──────────────────────────────────────────
taskSchema.index({ tags: 1 });
// Query: { tags: 'urgent' } — uses the multikey index efficiently
// Query: { tags: { $all: ['urgent', 'Q4'] } } — also uses multikey index

// ── Geospatial index ──────────────────────────────────────────────────────
locationSchema.index({ coordinates: '2dsphere' });
// coordinates stored as GeoJSON: { type: 'Point', coordinates: [lng, lat] }

// Find locations within 5km of a point
const nearby = await Location.find({
    coordinates: {
        $near: {
            $geometry:    { type: 'Point', coordinates: [-73.97, 40.77] },
            $maxDistance: 5000,   // metres
        },
    },
});

Analysing Query Performance with explain()

// ── explain() — understand how MongoDB executes a query ───────────────────

// Mongoose explain
const explanation = await Task
    .find({ userId: '64a1f...', status: 'pending' })
    .sort({ createdAt: -1 })
    .explain('executionStats');

console.log(JSON.stringify(explanation, null, 2));

// Key fields to examine:
// winningPlan.inputStage.stage
//   'COLLSCAN' → full collection scan (BAD — needs an index)
//   'IXSCAN'   → index scan (GOOD — using an index)
//   'FETCH'    → fetching documents after index scan (normal)
//   'SORT'     → in-memory sort (may need index on sort field)

// executionStats:
//   nReturned           — documents returned
//   totalDocsExamined   — documents MongoDB had to look at
//   totalKeysExamined   — index entries examined
//   executionTimeMillis — query duration

// A good query: nReturned == totalDocsExamined (ratio 1:1)
// A bad query:  nReturned = 10, totalDocsExamined = 100,000 (ratio 1:10000)

// ── Index usage statistics — find unused indexes ──────────────────────────
const indexStats = await Task.aggregate([{ $indexStats: {} }]);
indexStats.forEach(stat => {
    console.log(`${stat.name}: ${stat.accesses.ops} operations since server restart`);
});
// Indexes with 0 ops are candidates for removal

// ── Check existing indexes ────────────────────────────────────────────────
const indexes = await Task.collection.getIndexes();
console.log(indexes);

How It Works

Step 1 — B-Tree Structure Enables Fast Lookups

Most MongoDB indexes use a B-tree (balanced tree) data structure. The B-tree stores field values in sorted order. To find { userId: 'abc' }, MongoDB navigates the B-tree from the root, making comparisons at each node until it finds the matching leaf. This takes O(log n) time — for 1 million documents, that is about 20 comparisons instead of 1 million. The tree is kept balanced automatically on every write.

Step 2 — The Query Planner Chooses the Best Index

When multiple indexes could satisfy a query, MongoDB’s query planner runs them in parallel in a trial period, counts how many documents each approach examines, and picks the winner. The winning plan is cached for that query shape. You can see the winning and rejected plans in explain('allPlansExecution'). The planner re-evaluates periodically or after a certain number of writes to adapt to changing data distributions.

Step 3 — Compound Indexes Serve Multiple Query Patterns

A single compound index on { userId: 1, status: 1, createdAt: -1 } can efficiently serve these queries: { userId }, { userId, status }, { userId, status, createdAt }, and sort by createdAt when filtering on userId. This is the index prefix property — MongoDB can use any left-anchored prefix of the index. Planning compound indexes thoughtfully means fewer total indexes while serving more query patterns.

A text index tokenises the indexed fields — splitting text into words, removing stopwords (“the”, “a”, “is”), and applying stemming (reducing “running” and “runs” to “run”). When you query { $text: { $search: 'running tasks' } }, MongoDB searches for stemmed forms of each word. Relevance scores are computed based on term frequency and the field weights you assign. This is simpler but less powerful than a dedicated search engine like Elasticsearch.

Step 5 — TTL Indexes Automate Document Expiry

MongoDB runs a background thread every 60 seconds that checks TTL-indexed fields. Documents where the indexed date field is older than expireAfterSeconds are automatically deleted. This is ideal for session tokens, password reset tokens, rate-limit records, and audit logs. Setting expireAfterSeconds: 0 means “delete when the indexed date is reached” — the field value itself is the expiry timestamp.

Real-World Example: Index Strategy for Task Manager

// models/task.model.js — complete index strategy

const taskSchema = new mongoose.Schema({ /* ... fields ... */ }, { timestamps: true });

// 1. Primary access pattern: all tasks for a user, sorted by date
taskSchema.index({ user: 1, createdAt: -1 });

// 2. Filter by user + status (most common list query)
taskSchema.index({ user: 1, status: 1, createdAt: -1 });

// 3. Filter by user + priority
taskSchema.index({ user: 1, priority: 1, createdAt: -1 });

// 4. Overdue tasks: user + dueDate (for due date queries)
taskSchema.index({ user: 1, dueDate: 1 });

// 5. Partial index: only active tasks (excludes completed — smaller, faster)
taskSchema.index(
    { user: 1, createdAt: -1 },
    {
        partialFilterExpression: { status: { $ne: 'completed' } },
        name: 'active_tasks_by_user',
    }
);

// 6. Full-text search
taskSchema.index(
    { title: 'text', description: 'text' },
    { weights: { title: 10, description: 3 } }
);

// 7. Tags multikey (for tag filtering)
taskSchema.index({ tags: 1 });

// models/user.model.js
userSchema.index({ email: 1 }, { unique: true });       // unique email
userSchema.index({ createdAt: -1 });                     // list users by date

// models/session.model.js
sessionSchema.index(
    { expiresAt: 1 },
    { expireAfterSeconds: 0, name: 'session_ttl' }       // auto-delete expired sessions
);
sessionSchema.index({ userId: 1 });                      // find sessions by user
sessionSchema.index({ token: 1 }, { unique: true });     // lookup by token

Common Mistakes

Mistake 1 — Creating indexes on every field “just in case”

❌ Wrong — over-indexed collection has slow writes:

// 15 indexes on a write-heavy collection
taskSchema.index({ title: 1 });
taskSchema.index({ description: 1 });
taskSchema.index({ priority: 1 });
taskSchema.index({ status: 1 });
taskSchema.index({ dueDate: 1 });
// ... 10 more
// Every insert/update must update all 15 B-trees — writes are 3x slower

✅ Correct — only index fields that appear in real, frequent queries:

// Profile real queries first, then create targeted compound indexes
taskSchema.index({ user: 1, status: 1, createdAt: -1 });  // covers the main list query

Mistake 2 — Wrong compound index field order

❌ Wrong — range field first blocks use for equality + sort queries:

taskSchema.index({ createdAt: -1, userId: 1 });
// Query: { userId: 'x' } sorted by createdAt — CANNOT use this index efficiently
// MongoDB must scan all entries sorted by createdAt to find userId matches

✅ Correct — equality field first, sort field last (ESR rule):

taskSchema.index({ userId: 1, createdAt: -1 });
// Query: { userId: 'x' } sorted by createdAt — uses index perfectly

Mistake 3 — Not including all query fields in the index

❌ Wrong — index on userId only, query filters on status too:

taskSchema.index({ userId: 1 });
// Query: { userId: 'x', status: 'pending' }
// MongoDB uses the index to find all tasks for userId, then filters status in memory
// If userId has 10,000 tasks, 10,000 docs fetched just to find 20 pending ones

✅ Correct — include status in the compound index:

taskSchema.index({ userId: 1, status: 1 });
// MongoDB goes directly to userId+status='pending' entries — reads only matching docs

Quick Reference

Index Type Schema Definition Use For
Single field schema.index({ field: 1 }) Frequent single-field queries
Unique schema.index({ email: 1 }, { unique: true }) Enforce uniqueness
Compound (ESR) schema.index({ user: 1, status: 1, date: -1 }) Multi-field queries
Text schema.index({ title: 'text', body: 'text' }) Full-text search
Partial schema.index({ f: 1 }, { partialFilterExpression: {...} }) Subset of documents
TTL schema.index({ expiresAt: 1 }, { expireAfterSeconds: 0 }) Auto-delete by date
Sparse schema.index({ field: 1 }, { sparse: true }) Optional fields
Geospatial schema.index({ loc: '2dsphere' }) Location queries
Explain query Model.find(filter).explain('executionStats') Verify index usage

🧠 Test Yourself

The most common query on the tasks collection is { userId: 'x', status: 'pending' } sorted by createdAt descending. Which compound index best supports this query using the ESR rule?