MongoDB Query Optimisation — Indexes, explain(), and Aggregation Performance

MongoDB query performance is one of the highest-leverage areas for MEAN Stack optimisation. A single missing index can turn a 2ms query into a 2000ms full collection scan. A poorly structured aggregation pipeline can process millions of documents when it could process dozens. Understanding the MongoDB query execution model — how indexes are selected, what explain() tells you, and how to design aggregation pipelines for performance — directly translates to lower API latency and reduced MongoDB CPU usage.

MongoDB Performance Indicators

| explain() field | Good value | Bad value | Meaning |
|---|---|---|---|
| stage | IXSCAN | COLLSCAN | Index scan vs full collection scan |
| nReturned | docsExamined close to nReturned | docsExamined >> nReturned | Selectivity: how many docs were examined per doc returned |
| executionTimeMillis | < 50 ms | > 100 ms | Total query execution time |
| indexBounds | Tight bounds | [MinKey, MaxKey] | Whether the index is being used effectively |
| keysExamined | Close to nReturned | Much higher than nReturned | Index selectivity |

Index Types for MEAN Stack

| Index type | Use case | Example |
|---|---|---|
| Single field | Filter or sort by one field | { user: 1 } |
| Compound | Filter by multiple fields, or filter + sort | { user: 1, createdAt: -1 } |
| Text | Full-text search across string fields | { title: 'text', description: 'text' } |
| Partial | Index only documents matching a filter (smaller, faster) | { dueDate: 1 }, { partialFilterExpression: { status: 'pending' } } |
| TTL | Automatically delete documents after a time period | { expiresAt: 1 }, { expireAfterSeconds: 0 } |
| Sparse | Index only documents where the field exists | { dueDate: 1 }, { sparse: true } |
Note: The ESR rule for compound indexes: put Equality fields first, Sort fields second, Range fields last. A query like { user: userId, status: 'pending', createdAt: { $gte: date } } with sort { createdAt: -1 } benefits most from index { user: 1, status: 1, createdAt: -1 } — the equality fields (user, status) narrow the results dramatically before the sort/range field (createdAt) is applied. Wrong index field order can make a compound index nearly useless.
Tip: Enable MongoDB’s slow query log with db.setProfilingLevel(1, { slowms: 100 }). This logs all queries taking longer than 100ms to the system.profile collection. Query it with db.system.profile.find().sort({ ts: -1 }).limit(10) to see the most recent slow operations. In production, use MongoDB Atlas Performance Advisor or the Atlas Query Profiler — they automatically surface slow queries and suggest indexes.
Warning: Every index adds write overhead and storage cost. An index must provide enough read performance benefit to justify its write overhead. Generally, collections with a high read-to-write ratio benefit most from indexes. For write-heavy collections (like logs or events), too many indexes slow down inserts significantly. Audit indexes periodically: list them with db.collection.getIndexes(), and use the $indexStats aggregation stage to find indexes that are never used so they can be dropped.
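The index audit can be scripted. A minimal sketch, assuming you have the documents produced by db.tasks.aggregate([{ $indexStats: {} }]) in hand (the unusedIndexes helper name is ours, not a MongoDB API):

```javascript
// Given the output of db.collection.aggregate([{ $indexStats: {} }]),
// return the names of indexes that have never served an operation.
// The default _id index is excluded because it cannot be dropped.
function unusedIndexes(indexStats) {
    return indexStats
        .filter((ix) => ix.name !== '_id_' && ix.accesses.ops === 0)
        .map((ix) => ix.name);
}

// Example with mocked $indexStats documents:
const stats = [
    { name: '_id_',                accesses: { ops: 91230 } },
    { name: 'user_1_createdAt_-1', accesses: { ops: 4512 } },
    { name: 'tags_1',              accesses: { ops: 0 } },
];
console.log(unusedIndexes(stats));  // ['tags_1']
```

Keep in mind that the accesses.ops counters reset on server restart and on index rebuild, so check them over a representative uptime window before dropping anything.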

Complete MongoDB Performance Optimisation

// ── explain() — analyse query performance ─────────────────────────────────
// In MongoDB Shell or Compass:
db.tasks.find({ user: ObjectId("..."), status: "pending" })
        .sort({ createdAt: -1 })
        .explain("executionStats");

// Key things to look for in the output:
// stage: "IXSCAN"    ← good (using index)
// stage: "COLLSCAN"  ← bad (full collection scan)
// nReturned: 5       ← returned 5 docs
// docsExamined: 5000 ← examined 5000 docs = bad selectivity (1000:1 ratio)
// executionTimeMillis: 250 ← 250ms is too slow — need better index

// ── Mongoose index definitions ────────────────────────────────────────────
const taskSchema = new mongoose.Schema({
    title:       { type: String, required: true },
    description: String,   // referenced by the text index below
    status:      { type: String, enum: ['pending', 'in-progress', 'completed'] },
    priority:    { type: String, enum: ['low', 'medium', 'high'] },
    user:        { type: mongoose.Schema.Types.ObjectId, ref: 'User', required: true },
    dueDate:     Date,
    tags:        [String],
    deletedAt:   Date,
    createdAt:   Date,
    updatedAt:   Date,
});

// ── Core query patterns → correct indexes ────────────────────────────────

// Pattern: get user's tasks, sorted newest first
// db.tasks.find({ user: id, deletedAt: { $exists: false } }).sort({ createdAt: -1 })
taskSchema.index({ user: 1, createdAt: -1 });

// Pattern: filter by status + sort
// db.tasks.find({ user: id, status: 'pending' }).sort({ createdAt: -1 })
taskSchema.index({ user: 1, status: 1, createdAt: -1 });  // ESR order

// Pattern: full-text search
// db.tasks.find({ $text: { $search: "client meeting" } })
taskSchema.index({ title: 'text', description: 'text', tags: 'text' }, {
    weights: { title: 10, tags: 5, description: 1 },  // title matches rank higher
    name:    'task_text_search',
});

// Pattern: overdue tasks (due before now, not completed)
// Partial index — only indexes non-completed tasks with a dueDate.
// Note: partialFilterExpression does not support $ne or $nin; use equality,
// $exists: true, range operators, or $in (supported in MongoDB 6.0+).
taskSchema.index(
    { dueDate: 1, user: 1 },
    { partialFilterExpression: {
        status:  { $in: ['pending', 'in-progress'] },  // requires MongoDB 6.0+
        dueDate: { $exists: true },
    }}
);

// TTL index — auto-delete deleted tasks after 30 days
taskSchema.index({ deletedAt: 1 }, { expireAfterSeconds: 30 * 24 * 60 * 60 });

// ── Aggregation pipeline optimisation ────────────────────────────────────

// ❌ Slow — sort before match processes all documents
db.tasks.aggregate([
    { $sort:  { createdAt: -1 } },    // processes ALL tasks
    { $match: { user: userId } },     // then filters — too late!
    { $limit: 10 },
]);

// ✅ Fast — match first reduces dataset, then sort
db.tasks.aggregate([
    { $match: { user: userId, deletedAt: { $exists: false } } },  // index hit
    { $sort:  { createdAt: -1 } },    // sorts reduced dataset
    { $limit: 10 },
]);

// ✅ Dashboard stats — single pipeline with $facet
const stats = await Task.aggregate([
    { $match: { user: userId, deletedAt: { $exists: false } } },
    { $facet: {
        byStatus: [
            { $group: { _id: '$status', count: { $sum: 1 } } },
        ],
        byPriority: [
            { $group: { _id: '$priority', count: { $sum: 1 } } },
        ],
        overdue: [
            { $match: { dueDate: { $lt: new Date() }, status: { $ne: 'completed' } } },
            { $count: 'total' },
        ],
        recentActivity: [
            { $sort: { updatedAt: -1 } },
            { $limit: 5 },
            { $project: { title: 1, status: 1, updatedAt: 1 } },
        ],
    }},
]);

// ── Connection pooling — Mongoose connection options ──────────────────────
mongoose.connect(uri, {
    maxPoolSize:      10,   // driver default is 100 in Mongoose 6+ (5 in Mongoose 5); tune to your workload
    minPoolSize:       2,   // keep 2 connections warm
    serverSelectionTimeoutMS: 5000,
    socketTimeoutMS:  45000,
    family:            4,   // force IPv4 (avoids DNS lookup delays on some hosts)
});

// ── Projection — only fetch needed fields ────────────────────────────────
// ❌ Fetches all fields including large description field
const tasks = await Task.find({ user: userId }).sort('-createdAt').limit(10);

// ✅ Only fetch list-view fields
const tasks = await Task.find({ user: userId })
    .sort('-createdAt')
    .limit(10)
    .select('title status priority dueDate tags createdAt')  // exclude description
    .lean();   // plain JS object, not Mongoose document — 2-3x faster

// ── Populate vs $lookup — when to use each ────────────────────────────────
// populate — simple, adds an extra query per document type
const task = await Task.findById(id).populate('user', 'name email');
// Result: user is a full User object embedded in task

// $lookup in aggregation — single query, better for bulk operations
const tasks = await Task.aggregate([
    // aggregate() bypasses Mongoose casting, so convert the string id explicitly
    { $match: { _id: new mongoose.Types.ObjectId(id) } },
    { $lookup: {
        from:         'users',
        localField:   'user',
        foreignField: '_id',
        as:           'userInfo',
        pipeline: [{ $project: { name: 1, email: 1 } }],  // combining pipeline with localField needs MongoDB 5.0+
    }},
    { $unwind: '$userInfo' },
]);

How It Works

Step 1 — explain() Reveals the Query Execution Plan

MongoDB’s query planner evaluates available indexes and selects the one it estimates will scan the fewest index keys. explain("executionStats") shows the winning plan and its actual execution statistics. The key ratio is docsExamined / nReturned — a ratio of 1:1 means perfect selectivity (every examined document was returned). A ratio of 1000:1 means the query is examining 1000 documents to return 1 — a clear signal that a better index is needed.
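Reading those numbers can be automated. Here is a minimal sketch (auditExplain is our own helper name, not a MongoDB API) that pulls the red flags from the table above out of an explain('executionStats') document:

```javascript
// Summarise the executionStats section of an explain() result.
function auditExplain(explain) {
    const stats = explain.executionStats;

    // A COLLSCAN can be nested under FETCH/SORT stages, so recurse.
    // (Plans with inputStages arrays, e.g. $or merges, omitted for brevity.)
    const hasCollScan = (stage) =>
        stage.stage === 'COLLSCAN' ||
        (stage.inputStage ? hasCollScan(stage.inputStage) : false);

    return {
        collScan: hasCollScan(explain.queryPlanner.winningPlan),
        // 1 is perfect selectivity; 1000 means 1000 docs examined per doc returned.
        docsPerReturned: stats.nReturned > 0
            ? stats.totalDocsExamined / stats.nReturned
            : stats.totalDocsExamined,
        slow: stats.executionTimeMillis > 100,  // threshold from the table above
    };
}

// Example with a mocked explain document:
console.log(auditExplain({
    queryPlanner:   { winningPlan: { stage: 'SORT', inputStage: { stage: 'COLLSCAN' } } },
    executionStats: { nReturned: 5, totalDocsExamined: 5000, executionTimeMillis: 250 },
}));
// { collScan: true, docsPerReturned: 1000, slow: true }
```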

Step 2 — Compound Index Field Order Determines Effectiveness

MongoDB uses a compound index in prefix order. Index { user: 1, status: 1, createdAt: -1 } can satisfy queries on { user }, { user, status }, or { user, status, createdAt } — but not on { status } alone. Applying the ESR rule (Equality → Sort → Range) orders fields to maximise the index’s applicability across query patterns. The equality fields narrow the result set earliest, the sort field avoids an in-memory sort, and the range field filters the narrowed set.
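The prefix rule can be illustrated with a deliberately simplified check (isIndexPrefix is our own illustration, not the planner's real logic; it ignores sort direction and the planner's ability to use partial prefixes with range predicates):

```javascript
// True if the query's equality fields exactly cover a leading prefix of the
// compound index keys. Field order within the query itself does not matter.
function isIndexPrefix(indexKeys, queryFields) {
    const prefix = indexKeys.slice(0, queryFields.length);
    return prefix.every((k) => queryFields.includes(k)) &&
           queryFields.every((f) => prefix.includes(f));
}

const idx = ['user', 'status', 'createdAt'];
console.log(isIndexPrefix(idx, ['user']));            // true
console.log(isIndexPrefix(idx, ['status', 'user']));  // true  (query order irrelevant)
console.log(isIndexPrefix(idx, ['status']));          // false (not a leading prefix)
```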

Step 3 — $match Before $sort in Aggregation Pipelines Is Critical

An aggregation pipeline processes documents stage by stage. If $sort comes before $match, MongoDB must sort the entire collection before filtering. With 100,000 tasks and a query for one user’s 50 tasks, this sorts 100,000 documents unnecessarily. $match first reduces the dataset to 50 documents, then $sort only sorts those 50. MongoDB’s query optimiser sometimes reorders stages automatically — but explicit $match first is more reliable and documents intent.

Step 4 — .lean() Eliminates Mongoose Document Overhead

Mongoose normally wraps query results in full Mongoose Document objects with getters, setters, validation methods, and the ability to call .save(). This has overhead — creating Mongoose documents is 2–3x slower than plain JavaScript objects. .lean() returns plain JS objects, which is appropriate for read-only operations like API responses. Use .lean() for list queries, dashboards, and any operation where you do not need to call document methods.

Step 5 — $facet Executes Multiple Aggregations in One Query

$facet runs multiple sub-pipelines on the same input documents simultaneously. A dashboard that needs task counts by status, counts by priority, overdue count, and recent activity can be computed in a single database round-trip rather than four separate queries. The sub-pipelines share the initial $match filter result, and their outputs are combined into a single response object.
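Because $facet returns its combined output as a single document in a one-element array, the caller typically reshapes stats[0] before sending it to the client. A sketch of that post-processing (shapeDashboard and the output field names are our own, matched to the dashboard pipeline above):

```javascript
// Flatten a $facet result like the dashboard pipeline's into a plain object.
function shapeDashboard([facets]) {
    const toCounts = (rows) =>
        Object.fromEntries(rows.map(({ _id, count }) => [_id, count]));
    return {
        byStatus:   toCounts(facets.byStatus),
        byPriority: toCounts(facets.byPriority),
        overdue:    facets.overdue[0]?.total ?? 0,  // $count emits no document when the count is zero
        recent:     facets.recentActivity,
    };
}

// Example with a mocked aggregation result:
const shaped = shapeDashboard([{
    byStatus:       [{ _id: 'pending', count: 4 }, { _id: 'completed', count: 9 }],
    byPriority:     [{ _id: 'high', count: 2 }],
    overdue:        [],  // no overdue tasks, so $count produced nothing
    recentActivity: [{ title: 'Ship v2', status: 'pending' }],
}]);
console.log(shaped.byStatus.pending, shaped.overdue);  // 4 0
```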

Common Mistakes

Mistake 1 — Fetching all fields when only a few are needed

❌ Wrong — fetches large description field for list view:

const tasks = await Task.find({ user: userId }).limit(20);
// Each task fetches: title, description (2KB), tags, metadata, ...

✅ Correct — project only needed fields:

const tasks = await Task.find({ user: userId })
    .select('title status priority dueDate createdAt')
    .lean();

Mistake 2 — No indexes on filter + sort fields

❌ Wrong — COLLSCAN on every request:

// No index on {user, createdAt} — full collection scan on every page load!
const tasks = await Task.find({ user: userId }).sort('-createdAt').limit(10);

✅ Correct — compound index matching the query pattern:

taskSchema.index({ user: 1, createdAt: -1 });
// Now: IXSCAN on user + createdAt — O(log n) not O(n)

Mistake 3 — N+1 query problem on related documents

❌ Wrong — one query per task to load its assignee:

const tasks = await Task.find({ user: userId });
for (const task of tasks) {
    task.assignee = await User.findById(task.assignee);  // 100 tasks = 100 extra queries!
}

✅ Correct — populate batches the lookup into a single $in query, or use $lookup:

const tasks = await Task.find({ user: userId }).populate('assignee', 'name');
// Mongoose issues just 2 queries: the task find + one User.find({ _id: { $in: [...] } })

// Or a single aggregation round-trip with $lookup:
Task.aggregate([
    { $match: { user: userId } },
    { $lookup: { from: 'users', localField: 'assignee', foreignField: '_id', as: 'assigneeInfo',
                 pipeline: [{ $project: { name: 1 } }] } },
]);

Quick Reference

| Task | Code |
|---|---|
| Explain query | Task.find(filter).explain('executionStats') |
| Compound index (ESR) | schema.index({ user: 1, status: 1, createdAt: -1 }) |
| Text search index | schema.index({ title: 'text', description: 'text' }) |
| Partial index | schema.index({ field: 1 }, { partialFilterExpression: { ... } }) |
| TTL index | schema.index({ deletedAt: 1 }, { expireAfterSeconds: 2592000 }) |
| Lean queries | Task.find(f).lean() — plain JS objects, 2-3x faster |
| Project fields | .select('title status priority dueDate') |
| Match before sort | [{ $match: filter }, { $sort: order }] — always |
| Multi-facet dashboard | { $facet: { a: [...], b: [...] } } |
| Slow query log | db.setProfilingLevel(1, { slowms: 100 }) |

🧠 Test Yourself

An aggregation pipeline has { $sort: { createdAt: -1 } } as the first stage, followed by { $match: { user: userId } }. The collection has 500,000 documents. What is the performance problem and fix?