MongoDB Query Optimisation — Indexes, explain(), and Aggregation Performance

MongoDB query performance is one of the highest-leverage areas for MEAN Stack optimisation. A single missing index can turn a 2ms query into a 2000ms full collection scan. A poorly structured aggregation pipeline can process millions of documents when it could process dozens. Understanding the MongoDB query execution model — how indexes are selected, what explain() tells you, and how to design aggregation pipelines for performance — directly translates to lower API latency and reduced MongoDB CPU usage.

MongoDB Performance Indicators

| explain() field | Good value | Bad value | Meaning |
|---|---|---|---|
| stage | IXSCAN | COLLSCAN | Index scan vs full collection scan |
| nReturned | docsExamined close to nReturned | docsExamined >> nReturned | Selectivity: how many docs were examined per doc returned |
| executionTimeMillis | < 50 ms | > 100 ms | Total query execution time |
| indexBounds | Tight bounds | [MinKey, MaxKey] | Whether the index is being used effectively |
| keysExamined | Close to nReturned | Much higher than nReturned | Index selectivity |

Index Types for MEAN Stack

| Index type | Use case | Example |
|---|---|---|
| Single field | Filter or sort by one field | { user: 1 } |
| Compound | Filter by multiple fields, or filter + sort | { user: 1, createdAt: -1 } |
| Text | Full-text search across string fields | { title: 'text', description: 'text' } |
| Partial | Index only documents matching a filter (smaller, faster) | { dueDate: 1 }, { partialFilterExpression: { status: 'pending' } } |
| TTL | Automatically delete documents after a time period | { expiresAt: 1 }, { expireAfterSeconds: 0 } |
| Sparse | Index only documents where the field exists | { dueDate: 1 }, { sparse: true } |
Note: The ESR rule for compound indexes: put Equality fields first, Sort fields second, Range fields last. A query like { user: userId, status: 'pending', createdAt: { $gte: date } } with sort { createdAt: -1 } benefits most from index { user: 1, status: 1, createdAt: -1 } — the equality fields (user, status) narrow the results dramatically before the sort/range field (createdAt) is applied. Wrong index field order can make a compound index nearly useless.
Tip: Enable MongoDB’s slow query log with db.setProfilingLevel(1, { slowms: 100 }). This logs all queries taking longer than 100ms to the system.profile collection. Query it with db.system.profile.find().sort({ ts: -1 }).limit(10) to see the most recent slow operations. In production, use MongoDB Atlas Performance Advisor or the Atlas Query Profiler — they automatically surface slow queries and suggest indexes.
Warning: Every index adds write overhead and storage cost. An index must provide enough read performance benefit to justify its write overhead. Generally, collections with a high read-to-write ratio benefit most from indexes. For write-heavy collections (like logs or events), too many indexes slow down inserts significantly. Audit indexes periodically: list them with db.collection.getIndexes(), and use the $indexStats aggregation stage to find indexes that are never used so they can be dropped.
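The index audit can be scripted. A minimal sketch, assuming you have the documents produced by db.tasks.aggregate([{ $indexStats: {} }]) in hand (the unusedIndexes helper name is ours, not a MongoDB API):

```javascript
// Given the output of db.collection.aggregate([{ $indexStats: {} }]),
// return the names of indexes that have never served an operation.
// The default _id index is excluded because it cannot be dropped.
function unusedIndexes(indexStats) {
    return indexStats
        .filter((ix) => ix.name !== '_id_' && ix.accesses.ops === 0)
        .map((ix) => ix.name);
}

// Example with mocked $indexStats documents:
const stats = [
    { name: '_id_',                accesses: { ops: 91230 } },
    { name: 'user_1_createdAt_-1', accesses: { ops: 4512 } },
    { name: 'tags_1',              accesses: { ops: 0 } },
];
console.log(unusedIndexes(stats));  // ['tags_1']
```

Keep in mind that the accesses.ops counters reset on server restart and on index rebuild, so check them over a representative uptime window before dropping anything.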

Complete MongoDB Performance Optimisation

// ── explain() — analyse query performance ─────────────────────────────────
// In MongoDB Shell or Compass:
db.tasks.find({ user: ObjectId("..."), status: "pending" })
        .sort({ createdAt: -1 })
        .explain("executionStats");

// Key things to look for in the output:
// stage: "IXSCAN"    ← good (using index)
// stage: "COLLSCAN"  ← bad (full collection scan)
// nReturned: 5       ← returned 5 docs
// docsExamined: 5000 ← examined 5000 docs = bad selectivity (1000:1 ratio)
// executionTimeMillis: 250 ← 250ms is too slow — need better index

// ── Mongoose index definitions ────────────────────────────────────────────
const taskSchema = new mongoose.Schema({
    title:       { type: String, required: true },
    description: String,   // referenced by the text index below
    status:      { type: String, enum: ['pending', 'in-progress', 'completed'] },
    priority:    { type: String, enum: ['low', 'medium', 'high'] },
    user:        { type: mongoose.Schema.Types.ObjectId, ref: 'User', required: true },
    dueDate:     Date,
    tags:        [String],
    deletedAt:   Date,
    createdAt:   Date,
    updatedAt:   Date,
});

// ── Core query patterns → correct indexes ────────────────────────────────

// Pattern: get user's tasks, sorted newest first
// db.tasks.find({ user: id, deletedAt: { $exists: false } }).sort({ createdAt: -1 })
taskSchema.index({ user: 1, createdAt: -1 });

// Pattern: filter by status + sort
// db.tasks.find({ user: id, status: 'pending' }).sort({ createdAt: -1 })
taskSchema.index({ user: 1, status: 1, createdAt: -1 });  // ESR order

// Pattern: full-text search
// db.tasks.find({ $text: { $search: "client meeting" } })
taskSchema.index({ title: 'text', description: 'text', tags: 'text' }, {
    weights: { title: 10, tags: 5, description: 1 },  // title matches rank higher
    name:    'task_text_search',
});

// Pattern: overdue tasks (due before now, not completed)
// Partial index — only indexes non-completed tasks with a dueDate.
// Note: partialFilterExpression does not support $ne or $nin; use equality,
// $exists: true, range operators, or $in (supported in MongoDB 6.0+).
taskSchema.index(
    { dueDate: 1, user: 1 },
    { partialFilterExpression: {
        status:  { $in: ['pending', 'in-progress'] },  // requires MongoDB 6.0+
        dueDate: { $exists: true },
    }}
);

// TTL index — auto-delete deleted tasks after 30 days
taskSchema.index({ deletedAt: 1 }, { expireAfterSeconds: 30 * 24 * 60 * 60 });

// ── Aggregation pipeline optimisation ────────────────────────────────────

// ❌ Slow — sort before match processes all documents
db.tasks.aggregate([
    { $sort:  { createdAt: -1 } },    // processes ALL tasks
    { $match: { user: userId } },     // then filters — too late!
    { $limit: 10 },
]);

// ✅ Fast — match first reduces dataset, then sort
db.tasks.aggregate([
    { $match: { user: userId, deletedAt: { $exists: false } } },  // index hit
    { $sort:  { createdAt: -1 } },    // sorts reduced dataset
    { $limit: 10 },
]);

// ✅ Dashboard stats — single pipeline with $facet
const stats = await Task.aggregate([
    { $match: { user: userId, deletedAt: { $exists: false } } },
    { $facet: {
        byStatus: [
            { $group: { _id: '$status', count: { $sum: 1 } } },
        ],
        byPriority: [
            { $group: { _id: '$priority', count: { $sum: 1 } } },
        ],
        overdue: [
            { $match: { dueDate: { $lt: new Date() }, status: { $ne: 'completed' } } },
            { $count: 'total' },
        ],
        recentActivity: [
            { $sort: { updatedAt: -1 } },
            { $limit: 5 },
            { $project: { title: 1, status: 1, updatedAt: 1 } },
        ],
    }},
]);

// ── Connection pooling — Mongoose connection options ──────────────────────
mongoose.connect(uri, {
    maxPoolSize:      10,   // driver default is 100 in Mongoose 6+ (5 in Mongoose 5); tune to your workload
    minPoolSize:       2,   // keep 2 connections warm
    serverSelectionTimeoutMS: 5000,
    socketTimeoutMS:  45000,
    family:            4,   // force IPv4 (avoids DNS lookup delays on some hosts)
});

// ── Projection — only fetch needed fields ────────────────────────────────
// ❌ Fetches all fields including large description field
const tasks = await Task.find({ user: userId }).sort('-createdAt').limit(10);

// ✅ Only fetch list-view fields
const tasks = await Task.find({ user: userId })
    .sort('-createdAt')
    .limit(10)
    .select('title status priority dueDate tags createdAt')  // exclude description
    .lean();   // plain JS object, not Mongoose document — 2-3x faster

// ── Populate vs $lookup — when to use each ────────────────────────────────
// populate — simple, adds an extra query per document type
const task = await Task.findById(id).populate('user', 'name email');
// Result: user is a full User object embedded in task

// $lookup in aggregation — single query, better for bulk operations
const tasks = await Task.aggregate([
    // aggregate() bypasses Mongoose casting, so convert the string id explicitly
    { $match: { _id: new mongoose.Types.ObjectId(id) } },
    { $lookup: {
        from:         'users',
        localField:   'user',
        foreignField: '_id',
        as:           'userInfo',
        pipeline: [{ $project: { name: 1, email: 1 } }],  // combining pipeline with localField needs MongoDB 5.0+
    }},
    { $unwind: '$userInfo' },
]);

How It Works

Step 1 — explain() Reveals the Query Execution Plan

MongoDB’s query planner evaluates available indexes and selects the one it estimates will scan the fewest index keys. explain("executionStats") shows the winning plan and its actual execution statistics. The key ratio is docsExamined / nReturned — a ratio of 1:1 means perfect selectivity (every examined document was returned). A ratio of 1000:1 means the query is examining 1000 documents to return 1 — a clear signal that a better index is needed.
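Reading those numbers can be automated. Here is a minimal sketch (auditExplain is our own helper name, not a MongoDB API) that pulls the red flags from the table above out of an explain('executionStats') document:

```javascript
// Summarise the executionStats section of an explain() result.
function auditExplain(explain) {
    const stats = explain.executionStats;

    // A COLLSCAN can be nested under FETCH/SORT stages, so recurse.
    // (Plans with inputStages arrays, e.g. $or merges, omitted for brevity.)
    const hasCollScan = (stage) =>
        stage.stage === 'COLLSCAN' ||
        (stage.inputStage ? hasCollScan(stage.inputStage) : false);

    return {
        collScan: hasCollScan(explain.queryPlanner.winningPlan),
        // 1 is perfect selectivity; 1000 means 1000 docs examined per doc returned.
        docsPerReturned: stats.nReturned > 0
            ? stats.totalDocsExamined / stats.nReturned
            : stats.totalDocsExamined,
        slow: stats.executionTimeMillis > 100,  // threshold from the table above
    };
}

// Example with a mocked explain document:
console.log(auditExplain({
    queryPlanner:   { winningPlan: { stage: 'SORT', inputStage: { stage: 'COLLSCAN' } } },
    executionStats: { nReturned: 5, totalDocsExamined: 5000, executionTimeMillis: 250 },
}));
// { collScan: true, docsPerReturned: 1000, slow: true }
```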

Step 2 — Compound Index Field Order Determines Effectiveness

MongoDB uses a compound index in prefix order. Index { user: 1, status: 1, createdAt: -1 } can satisfy queries on { user }, { user, status }, or { user, status, createdAt } — but not on { status } alone. Applying the ESR rule (Equality → Sort → Range) orders fields to maximise the index’s applicability across query patterns. The equality fields narrow the result set earliest, the sort field avoids an in-memory sort, and the range field filters the narrowed set.
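The prefix rule can be illustrated with a deliberately simplified check (isIndexPrefix is our own illustration, not the planner's real logic; it ignores sort direction and the planner's ability to use partial prefixes with range predicates):

```javascript
// True if the query's equality fields exactly cover a leading prefix of the
// compound index keys. Field order within the query itself does not matter.
function isIndexPrefix(indexKeys, queryFields) {
    const prefix = indexKeys.slice(0, queryFields.length);
    return prefix.every((k) => queryFields.includes(k)) &&
           queryFields.every((f) => prefix.includes(f));
}

const idx = ['user', 'status', 'createdAt'];
console.log(isIndexPrefix(idx, ['user']));            // true
console.log(isIndexPrefix(idx, ['status', 'user']));  // true  (query order irrelevant)
console.log(isIndexPrefix(idx, ['status']));          // false (not a leading prefix)
```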

Step 3 — $match Before $sort in Aggregation Pipelines Is Critical

An aggregation pipeline processes documents stage by stage. If $sort comes before $match, MongoDB must sort the entire collection before filtering. With 100,000 tasks and a query for one user’s 50 tasks, this sorts 100,000 documents unnecessarily. $match first reduces the dataset to 50 documents, then $sort only sorts those 50. MongoDB’s query optimiser sometimes reorders stages automatically — but explicit $match first is more reliable and documents intent.

Step 4 — .lean() Eliminates Mongoose Document Overhead

Mongoose normally wraps query results in full Mongoose Document objects with getters, setters, validation methods, and the ability to call .save(). This has overhead — creating Mongoose documents is 2–3x slower than plain JavaScript objects. .lean() returns plain JS objects, which is appropriate for read-only operations like API responses. Use .lean() for list queries, dashboards, and any operation where you do not need to call document methods.

Step 5 — $facet Executes Multiple Aggregations in One Query

$facet runs multiple sub-pipelines on the same input documents simultaneously. A dashboard that needs task counts by status, counts by priority, overdue count, and recent activity can be computed in a single database round-trip rather than four separate queries. The sub-pipelines share the initial $match filter result, and their outputs are combined into a single response object.
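Because $facet returns its combined output as a single document in a one-element array, the caller typically reshapes stats[0] before sending it to the client. A sketch of that post-processing (shapeDashboard and the output field names are our own, matched to the dashboard pipeline above):

```javascript
// Flatten a $facet result like the dashboard pipeline's into a plain object.
function shapeDashboard([facets]) {
    const toCounts = (rows) =>
        Object.fromEntries(rows.map(({ _id, count }) => [_id, count]));
    return {
        byStatus:   toCounts(facets.byStatus),
        byPriority: toCounts(facets.byPriority),
        overdue:    facets.overdue[0]?.total ?? 0,  // $count emits no document when the count is zero
        recent:     facets.recentActivity,
    };
}

// Example with a mocked aggregation result:
const shaped = shapeDashboard([{
    byStatus:       [{ _id: 'pending', count: 4 }, { _id: 'completed', count: 9 }],
    byPriority:     [{ _id: 'high', count: 2 }],
    overdue:        [],  // no overdue tasks, so $count produced nothing
    recentActivity: [{ title: 'Ship v2', status: 'pending' }],
}]);
console.log(shaped.byStatus.pending, shaped.overdue);  // 4 0
```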

Common Mistakes

Mistake 1 — Fetching all fields when only a few are needed

❌ Wrong — fetches large description field for list view:

const tasks = await Task.find({ user: userId }).limit(20);
// Each task fetches: title, description (2KB), tags, metadata, ...

✅ Correct — project only needed fields:

const tasks = await Task.find({ user: userId })
    .select('title status priority dueDate createdAt')
    .lean();

Mistake 2 — No indexes on filter + sort fields

❌ Wrong — COLLSCAN on every request:

// No index on {user, createdAt} — full collection scan on every page load!
const tasks = await Task.find({ user: userId }).sort('-createdAt').limit(10);

✅ Correct — compound index matching the query pattern:

taskSchema.index({ user: 1, createdAt: -1 });
// Now: IXSCAN on user + createdAt — O(log n) not O(n)

Mistake 3 — N+1 query problem on related documents

❌ Wrong — one query per task to load its assignee:

const tasks = await Task.find({ user: userId });
for (const task of tasks) {
    task.assignee = await User.findById(task.assignee);  // 100 tasks = 100 extra queries!
}

✅ Correct — populate batches the lookup into a single $in query, or use $lookup:

const tasks = await Task.find({ user: userId }).populate('assignee', 'name');
// Mongoose issues just 2 queries: the task find + one User.find({ _id: { $in: [...] } })

// Or a single aggregation round-trip with $lookup:
Task.aggregate([
    { $match: { user: userId } },
    { $lookup: { from: 'users', localField: 'assignee', foreignField: '_id', as: 'assigneeInfo',
                 pipeline: [{ $project: { name: 1 } }] } },
]);

Quick Reference

| Task | Code |
|---|---|
| Explain query | Task.find(filter).explain('executionStats') |
| Compound index (ESR) | schema.index({ user: 1, status: 1, createdAt: -1 }) |
| Text search index | schema.index({ title: 'text', description: 'text' }) |
| Partial index | schema.index({ field: 1 }, { partialFilterExpression: { ... } }) |
| TTL index | schema.index({ deletedAt: 1 }, { expireAfterSeconds: 2592000 }) |
| Lean queries | Task.find(f).lean() — plain JS objects, 2-3x faster |
| Project fields | .select('title status priority dueDate') |
| Match before sort | [{ $match: filter }, { $sort: order }] — always |
| Multi-facet dashboard | { $facet: { a: [...], b: [...] } } |
| Slow query log | db.setProfilingLevel(1, { slowms: 100 }) |

🧠 Test Yourself

An aggregation pipeline has { $sort: { createdAt: -1 } } as the first stage, followed by { $match: { user: userId } }. The collection has 500,000 documents. What is the performance problem and fix?