🍃 Advanced MongoDB Interview Questions
This lesson targets mid-to-senior roles. Topics include advanced aggregation stages, index strategies, ACID transactions, change streams, text search, geospatial queries, the explain plan, connection pooling, and schema design patterns. These questions separate developers who query MongoDB from those who architect with it.
Questions & Answers
01 What are compound indexes and how do you design them correctly? ►
Indexes A compound index covers multiple fields in a single index structure. Field order matters significantly: it determines which queries the index can serve.
// Create a compound index
db.orders.createIndex({ customerId: 1, status: 1, createdAt: -1 });
// This index SUPPORTS queries that use:
db.orders.find({ customerId: "c1" }); // ✓ prefix
db.orders.find({ customerId: "c1", status: "pending" }); // ✓ prefix
db.orders.find({ customerId: "c1", status: "pending" }).sort({ createdAt: -1 }); // ✓ full index
db.orders.find({ status: "pending" }); // ✗ no leading prefix - index not used
db.orders.find({ createdAt: { $gt: yesterday } }); // ✗ no leading prefix
The ESR rule - recommended field order for compound indexes:
- Equality fields first (fields tested with exact match)
- Sort fields next (fields used in sort())
- Range fields last (fields used with $gt, $lt, $in)
A compound index also serves as an index on its leading prefixes: { a:1, b:1, c:1 } covers queries on a, a+b, and a+b+c.
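The ESR ordering can be illustrated with a small helper. This is a hypothetical sketch, not a driver API: given the role each field plays in a query, it emits the fields in Equality, Sort, Range order.

```javascript
// Hypothetical helper: order compound-index fields by the ESR rule.
// "equality" fields first, then "sort" fields, then "range" fields.
function orderFieldsESR(fields) {
  const rank = { equality: 0, sort: 1, range: 2 };
  return [...fields]
    .sort((a, b) => rank[a.role] - rank[b.role]) // stable sort in modern JS
    .map((f) => f.name);
}

// Query shape: { status: "pending", createdAt: { $gt: d } }, sorted by orderedAt
const esrKeys = orderFieldsESR([
  { name: "createdAt", role: "range" },
  { name: "orderedAt", role: "sort" },
  { name: "status", role: "equality" },
]);
// esrKeys → ["status", "orderedAt", "createdAt"]
```

The resulting order is what you would pass to createIndex: equality field, then the sort field, then the range field.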
02 What is a covered query in MongoDB? ►
Indexes A covered query is one where all fields in the query filter, sort, and projection are contained within a single index. MongoDB can satisfy it entirely from the index without reading any documents - the fastest possible query execution.
// Index
db.users.createIndex({ email: 1, name: 1, age: 1 });
// Covered query: filter and projection both use only indexed fields
db.users.find(
{ email: "alice@example.com" },
{ name: 1, age: 1, _id: 0 } // _id must be excluded - it's not in the index
).explain("executionStats");
// Look for: "stage": "PROJECTION_COVERED" and totalDocsExamined: 0
Why it matters: Without a covered query, MongoDB reads the index, finds matching _id values, then fetches each document from disk. A covered query skips the document fetch entirely, dramatically reducing I/O for read-heavy workloads.
To verify coverage, run .explain("executionStats") and check that totalDocsExamined is 0 and stage is PROJECTION_COVERED or IXSCAN.
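That verification can be done programmatically. The sketch below runs against a simplified mock of an explain result (real output nests more stages and varies by server version, so treat the structure here as illustrative):

```javascript
// Walk the winning plan of an explain("executionStats") result and decide
// whether the query was covered: no documents fetched, projection served
// entirely by the index.
function isCovered(explain) {
  const stages = [];
  // Follow the inputStage chain from the top of the winning plan
  for (let s = explain.queryPlanner.winningPlan; s; s = s.inputStage) {
    stages.push(s.stage);
  }
  return (
    explain.executionStats.totalDocsExamined === 0 &&
    stages.includes("PROJECTION_COVERED")
  );
}

// Simplified mock of a covered-query explain document
const coveredMock = {
  queryPlanner: {
    winningPlan: { stage: "PROJECTION_COVERED", inputStage: { stage: "IXSCAN" } },
  },
  executionStats: { totalDocsExamined: 0, nReturned: 1 },
};
// isCovered(coveredMock) → true
```

A CI check like this can catch regressions where a projection change silently un-covers a hot query.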
03 What are ACID transactions in MongoDB? How do you use them? ►
Transactions MongoDB 4.0+ supports multi-document ACID transactions across replica sets. MongoDB 4.2+ added support across sharded clusters. ACID means Atomic, Consistent, Isolated, Durable.
// Using transactions with the Node.js driver
const session = client.startSession();
session.startTransaction({
readConcern: { level: "snapshot" },
writeConcern: { w: "majority" }
});
try {
// Debit account A
await db.collection("accounts").updateOne(
{ _id: "accountA" },
{ $inc: { balance: -100 } },
{ session }
);
// Credit account B
await db.collection("accounts").updateOne(
{ _id: "accountB" },
{ $inc: { balance: 100 } },
{ session }
);
await session.commitTransaction();
} catch (err) {
await session.abortTransaction(); // both changes are rolled back
throw err;
} finally {
await session.endSession();
}
Important: Transactions in MongoDB have a 60-second default timeout and come with a performance overhead. For most use cases, single-document atomicity (built-in) is sufficient. Reserve multi-document transactions for operations that genuinely require atomic changes across multiple documents or collections (e.g., financial transfers, inventory deductions).
04 What are Change Streams in MongoDB? ►
Realtime Change Streams allow applications to subscribe to real-time notifications of data changes in a collection, database, or entire deployment. They are built on MongoDB’s oplog and require a replica set or sharded cluster.
// Watch a collection for any changes
const changeStream = db.collection("orders").watch();
changeStream.on("change", (change) => {
console.log("Change detected:", change.operationType, change.fullDocument);
});
// Watch for specific operations only
const pipeline = [{ $match: { operationType: { $in: ["insert", "update"] } } }];
const stream = db.collection("orders").watch(pipeline, {
fullDocument: "updateLookup" // include updated document on updates
});
// Resume after disconnect using a resume token
const resumedStream = db.collection("orders").watch([], {
resumeAfter: lastResumeToken // no events missed on reconnect
});
Use cases: real-time dashboards, cache invalidation, event-driven microservices, audit logs, notifications (send an email when an order is placed), Elasticsearch synchronisation.
Change Streams are ordered and resumable - store each event's resume token and pass it as resumeAfter on reconnect so no events are missed (provided the token's entry is still in the oplog).
05 What are advanced aggregation stages: $lookup, $unwind, $facet? ►
Aggregation
$lookup - left outer join between collections:
db.orders.aggregate([
{ $lookup: {
from: "products", // join with products collection
localField: "productId", // field in orders
foreignField: "_id", // field in products
as: "productDetails" // output array field
}},
{ $unwind: "$productDetails" } // flatten the array to single object
]);
$unwind - deconstructs an array field, producing one document per array element:
// { tags: ["tech", "news", "sports"] } becomes 3 documents
db.posts.aggregate([{ $unwind: "$tags" }]);
// Use preserveNullAndEmptyArrays: true to keep docs with missing/empty arrays
$facet - runs multiple aggregation pipelines in parallel on the same input documents (for faceted search / multi-dimensional analytics):
db.products.aggregate([
{ $facet: {
priceRanges: [{ $bucket: { groupBy: "$price", boundaries: [0,10,50,100], default: "100+" }}],
byCategory: [{ $group: { _id: "$category", count: { $sum: 1 } }}],
totalCount: [{ $count: "count" }]
}}
]);
06 What is the explain() method and how do you use it to optimise queries? ►
Performance explain() shows the query execution plan - how MongoDB found the matching documents, which indexes were used, and how many documents were examined.
db.users.find({ email: "alice@example.com" }).explain("executionStats");
Key fields to check in the output:
- winningPlan.stage - IXSCAN (index used ✓) vs COLLSCAN (full collection scan ✗)
- totalDocsExamined - should be close to nReturned; a high ratio means the index is not selective
- totalKeysExamined - number of index keys scanned
- executionTimeMillis - how long the query took
- indexName - which index was chosen
// Three verbosity levels
.explain() // "queryPlanner" - shows the plan without executing
.explain("executionStats") // runs the query and shows stats (most useful)
.explain("allPlansExecution") // shows stats for all candidate plans
A COLLSCAN on a large collection is a red flag. Create an index on the query filter field and re-run explain() to verify the plan changed to IXSCAN.
07 What is Full-Text Search in MongoDB? How do you create a text index? ►
Search MongoDB supports full-text search on string fields using a text index. It tokenises string content, removes stop words, and applies stemming for natural language searching.
// Create a text index on one or more string fields
db.articles.createIndex({ title: "text", body: "text", tags: "text" });
// Or on all string fields in the document
db.articles.createIndex({ "$**": "text" });
// Text search using $text
db.articles.find({ $text: { $search: "mongodb aggregation" } });
// Phrase search (exact)
db.articles.find({ $text: { $search: "\"aggregation pipeline\"" } });
// Exclude a word
db.articles.find({ $text: { $search: "mongodb -sql" } });
// Sort by text relevance score
db.articles.find(
{ $text: { $search: "mongodb performance" } },
{ score: { $meta: "textScore" } } // project the score
).sort({ score: { $meta: "textScore" } }); // sort by relevance
Limitation: A collection can have only one text index. For production full-text search, MongoDB Atlas Search (Lucene-powered) is more powerful - it supports fuzzy matching, autocomplete, facets, highlighting, and custom scoring.
08 What is Geospatial querying in MongoDB? ►
Geospatial MongoDB supports geospatial queries for finding documents based on geographic location, using GeoJSON format and 2dsphere indexes.
// Store a location as GeoJSON Point
db.restaurants.insertOne({
name: "The Steakhouse",
location: { type: "Point", coordinates: [-0.1276, 51.5074] } // [lng, lat]
});
// Create a 2dsphere index (required for geospatial queries)
db.restaurants.createIndex({ location: "2dsphere" });
// Find restaurants within 1km of a point
db.restaurants.find({
location: {
$near: {
$geometry: { type: "Point", coordinates: [-0.1276, 51.5074] },
$maxDistance: 1000 // metres
}
}
});
// Find restaurants within a polygon (geofencing)
db.restaurants.find({
location: {
$geoWithin: {
$geometry: {
type: "Polygon",
coordinates: [[ [lng1,lat1],[lng2,lat2],[lng3,lat3],[lng1,lat1] ]]
}
}
}
});
09 What are sparse and partial indexes in MongoDB? ►
Indexes
Sparse index - only indexes documents that contain the indexed field (ignores documents where the field is absent). Useful when a field only exists in a subset of documents.
// Only index documents that have an "email" field
db.users.createIndex({ email: 1 }, { sparse: true });
// A null or missing email document is NOT in this index
Partial index - more powerful than sparse: only indexes documents that match a specified filter expression, producing a smaller, more efficient index.
// Index only active products with price > 0
db.products.createIndex(
{ price: 1, name: 1 },
{ partialFilterExpression: { status: "active", price: { $gt: 0 } } }
);
// Queries MUST include the partial filter fields to use this index
Sparse vs Partial: Sparse is a simple “field exists” filter. Partial lets you use any query expression. Partial indexes are preferred in modern MongoDB as they are more explicit and flexible. Both reduce index size and memory footprint when only a fraction of documents need to be indexed.
10 What are common MongoDB schema design patterns? ►
Schema Design MongoDB’s flexible document model enables several design patterns that solve common data modelling challenges:
- Bucket pattern - group time-series data into buckets (e.g., one document per hour of IoT readings) to avoid unbounded arrays and reduce document count. Used by MongoDB's Time Series collections.
- Outlier pattern - most documents share a standard structure, but rare outliers have extra data stored in a separate overflow collection (e.g., a book with millions of reviews vs typical books with a few hundred).
- Computed pattern - pre-compute and store expensive calculations (totals, averages) in the document on write to make reads fast. Avoids recalculation on every read.
- Subset pattern - embed only the most-accessed subset of a large array (e.g., the 10 latest reviews) and store the full list in a separate collection. Reduces working set size.
- Extended Reference pattern - embed a frequently-used subset of a referenced document (e.g., store { userId, userName, userAvatar } in an order) to avoid lookups for common reads.
- Attribute pattern - for documents with many optional fields (product characteristics), store them as an array of { key, value } pairs to enable indexing across all attributes.
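The Attribute pattern's shape is easy to sketch in plain JavaScript (field names here are illustrative, not from a real schema):

```javascript
// Attribute pattern: optional characteristics stored as { key, value } pairs,
// so one compound index on "attributes.key"/"attributes.value" serves them all.
const product = {
  _id: "p1",
  name: "Espresso Machine",
  attributes: [
    { key: "color", value: "silver" },
    { key: "voltage", value: 230 },
    { key: "capacityLitres", value: 1.5 },
  ],
};

// In MongoDB you would query with $elemMatch, e.g.:
//   db.products.find({ attributes: { $elemMatch: { key: "voltage", value: 230 } } })
// The equivalent in-memory lookup:
function getAttribute(doc, key) {
  const hit = doc.attributes.find((a) => a.key === key);
  return hit ? hit.value : undefined;
}
// getAttribute(product, "voltage") → 230
```

Compared with storing each characteristic as a top-level field, this shape avoids needing one index per optional field.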
11 What is connection pooling in MongoDB and how do you configure it? ►
Performance A connection pool is a cache of database connections maintained by the driver so that connections can be reused across requests. Creating a new TCP connection to MongoDB on every operation is expensive - pooling eliminates this overhead.
// Node.js โ configure the pool in the connection string or options
const client = new MongoClient(uri, {
maxPoolSize: 50, // max concurrent connections (default: 100)
minPoolSize: 10, // keep 10 connections always alive
maxIdleTimeMS: 30000, // close idle connections after 30s
waitQueueTimeoutMS: 5000 // throw error if no connection available in 5s
});
// IMPORTANT: Create the client ONCE and reuse it across your app.
// Do NOT create a new MongoClient per request - this defeats the pool.
await client.connect();
export const db = client.db("myDatabase"); // shared across modules
Sizing the pool: A pool that's too small creates a queue of waiting requests; one that's too large wastes memory and can overload the server. A common starting heuristic is maxPoolSize = (number of CPU cores × 2) + number of disks; tune from there. Monitor using Atlas metrics or db.serverStatus().connections.
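The heuristic above can be captured in a trivial helper (a sketch of the rule of thumb, not a MongoDB API; treat the result as a starting point to validate against real latency metrics):

```javascript
// Starting-point pool size per the (cores × 2) + disks rule of thumb.
function suggestedMaxPoolSize(cpuCores, diskCount) {
  return cpuCores * 2 + diskCount;
}

// An 8-core app server with 2 disks:
const size = suggestedMaxPoolSize(8, 2); // → 18
```

In Node.js you could feed this straight into the MongoClient options as maxPoolSize.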
12 What is the WiredTiger storage engine? ►
Internals WiredTiger is MongoDB’s default storage engine since version 3.2. It replaced the older MMAPv1 engine and provides significant performance and concurrency improvements.
Key features of WiredTiger:
- Document-level concurrency - uses optimistic concurrency control (not collection-level locks like MMAPv1), allowing many writers to operate simultaneously on different documents
- Compression - Snappy compression by default (configurable to zlib or zstd), typically 60-80% storage savings over uncompressed data
- Checkpoints - writes a consistent snapshot to disk every 60 seconds (or after 2GB of writes), ensuring data durability
- Journal - logs all operations between checkpoints; used to recover data after a crash
- Cache - configurable internal cache (default: 50% of RAM minus 1GB); keeps hot data in memory for fast access
// Configure the WiredTiger cache size in mongod.conf:
// storage:
//   wiredTiger:
//     engineConfig:
//       cacheSizeGB: 4
13 How do you handle many-to-many relationships in MongoDB? ►
Schema Design Many-to-many relationships (e.g., students & courses, products & tags) can be modelled several ways in MongoDB depending on access patterns.
Array of references (most common):
// students collection
{ _id: ObjectId("s1"), name: "Alice", enrolledCourseIds: [ObjectId("c1"), ObjectId("c2")] }
// courses collection
{ _id: ObjectId("c1"), title: "MongoDB Basics", studentIds: [ObjectId("s1"), ObjectId("s3")] }
// Query: all courses for a student
db.courses.find({ _id: { $in: student.enrolledCourseIds } });
Junction collection (for rich relationship data):
// enrollments collection - stores the relationship + extra data
{
studentId: ObjectId("s1"),
courseId: ObjectId("c1"),
enrolledAt: ISODate("2026-01-10"),
grade: "A",
completedAt: null
}
Use a junction collection when the relationship itself has attributes (grade, enrolment date). Use array references when the relationship is simple and access patterns are known. Embedding is only practical for one-to-few relationships, not many-to-many.
14 What is the aggregation $bucket and $bucketAuto stage? ►
Aggregation $bucket and $bucketAuto categorise documents into ranges (buckets) - similar to a histogram.
// $bucket - manually define bucket boundaries
db.products.aggregate([
{ $bucket: {
groupBy: "$price",
boundaries: [0, 10, 25, 50, 100, 500], // buckets: 0-10, 10-25, 25-50...
default: "500+", // documents outside boundaries
output: {
count: { $sum: 1 },
avgPrice: { $avg: "$price" },
products: { $push: "$name" }
}
}}
]);
// Output: [{ _id: 0, count: 12, avgPrice: 7.5 }, { _id: 10, count: 8 }, ...]
// $bucketAuto - let MongoDB choose N evenly distributed buckets
db.products.aggregate([
{ $bucketAuto: {
groupBy: "$price",
buckets: 5, // create 5 automatic buckets
granularity: "R5" // optional: use a standard numerical series
}}
]);
These stages are ideal for analytics dashboards showing price distributions, age ranges, or response time histograms.
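The bucketing semantics are easy to mirror in plain JavaScript, which helps when reasoning about boundaries (lower bound inclusive, upper bound exclusive, with a default bucket for everything outside the ranges). A minimal in-memory sketch, not a MongoDB API:

```javascript
// In-memory analogue of $bucket: assign each value to a [lower, upper)
// range named after its lower boundary; values outside all ranges go to
// the default bucket. Returns a { bucketId: count } map.
function bucketize(values, boundaries, defaultBucket) {
  const counts = {};
  for (const v of values) {
    let id = defaultBucket;
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (v >= boundaries[i] && v < boundaries[i + 1]) {
        id = boundaries[i]; // bucket is labelled by its lower bound, like $bucket
        break;
      }
    }
    counts[id] = (counts[id] || 0) + 1;
  }
  return counts;
}

const histogram = bucketize([3, 7, 12, 40, 999], [0, 10, 25, 50, 100], "100+");
// → { 0: 2, 10: 1, 25: 1, "100+": 1 }
```

Note that, as with $bucket, a value equal to the last boundary (100 here) falls into the default bucket, not the last range.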
15 What is MongoDB’s Read Concern? How does it relate to data consistency? ►
Consistency Read concern controls the consistency and isolation of data returned by a read operation in a replica set or sharded cluster.
- local (default) - returns the most recent data from the queried node. May include data that could be rolled back if the primary fails before replication completes.
- available - same as local for replica sets; on sharded clusters, returns data without checking whether it has been orphaned.
- majority - returns only data acknowledged by a majority of replica set members. Guarantees the data will not be rolled back. Slightly higher latency.
- snapshot - reads from a snapshot of the data at a consistent point in time. Used inside multi-document transactions for full isolation.
- linearizable - guarantees you read the most up-to-date data reflecting all successful prior writes. Only for single-document reads. Slowest.
db.orders.find({ status: "confirmed" }).readConcern("majority");
For most applications, the default local read concern is acceptable. Use majority for financial or critical data where reading stale rolled-back data is unacceptable.
16 What is the Mongoose ODM? How does it differ from the native MongoDB driver? ►
Tools Mongoose is an Object Document Mapper (ODM) for MongoDB and Node.js. It adds an abstraction layer on top of the native MongoDB driver, providing schema definition, validation, middleware (hooks), and model methods.
// Mongoose โ schema + model + validation built in
const userSchema = new Schema({
name: { type: String, required: true, minlength: 2 },
email: { type: String, required: true, unique: true, lowercase: true },
createdAt: { type: Date, default: Date.now }
});
// Middleware (hooks)
userSchema.pre("save", async function() {
if (this.isModified("password")) this.password = await bcrypt.hash(this.password, 10);
});
const User = mongoose.model("User", userSchema);
await User.create({ name: "Alice", email: "alice@example.com" });
Native driver vs Mongoose:
- Native driver โ maximum flexibility and performance, no overhead, direct MongoDB API access. Use for performance-critical services or when you don’t want schema enforcement.
- Mongoose โ schema validation, virtual fields, population (references), middleware, cleaner API for CRUD. Best for applications that benefit from structured data models and built-in validation. Slight performance overhead.
17 How do you perform bulk write operations efficiently in MongoDB? ►
Performance Sending individual write operations in a loop makes one network round-trip per operation. bulkWrite() batches multiple operations into a single network request - dramatically improving throughput.
await db.collection("products").bulkWrite([
{ insertOne: { document: { name: "Widget A", price: 9.99 } } },
{ updateOne: { filter: { _id: id1 }, update: { $inc: { stock: -5 } } } },
{ updateMany: { filter: { category: "sale" }, update: { $mul: { price: 0.9 } } } },
{ replaceOne: { filter: { _id: id2 }, replacement: { name: "New Widget", price: 12 } } },
{ deleteOne: { filter: { _id: id3 } } }
], {
ordered: false // continue processing even if one operation fails (faster)
// ordered: true = stop on first error (default, safer)
});
insertMany vs bulkWrite: insertMany is only for inserts. bulkWrite supports mixed operations. For pure inserts, insertMany is slightly more efficient. For mixed insert/update/delete workloads, bulkWrite is the right tool.
MongoDB processes bulk writes in batches of up to 100,000 operations internally.
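If you assemble very large operation arrays yourself, a simple chunking helper (illustrative, not a driver API) keeps each bulkWrite() call at a chosen batch size:

```javascript
// Split a long list of write operations into batches of at most `size`
// elements, so each batch can be sent as one bulkWrite() call.
function chunkOps(ops, size) {
  const batches = [];
  for (let i = 0; i < ops.length; i += size) {
    batches.push(ops.slice(i, i + size));
  }
  return batches;
}

// 250 operations in batches of 100 → batch lengths [100, 100, 50]
const opBatches = chunkOps(
  Array.from({ length: 250 }, (_, i) => ({ insertOne: { document: { n: i } } })),
  100
);
```

In practice you would then loop over the batches, awaiting `collection.bulkWrite(batch, { ordered: false })` for each.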
18 What is MongoDB’s $lookup with a pipeline (advanced join)? ►
Aggregation MongoDB 3.6+ added a more powerful form of $lookup that allows you to run a full aggregation pipeline on the joined collection - enabling complex conditions, correlated subqueries, and filtered joins that the basic $lookup doesn't support.
db.orders.aggregate([
{
$lookup: {
from: "orderItems",
let: { orderId: "$_id" }, // variables from the outer doc
pipeline: [
{ $match: { $expr: { $and: [
{ $eq: ["$orderId", "$$orderId"] },
{ $gt: ["$quantity", 0] } // filter in the joined collection
]}}},
{ $project: { productId: 1, quantity: 1, price: 1 } },
{ $sort: { price: -1 } },
{ $limit: 5 } // only top 5 items
],
as: "topItems"
}
}
]);
This is equivalent to a correlated subquery in SQL. Use it for: joining with conditions beyond equality, joining with sorting/limiting on the joined side, computing aggregates on the joined data before merging.
19 What are Time Series collections in MongoDB 5.0+? ►
Specialised Time Series collections are optimised specifically for storing and querying sequences of measurements over time (IoT sensor data, stock prices, server metrics). They provide automatic bucketing, compression, and query optimisation for time-ordered data.
// Create a time series collection
db.createCollection("sensorReadings", {
timeseries: {
timeField: "timestamp", // required: the date/time field
metaField: "sensorId", // optional: metadata for bucketing
granularity: "seconds" // "seconds" | "minutes" | "hours"
},
expireAfterSeconds: 2592000 // auto-delete data older than 30 days
});
// Insert readings normally
db.sensorReadings.insertMany([
{ timestamp: new Date(), sensorId: "sensor-001", temperature: 22.5, humidity: 60 },
{ timestamp: new Date(), sensorId: "sensor-002", temperature: 19.1, humidity: 72 }
]);
// Aggregate time-bucketed averages
db.sensorReadings.aggregate([
{ $match: { sensorId: "sensor-001" } },
{ $group: { _id: { $dateTrunc: { date: "$timestamp", unit: "hour" } },
avgTemp: { $avg: "$temperature" } }},
{ $sort: { _id: 1 } }
]);
Time Series collections use columnar compression internally - typically 60-80% smaller than equivalent regular collections.
20 What is MongoDB’s Aggregation $merge and $out stage? ►
Aggregation Both stages write aggregation results to a collection, enabling materialised views and ETL pipelines.
$out - replaces the entire target collection with the pipeline results. All existing documents in the target are removed.
// Create a daily sales summary report collection
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
totalRevenue: { $sum: "$amount" }, orderCount: { $sum: 1 } }},
{ $out: "dailySalesSummary" } // REPLACES the collection
]);
$merge (MongoDB 4.2+) - more flexible: merges results into an existing collection with configurable behaviour for matched and unmatched documents.
db.orders.aggregate([
// ... pipeline ...
{ $merge: {
into: "dailySalesSummary",
on: "_id", // match key
whenMatched: "merge", // merge fields into existing doc
whenNotMatched: "insert" // insert new docs
}}
]);
// Use $merge for incremental updates (update today's record only)
// Use $out for full replacement (regenerate the entire report)
21 How do you secure a MongoDB deployment? ►
Security
- Enable authentication - start mongod with --auth. Create users with the minimum necessary privileges (principle of least privilege). Never use the default unauthenticated setup in production.
- Network access control - bind MongoDB to specific IPs (bindIp). Use a firewall/VPC to prevent public internet access. Only trusted application servers should reach MongoDB's port (27017).
- TLS/SSL encryption - enable TLS for all client-server and intra-cluster communication to prevent eavesdropping.
- Role-Based Access Control (RBAC) - assign only the roles each account needs. A read-only reporting service gets the read role, not readWrite. Application service accounts should never have dbAdmin or clusterAdmin.
- Encryption at rest - use WiredTiger encryption (MongoDB Enterprise) or encrypted storage volumes. Atlas encrypts at rest by default.
- Audit logging - log authentication, authorisation, DDL, and CRUD operations for compliance and intrusion detection.
- Input validation - prevent NoSQL injection by sanitising user inputs. Never build query objects from raw user strings; always use parameterised structures.
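The input-validation point can be sketched concretely. The guard below (an illustrative pattern, not a library function) rejects filter values that arrive as objects or arrays, so user input from a request body can never smuggle in operators like $gt or $where:

```javascript
// Build an equality filter from user input, allowing only plain scalars.
// Objects and arrays are rejected because they could carry MongoDB
// operators ({ $gt: "" }, { $where: ... }) - the classic NoSQL injection.
function safeEqualityFilter(field, value) {
  const t = typeof value;
  if (value === null || t === "string" || t === "number" || t === "boolean") {
    return { [field]: value };
  }
  throw new Error("Refusing non-scalar filter value (possible NoSQL injection)");
}

const filter = safeEqualityFilter("username", "alice"); // { username: "alice" }
// safeEqualityFilter("username", { $gt: "" }) would throw instead of
// building a filter that matches every document.
```

Schema validators (Joi, zod) or the `mongo-sanitize` approach of stripping `$`-prefixed keys achieve the same goal at the framework level.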
📝 Knowledge Check
Test your understanding of advanced MongoDB patterns and features.