Understanding MongoDB — Documents, Collections and JSON Data

MongoDB is the “M” in MERN and the foundation that stores all of your application’s data. Unlike the relational databases you may have encountered (MySQL, PostgreSQL), MongoDB does not use tables, rows, or columns. Instead it stores data as documents — flexible, JSON-like objects — grouped into collections. Understanding how MongoDB organises and represents data is essential before you write a single line of Mongoose code, because every schema decision you make later flows from how MongoDB works at its core.

Relational vs Document Databases

Concept | Relational (MySQL) | MongoDB
Storage unit | Row in a table | Document in a collection
Schema | Fixed: defined upfront, all rows match | Flexible: documents in the same collection can differ
Data format | Columns with specific types | JSON-like BSON objects with nested fields and arrays
Relationships | Foreign keys and JOINs | Embedded documents or $lookup aggregation
Query language | SQL | MongoDB Query Language (MQL), JSON-based
Scaling | Typically vertical (bigger server) | Designed for horizontal (more servers via sharding)
Note: MongoDB stores data as BSON (Binary JSON), a binary format that adds types plain JSON lacks, such as Date and ObjectId. In your Node.js code you work with ordinary JavaScript objects; the MongoDB driver that Mongoose is built on converts them to and from BSON for you, so you rarely need to think about it directly.
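Tip: Every MongoDB document automatically gets a unique _id field of type ObjectId if you do not supply one. An ObjectId is a 12-byte value made up of a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter, which makes IDs unique in practice and roughly ordered by creation time. It is usually displayed as a 24-character hex string. In Mongoose you can convert it to a string with .toString(), and Mongoose casts matching strings back to ObjectId in queries automatically.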
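To make the timestamp part of an ObjectId concrete, here is a minimal sketch in plain JavaScript. It assumes the standard layout, in which the first 4 bytes (the first 8 hex characters) hold the creation time in seconds since the Unix epoch. The helper name objectIdToDate is hypothetical and written only for illustration; with the official driver you would normally call the ObjectId's own getTimestamp() method instead.

```javascript
// Hypothetical helper: read the creation time embedded in an ObjectId's
// 24-character hex representation.
// Assumption: the first 8 hex characters are a big-endian count of seconds
// since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new TypeError("Expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000); // Date expects milliseconds
}

// The sample _id used elsewhere in this lesson.
const createdFromId = objectIdToDate("64a1f2b3c8e4d5f6a7b8c9d0");
console.log(createdFromId.toISOString());
```

Because the timestamp comes first, sorting by _id gives an approximate creation order, which is why a separate index is often unnecessary for simple "oldest first" listings.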
Warning: MongoDB’s schema flexibility is powerful but can become a liability if you have no discipline. Two developers writing to the same collection without a Mongoose schema can produce inconsistent documents that break your application at runtime. Always define a Mongoose schema for every collection — it gives you the best of both worlds: MongoDB flexibility with enforced shape and validation.

Documents and Collections

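// A MongoDB document: a blog post, shown in JSON form
// (in the database, _id is an ObjectId and the dates would normally be Date values)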
{
  "_id": "64a1f2b3c8e4d5f6a7b8c9d0",
  "title": "Getting Started with MERN",
  "slug": "getting-started-with-mern",
  "body": "The MERN stack is a powerful...",
  "author": {
    "name": "Jane Smith",
    "email": "jane@example.com"
  },
  "tags": ["mern", "javascript", "beginner"],
  "published": true,
  "viewCount": 142,
  "createdAt": "2025-01-01T00:00:00.000Z",
  "updatedAt": "2025-01-15T10:30:00.000Z"
}

The document above shows MongoDB’s key strengths: the author field is a nested object (no JOIN needed), tags is an array, and other documents in the same posts collection could carry different fields without MongoDB rejecting them.
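The flip side of that flexibility is that application code has to cope with fields that may be absent. The sketch below uses plain JavaScript objects as stand-ins for two documents from the same posts collection; the second document and the summaries variable are made up for illustration.

```javascript
// Two documents with different shapes, as they might come back from the
// same collection (hypothetical sample data).
const posts = [
  {
    title: "Getting Started with MERN",
    author: { name: "Jane Smith", email: "jane@example.com" },
    tags: ["mern", "javascript", "beginner"],
  },
  // No author or tags field; MongoDB itself would still accept this document.
  { title: "Untitled draft" },
];

// Nested fields and arrays are read directly, with no join step.
// Optional chaining and default values guard against missing fields.
const summaries = posts.map((post) => ({
  title: post.title,
  authorName: post.author?.name ?? "Unknown",
  tagCount: (post.tags ?? []).length,
}));

console.log(summaries);
```

A Mongoose schema moves most of these guards into one place by defining which fields exist and what their defaults are.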

MongoDB Hierarchy

Level | Name | SQL Analogy | Example
1 | MongoDB Server | Database Server | localhost:27017
2 | Database | Database / Schema | blogdb
3 | Collection | Table | posts, users, comments
4 | Document | Row | One blog post object
5 | Field | Column | title, body, tags

BSON Data Types

BSON Type | JavaScript Equivalent | Common Use
String | string | Text fields: title, body, slug
Number (Int32 / Double) | number | Counts, prices, ratings
Boolean | boolean | Flags: published, active, verified
Array | Array | Tags, list of IDs, embedded objects
Object | object | Nested sub-documents: address, author
ObjectId | ObjectId object (prints as 24 hex chars) | Document _id, references to other documents
Date | Date | createdAt, updatedAt, dueDate
Null | null | Optional fields with no value
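One reason MongoDB uses BSON rather than plain JSON is that JSON has no native date type. The short sketch below shows what happens to a Date when an object is pushed through JSON text and back; the variable names are illustrative only.

```javascript
// A JavaScript object holding a real Date value.
const original = {
  title: "Hello",
  createdAt: new Date("2025-01-01T00:00:00.000Z"),
};

// Serialising to JSON text and parsing it back loses the Date type:
// the value survives only as an ISO-formatted string.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.createdAt instanceof Date); // true
console.log(typeof roundTripped.createdAt);      // "string"
```

With BSON, a Date is stored as a Date, so range queries and sorting on fields such as createdAt work on real date values rather than on text.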

Embedding vs Referencing

One of the most important decisions in MongoDB schema design is whether to embed related data inside a document or reference it by ID.

// Embedding — author data lives inside the post document
// Good when: author data rarely changes, you always need it with the post
{
  "title": "MERN Tutorial",
  "author": { "name": "Jane", "email": "jane@example.com" }
}

// Referencing — post stores only the author's ObjectId
// Good when: author data is shared across many posts, may be updated
{
  "title": "MERN Tutorial",
  "authorId": "64a1f2b3c8e4d5f6a7b8c9d0"
}
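The trade-off between the two shapes can be sketched with in-memory arrays standing in for collections. This is only an illustration in plain JavaScript, not driver or Mongoose code: the resolveAuthor helper is hypothetical and plays the role that a second query, $lookup, or Mongoose's populate would play against a real database.

```javascript
// In-memory stand-in for a users collection (hypothetical data).
const users = [
  { _id: "64a1f2b3c8e4d5f6a7b8c9d0", name: "Jane", email: "jane@example.com" },
];

// Embedded: the post carries its own copy of the author data.
const embeddedPost = {
  title: "MERN Tutorial",
  author: { name: "Jane", email: "jane@example.com" },
};

// Referenced: the post stores only the author's ID.
const referencedPost = {
  title: "MERN Tutorial",
  authorId: "64a1f2b3c8e4d5f6a7b8c9d0",
};

// Hypothetical helper: resolving a reference costs an extra lookup.
function resolveAuthor(post, userCollection) {
  return userCollection.find((user) => user._id === post.authorId) ?? null;
}

// Renaming the user changes one document in the users collection...
users[0].name = "Jane Smith";

// ...which the referenced post picks up, while the embedded copy is stale
// until every post that contains it is updated too.
console.log(resolveAuthor(referencedPost, users).name); // "Jane Smith"
console.log(embeddedPost.author.name);                  // "Jane"
```

Embedding wins on read simplicity (one document, no lookup); referencing wins when the shared data changes, because there is only one copy to update.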

MongoDB Atlas — The Cloud Option

MongoDB Atlas Free Tier (M0)
════════════════════════════
Storage    : 512 MB
RAM        : Shared
Region     : Choose closest to your users
Connection : mongodb+srv://username:password@cluster.mongodb.net/dbname

Advantages over local MongoDB for learners:
  ✓ No local installation required
  ✓ Accessible from any machine or deployment environment
  ✓ Built-in monitoring and alerts (automated backups come with paid tiers)
  ✓ Mirrors the setup you will use in production
  ✓ Free tier is permanent — not a trial
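One practical detail about the connection string format shown above: usernames and passwords containing characters such as @, / or : need to be percent-encoded, otherwise the URI cannot be parsed correctly. The following is a minimal sketch; the helper name buildMongoUri and the environment variable names MONGO_USER and MONGO_PASSWORD are assumptions made for illustration, and the host is the placeholder from the example above.

```javascript
// Hypothetical helper: assemble an Atlas-style connection string with
// percent-encoded credentials.
function buildMongoUri({ username, password, host, dbName }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `mongodb+srv://${user}:${pass}@${host}/${dbName}`;
}

// Credentials are read from environment variables (assumed names) so they
// never appear in source control; the fallbacks are placeholders only.
const uri = buildMongoUri({
  username: process.env.MONGO_USER ?? "username",
  password: process.env.MONGO_PASSWORD ?? "p@ss/word",
  host: "cluster.mongodb.net",
  dbName: "blogdb",
});

// Log only the non-secret part of the URI.
console.log(uri.slice(uri.indexOf("@") + 1));
```

Keeping the full string out of logs and committed files matters as much as the encoding, since anyone holding the URI can connect to the cluster.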

Common Mistakes

Mistake 1 — Deeply nesting everything

❌ Wrong — embedding all related data regardless of update patterns:

// Post with deeply nested comments and their authors and their profiles...
// Updating a user's name now requires updating hundreds of post documents
{ "title": "...", "comments": [ { "author": { "name": "...", "profile": {...} } } ] }

✅ Correct — embed data that is read together and rarely updated independently; reference data that is shared or updated frequently.

Mistake 2 — Using MongoDB like a relational database

❌ Wrong — creating a separate collection for every relationship and joining everything with $lookup:

post_tags table → tag_id foreign key → tags table → tag_category_id → tag_categories
// This is SQL thinking — MongoDB does not need this level of normalisation

✅ Correct — store tags as an array inside the post document. MongoDB is optimised for reading complete documents, not reconstructing data from many collections.

Mistake 3 — Ignoring indexes

❌ Wrong — querying a collection of 100,000 documents with no index on the query field:

Post.find({ slug: 'getting-started' }) // full collection scan — very slow at scale

✅ Correct — add an index on fields you query frequently:

// In your Mongoose schema
postSchema.index({ slug: 1 }, { unique: true });
postSchema.index({ createdAt: -1 }); // latest posts first
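To see why the index matters, the sketch below compares the two access patterns in memory. It is an analogy only: a JavaScript Map stands in for the index, whereas MongoDB's default indexes are B-tree structures, and the sample data and the scanForSlug helper are invented for the illustration.

```javascript
// A hypothetical collection of 100,000 posts with unique slugs.
const collection = Array.from({ length: 100000 }, (_, i) => ({
  slug: `post-${i}`,
  title: `Post ${i}`,
}));

// No index: examine documents one by one until a match is found.
function scanForSlug(docs, slug) {
  let examined = 0;
  for (const doc of docs) {
    examined += 1;
    if (doc.slug === slug) return { doc, examined };
  }
  return { doc: null, examined };
}

// "Index": a lookup structure built once, mapping slug to document.
const slugIndex = new Map(collection.map((doc) => [doc.slug, doc]));

// Worst case for the scan: the match is the very last document.
const scanResult = scanForSlug(collection, "post-99999");
const indexedDoc = slugIndex.get("post-99999");

console.log(scanResult.examined); // 100000 documents examined
console.log(indexedDoc.title);    // "Post 99999", found without scanning
```

The index is not free: it takes extra storage and must be updated on every write, which is why the advice is to index the fields you query frequently rather than every field.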

Quick Reference

Task | MongoDB Shell Command
Show all databases | show dbs
Use a database | use blogdb
Show collections | show collections
Insert a document | db.posts.insertOne({ title: "Hello" })
Find all documents | db.posts.find()
Find with filter | db.posts.find({ published: true })
Count documents | db.posts.countDocuments()
Delete a document | db.posts.deleteOne({ _id: ObjectId("...") })

🧠 Test Yourself

A blog post has many comments. Each comment belongs to a user. Which MongoDB approach is best when you always display comments alongside the post and comments are never displayed independently?