Collections and Documents — How MongoDB Stores Data

📚 MERN Stack 📂 Chapter 9: Introduction to MongoDB 📄 Lesson 9020 Beginner 🕒 October 8, 2025

MongoDB organises all stored data in a three-level hierarchy: databases contain collections, and collections contain documents. Understanding this hierarchy — and specifically how documents are structured in BSON format — is the foundation for every query, schema design, and optimisation decision you will make throughout the MERN series. In this lesson you will learn what each level of the hierarchy represents, what a BSON document looks like and how it differs from plain JSON, and how MongoDB’s flexible schema model compares to the rigid structure of a SQL table.

The Three-Level Hierarchy

Level	Name	Description	SQL Equivalent
1	Database	A named container for a group of related collections	Database / Schema
2	Collection	A group of documents, loosely analogous to a table	Table
3	Document	A single record stored as a BSON object	Row

Note: Unlike SQL tables, MongoDB collections do not need to be created explicitly before inserting data. If you insert a document into a collection that does not exist yet, MongoDB creates the collection automatically. Similarly, if you insert a document into a database that does not exist, the database is created on demand. You will see this when you use Mongoose — the first Model.create() call creates the collection if it does not already exist.

Tip: Name your MongoDB collections using lowercase, plural nouns that match your Mongoose model names. When you create a Mongoose model called 'Post', Mongoose automatically creates or uses a collection named posts (lowercased, pluralised). Following this convention keeps your collection names predictable — users, posts, comments, sessions.

Warning: A single MongoDB server can host many databases. Never store your application data in the admin, local, or config databases — these are MongoDB system databases. Always create a named database for your application (e.g. blogdb, mernblog) and keep application data there. The connection string specifies which database to use: mongodb://localhost:27017/blogdb.

What Is a BSON Document?

BSON (Binary JSON) is the binary-encoded format MongoDB uses to store documents on disk and to send them over the network. In your application you work with ordinary JavaScript objects that look like JSON; Mongoose and the MongoDB driver handle the BSON conversion transparently. BSON differs from JSON in two main ways: it supports additional data types (such as ObjectId, Date, and Binary), and it is designed for efficient storage and traversal by machines rather than for human readability.

// A complete MongoDB document from the posts collection (mongosh notation)
{
  "_id":       ObjectId("64a1f2b3c8e4d5f6a7b8c9d0"),   // ObjectId: auto-generated unique ID
  "title":     "Getting Started with MERN",            // String
  "body":      "MERN is a JavaScript stack...",        // String
  "viewCount": 142,                                    // Number (Int32)
  "rating":    4.8,                                    // Number (Double)
  "published": true,                                   // Boolean
  "tags":      ["mern", "javascript"],                 // Array of Strings
  "author": {                                          // Embedded sub-document (Object)
    "_id":    ObjectId("64a1f2b3c8e4d5f6a7b8c9d1"),
    "name":   "Jane Smith"
  },
  "coverImage": null,                                  // Null: field exists but has no value
  "createdAt": ISODate("2025-01-01T00:00:00.000Z"),    // Date
  "updatedAt": ISODate("2025-01-15T10:30:00.000Z")     // Date
}
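
To see why the extra BSON types matter, here is a small sketch in plain Node.js (no MongoDB connection needed) showing that a Date does not survive a round trip through plain JSON. The variable names are illustrative only.

```javascript
// Plain JSON has no Date type, so a Date is serialised to an ISO string
// and is still a string after parsing. BSON keeps it as a real Date.
const original = {
  title: 'Getting Started with MERN',
  createdAt: new Date('2025-01-01T00:00:00.000Z'),
};

const roundTripped = JSON.parse(JSON.stringify(original));

const isDateBefore = original.createdAt instanceof Date;     // true
const isDateAfter = roundTripped.createdAt instanceof Date;  // false
console.log(isDateBefore, isDateAfter, typeof roundTripped.createdAt);
```

This is one reason the driver converts between JavaScript objects and BSON for you rather than passing JSON text around.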

The _id Field

Every MongoDB document has a required _id field that uniquely identifies it within its collection. If you do not supply an _id when inserting, MongoDB generates one automatically using the ObjectId type.

ObjectId: 64a1f2b3c8e4d5f6a7b8c9d0
          │       │         │
          │       │         └─ 3-byte incrementing counter (starts at a random value)
          │       └─────────── 5-byte random value (generated once per process)
          └─────────────────── 4-byte Unix timestamp (seconds since epoch)

Key properties:
  ✓ Unique in practice: the timestamp, per-process random value, and counter make collisions extremely unlikely, even across servers
  ✓ Roughly time-ordered: ObjectIds sort by creation time, to one-second resolution
  ✓ 12 bytes (24 hex characters as a string)
  ✓ Generated client-side: the driver in your Node.js app creates them before the insert is sent
  ✓ Can extract creation time: new mongoose.Types.ObjectId(id).getTimestamp()
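
The timestamp layout described above can be checked by hand. The following is a minimal sketch in plain Node.js; objectIdToDate is a hypothetical helper written for this lesson, not a Mongoose or driver API, and in real code getTimestamp() does the same job.

```javascript
// Decode the creation time from the first 4 bytes (8 hex characters)
// of an ObjectId string. Hypothetical helper, for illustration only.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new TypeError('Expected a 24-character hex string');
  }
  // The leading 8 hex characters are a big-endian Unix timestamp in seconds.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const createdAt = objectIdToDate('64a1f2b3c8e4d5f6a7b8c9d0');
console.log(createdAt.toISOString()); // 2023-07-02T21:57:07.000Z
```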

Flexible Schema — Same Collection, Different Shapes

// Two documents in the same 'posts' collection with different fields
// Document 1: a simple post
{
  "_id": "64a1f2b3...",
  "title": "Quick Note",
  "body": "Short content.",
  "published": true
}

// Document 2: a rich post with extra fields
{
  "_id": "64a1f2b4...",
  "title": "Full Tutorial",
  "body": "Long content...",
  "excerpt": "A brief summary.",
  "coverImage": "https://cdn.example.com/img.jpg",
  "tags": ["mern", "tutorial"],
  "series": "MERN Stack",
  "partNumber": 1,
  "published": true,
  "featured": true,
  "readTimeMinutes": 8
}

// Both documents coexist in the same collection — MongoDB accepts this
// Without a Mongoose schema, this inconsistency is invisible and dangerous
// With a Mongoose schema, extra fields are stripped and missing required fields throw errors
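
To make the idea of stripping unknown fields and rejecting incomplete documents concrete, here is a simplified stand-in for a schema layer. It is only a sketch: applySchema and postSchema are hypothetical names, and Mongoose's real validation is considerably richer.

```javascript
// A toy schema: which fields are known, and which are required.
const postSchema = {
  title: { required: true },
  body: { required: true },
  published: { required: false },
};

// Hypothetical helper: keeps only known fields, throws on missing required ones.
function applySchema(schema, input) {
  const output = {};
  for (const [field, rules] of Object.entries(schema)) {
    if (input[field] === undefined) {
      if (rules.required) throw new Error(`Missing required field: ${field}`);
      continue;
    }
    output[field] = input[field]; // copy only fields the schema knows about
  }
  return output;
}

const cleaned = applySchema(postSchema, {
  title: 'Quick Note',
  body: 'Short content.',
  published: true,
  featured: true, // not in the schema, so it is dropped
});
console.log(cleaned); // { title: 'Quick Note', body: 'Short content.', published: true }
```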

Embedded Documents vs References

	Embedding	Referencing
How	Nested object inside the document	ObjectId pointing to another document
Read performance	Fast — one read, no JOIN	Slower — requires populate() / $lookup
Update complexity	Hard to update embedded data across many documents	Update once, reflected everywhere
Use when	Data is always read together; child rarely changes independently	Data is shared, updated frequently, or large
Blog example	Post tags — always shown with the post, rarely updated	Post author — user data changes; shared across posts

// Embedding — author data inside the post (denormalised)
{
  "title": "MERN Tutorial",
  "author": { "name": "Jane", "email": "jane@example.com" }
  // Changing Jane's name requires updating ALL of her posts
}

// Referencing — post stores only the author's ObjectId (normalised)
{
  "title": "MERN Tutorial",
  "author": "64a1f2b3c8e4d5f6a7b8c9d1"
  // Changing Jane's name requires updating only her user document
  // Mongoose populate() resolves the ID to the full user document at query time
}
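
The referencing approach can be sketched without a database. In the example below, a Map stands in for the users collection and populateAuthor is a hypothetical helper that mimics the idea behind populate(); IDs are kept as plain strings purely to keep the sketch self-contained.

```javascript
// In-memory stand-in for the users collection, keyed by ID.
const users = new Map([
  ['64a1f2b3c8e4d5f6a7b8c9d1', { _id: '64a1f2b3c8e4d5f6a7b8c9d1', name: 'Jane', email: 'jane@example.com' }],
]);

// The post stores only a reference to its author.
const post = { title: 'MERN Tutorial', author: '64a1f2b3c8e4d5f6a7b8c9d1' };

// Hypothetical helper: replace the stored ID with the referenced document.
function populateAuthor(postDoc, userCollection) {
  const author = userCollection.get(postDoc.author) ?? null; // null if the reference dangles
  return { ...postDoc, author };
}

// Renaming Jane touches one user document; every post that references her sees it.
users.get('64a1f2b3c8e4d5f6a7b8c9d1').name = 'Jane Smith';

const populated = populateAuthor(post, users);
console.log(populated.author.name); // Jane Smith
```

The extra lookup is the cost of referencing; the single point of update is the benefit.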

Common Mistakes

Mistake 1 — Using the same database name as a MongoDB system database

❌ Wrong — using a reserved database name:

mongoose.connect('mongodb://localhost:27017/admin');   // system database!
mongoose.connect('mongodb://localhost:27017/local');   // system database!

✅ Correct — always use a custom application database name:

mongoose.connect('mongodb://localhost:27017/blogdb'); // ✓ your app's database
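
One way to catch this mistake early is to check the connection string before connecting. The sketch below is an assumption-laden illustration: assertAppDatabase is a hypothetical guard, not part of Mongoose or the driver, and it only handles simple single-host connection strings like the ones in this lesson.

```javascript
// System databases named in this lesson.
const SYSTEM_DATABASES = new Set(['admin', 'local', 'config']);

// Hypothetical guard: returns the database name or throws if it is unsafe.
function assertAppDatabase(uri) {
  // The database name is the first path segment of the connection string.
  const dbName = new URL(uri).pathname.replace(/^\//, '').split('/')[0];
  if (dbName === '') throw new Error('Connection string does not name a database');
  if (SYSTEM_DATABASES.has(dbName)) {
    throw new Error(`Refusing to use system database "${dbName}"`);
  }
  return dbName;
}

console.log(assertAppDatabase('mongodb://localhost:27017/blogdb')); // blogdb
```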

Mistake 2 — Embedding large arrays that grow without bound

❌ Wrong — embedding all comments directly inside a post document:

{
  "title": "Popular Post",
  "comments": [ ...10,000 comments embedded... ]
  // MongoDB document size limit is 16MB
  // Popular posts will exceed the limit and break
}

✅ Correct — store comments in a separate collection with a reference to the post:

// comments collection
{ "_id": "...", "postId": "64a1f2b3...", "body": "Great post!", "author": "..." }
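
A rough back-of-the-envelope check shows how quickly embedded comments can approach the limit. This sketch uses the JSON byte length as a stand-in for the BSON size, which is only an approximation; the comment size of about 2 KB and the helper name approximateSizeBytes are assumptions made for illustration.

```javascript
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // the 16 MB document limit

// Approximation only: JSON text size is not the exact BSON size.
function approximateSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

// Hypothetical post with 10,000 embedded comments of about 2 KB each.
const comment = { author: 'reader', body: 'x'.repeat(2000) };
const post = {
  title: 'Popular Post',
  comments: Array.from({ length: 10000 }, () => comment),
};

const size = approximateSizeBytes(post);
console.log(size > MAX_DOCUMENT_BYTES); // true
```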

Mistake 3 — Forgetting that the _id field is an ObjectId, not a string

❌ Wrong: passing a plain string where the stored value is an ObjectId, in a raw MongoDB driver query:

// db is assumed to be a connected Db instance from the mongodb driver
db.collection('posts').find({ author: '64a1f2b3c8e4d5f6a7b8c9d1' }); // string vs ObjectId: will not match

✅ Correct: convert the string explicitly. Mongoose performs this cast for you when the schema field is declared with the ObjectId type, but the raw driver does not:

const { ObjectId } = require('mongodb');
db.collection('posts').find({ author: new ObjectId('64a1f2b3c8e4d5f6a7b8c9d1') }); // ✓
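
When an ID arrives as a string, it can help to check its shape before converting it, because malformed input cannot be turned into an ObjectId. A minimal sketch, assuming you only accept the 24-hex-character form; isObjectIdString is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: true only for a 24-character hexadecimal string.
function isObjectIdString(value) {
  return typeof value === 'string' && /^[0-9a-fA-F]{24}$/.test(value);
}

console.log(isObjectIdString('64a1f2b3c8e4d5f6a7b8c9d1')); // true
console.log(isObjectIdString('not-an-id'));                // false
```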

Quick Reference

Task	mongosh command
List all databases	`show dbs`
Switch to a database	`use blogdb`
List collections	`show collections`
Count documents	`db.posts.countDocuments()`
View one document	`db.posts.findOne()`
Drop a collection	`db.posts.drop()`
Drop a database	`db.dropDatabase()`
