Collections and Documents — How MongoDB Stores Data

MongoDB organises all stored data in a three-level hierarchy: databases contain collections, and collections contain documents. Understanding this hierarchy — and specifically how documents are structured in BSON format — is the foundation for every query, schema design, and optimisation decision you will make throughout the MERN series. In this lesson you will learn what each level of the hierarchy represents, what a BSON document looks like and how it differs from plain JSON, and how MongoDB’s flexible schema model compares to the rigid structure of a SQL table.

The Three-Level Hierarchy

| Level | Name | Description | SQL Equivalent |
|---|---|---|---|
| 1 | Database | A named container for a group of related collections | Database / Schema |
| 2 | Collection | A group of documents, loosely analogous to a table | Table |
| 3 | Document | A single record stored as a BSON object | Row |

Note: Unlike SQL tables, MongoDB collections do not need to be created explicitly before inserting data. If you insert a document into a collection that does not exist yet, MongoDB creates the collection automatically. Similarly, if you insert a document into a database that does not exist, the database is created on demand. You will see this when you use Mongoose — the first Model.create() call creates the collection if it does not already exist.
Tip: Name your MongoDB collections using lowercase, plural nouns that correspond to your Mongoose model names. When you create a Mongoose model called 'Post', Mongoose automatically creates or uses a collection named posts (lowercased, pluralised). Following this convention keeps your collection names predictable: users, posts, comments, sessions.
Warning: A single MongoDB server can host many databases. Never store your application data in the admin, local, or config databases — these are MongoDB system databases. Always create a named database for your application (e.g. blogdb, mernblog) and keep application data there. The connection string specifies which database to use: mongodb://localhost:27017/blogdb.

What Is a BSON Document?

BSON (Binary JSON) is the binary-encoded format MongoDB uses to store documents on disk and to send them over the network. From your application’s perspective you work with plain JavaScript objects; Mongoose and the MongoDB driver handle the BSON conversion transparently. The key differences between BSON and JSON are that BSON supports additional data types (such as ObjectId, Date, and Binary) and is designed for efficient storage and traversal by machines rather than for human readability.

// A complete MongoDB document from the posts collection
// (ObjectId and Date values are shown as strings for readability)
{
  "_id":       "64a1f2b3c8e4d5f6a7b8c9d0",    // ObjectId — auto-generated unique ID
  "title":     "Getting Started with MERN",    // String
  "body":      "MERN is a JavaScript stack...", // String
  "viewCount": 142,                             // Number (Int32)
  "rating":    4.8,                             // Number (Double)
  "published": true,                            // Boolean
  "tags":      ["mern", "javascript"],          // Array of Strings
  "author": {                                   // Embedded sub-document (Object)
    "_id":    "64a1f2b3c8e4d5f6a7b8c9d1",
    "name":   "Jane Smith"
  },
  "coverImage": null,                           // Null — field exists but has no value
  "createdAt": "2025-01-01T00:00:00.000Z",     // Date
  "updatedAt": "2025-01-15T10:30:00.000Z"      // Date
}
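
To see why the extra BSON types matter, the short sketch below round-trips an object through plain JSON using only built-in JavaScript. The post object and its field values are illustrative, not read from a real collection.

```javascript
// A document as your application might build it before saving.
const post = {
  title: 'Getting Started with MERN',
  viewCount: 142,
  createdAt: new Date('2025-01-01T00:00:00.000Z'), // a real Date object
};

// Plain JSON has no date type, so serialising turns the Date into a string.
const roundTripped = JSON.parse(JSON.stringify(post));

console.log(post.createdAt instanceof Date); // true
console.log(typeof roundTripped.createdAt);  // 'string'
console.log(roundTripped.createdAt);         // '2025-01-01T00:00:00.000Z'
```

Because BSON has a dedicated Date type, a date stored through the driver or Mongoose comes back as a Date object rather than as a string that you would have to parse yourself.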

The _id Field

Every MongoDB document has a required _id field that uniquely identifies it within its collection. If you do not supply an _id when inserting, MongoDB generates one automatically using the ObjectId type.

ObjectId: 64a1f2b3 c8e4d5f6a7 b8c9d0
          │        │          │
          │        │          └─ 3-byte incrementing counter, initialised to a random value
          │        └──────────── 5-byte random value, generated once per process
          └───────────────────── 4-byte Unix timestamp (seconds since epoch)

Key properties:
  ✓ Practically unique: the timestamp, per-process random value, and counter make collisions extremely unlikely, even across servers
  ✓ Roughly time-ordered: the leading timestamp has one-second resolution, so sorting by _id approximates creation order
  ✓ 12 bytes (24 hex characters as a string)
  ✓ Usually generated client-side: the driver in your Node.js app creates the _id before sending the insert
  ✓ Can extract creation time: new mongoose.Types.ObjectId(id).getTimestamp()
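
The timestamp portion can also be decoded by hand, which makes the byte layout concrete. The following is a minimal sketch using only built-in JavaScript; objectIdToDate is a hypothetical helper written for this lesson, and in a Mongoose project you would normally call getTimestamp() instead.

```javascript
// Decode the creation time embedded in an ObjectId's 24-character hex string.
// Hypothetical helper for illustration only.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new TypeError('Expected a 24-character hex string');
  }
  // The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const createdAt = objectIdToDate('64a1f2b3c8e4d5f6a7b8c9d0');
console.log(createdAt.toISOString()); // 2023-07-02T21:57:07.000Z
```

The remaining 16 hex characters (the random value and the counter) carry no date information, so the helper ignores them.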

Flexible Schema — Same Collection, Different Shapes

// Two documents in the same 'posts' collection with different fields
// Document 1: a simple post
{
  "_id": "64a1f2b3...",
  "title": "Quick Note",
  "body": "Short content.",
  "published": true
}

// Document 2: a rich post with extra fields
{
  "_id": "64a1f2b4...",
  "title": "Full Tutorial",
  "body": "Long content...",
  "excerpt": "A brief summary.",
  "coverImage": "https://cdn.example.com/img.jpg",
  "tags": ["mern", "tutorial"],
  "series": "MERN Stack",
  "partNumber": 1,
  "published": true,
  "featured": true,
  "readTimeMinutes": 8
}

// Both documents coexist in the same collection; MongoDB accepts this by default
// Without a Mongoose schema, this inconsistency is invisible and dangerous
// With a Mongoose schema (strict mode, the default), fields not declared in the schema are dropped and missing required fields fail validation
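
The effect of a schema can be sketched in plain JavaScript. The code below only mimics the behaviour described in the comments above; it is not Mongoose's implementation, and postSchema and applySchema are hypothetical names invented for this illustration.

```javascript
// Hypothetical mini-schema: field name -> { required: boolean }.
const postSchema = {
  title: { required: true },
  body: { required: true },
  published: { required: false },
};

function applySchema(schema, doc) {
  const cleaned = {};
  for (const [field, rules] of Object.entries(schema)) {
    if (doc[field] !== undefined) {
      cleaned[field] = doc[field]; // keep only fields the schema declares
    } else if (rules.required) {
      throw new Error(`Validation failed: "${field}" is required`);
    }
  }
  return cleaned; // undeclared fields were never copied, so they are dropped
}

const saved = applySchema(postSchema, {
  title: 'Quick Note',
  body: 'Short content.',
  readTimeMinutes: 8, // not in the schema
});
console.log(saved); // { title: 'Quick Note', body: 'Short content.' }
```

Calling applySchema(postSchema, { title: 'No body' }) throws instead, which is the kind of early failure a schema gives you and a schemaless collection does not.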

Embedded Documents vs References

| Aspect | Embedding | Referencing |
|---|---|---|
| How | Nested object inside the document | ObjectId pointing to another document |
| Read performance | Fast: one read, no JOIN | Slower: requires populate() / $lookup |
| Update complexity | Hard to update embedded data across many documents | Update once, reflected everywhere |
| Use when | Data is always read together; child rarely changes independently | Data is shared, updated frequently, or large |
| Blog example | Post tags: always shown with the post, rarely updated | Post author: user data changes and is shared across posts |

// Embedding — author data inside the post (denormalised)
{
  "title": "MERN Tutorial",
  "author": { "name": "Jane", "email": "jane@example.com" }
  // Changing Jane's name requires updating ALL of her posts
}

// Referencing — post stores only the author's ObjectId (normalised)
{
  "title": "MERN Tutorial",
  "author": "64a1f2b3c8e4d5f6a7b8c9d1"
  // Changing Jane's name requires updating only her user document
  // Mongoose populate() resolves the ID to the full user document at query time
}
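
The trade-off can be simulated without a database. In this sketch, a Map and an array stand in for the users and posts collections, and resolveAuthor is a hypothetical helper that does conceptually what populate() does: it swaps a stored ID for the document it points to.

```javascript
// In-memory stand-ins for the users and posts collections (illustrative data).
const users = new Map([
  ['64a1f2b3c8e4d5f6a7b8c9d1', { name: 'Jane', email: 'jane@example.com' }],
]);
const posts = [
  { title: 'MERN Tutorial', author: '64a1f2b3c8e4d5f6a7b8c9d1' },
  { title: 'Quick Note', author: '64a1f2b3c8e4d5f6a7b8c9d1' },
];

// Hypothetical helper: replace the stored author ID with the referenced user.
function resolveAuthor(post) {
  return { ...post, author: users.get(post.author) ?? null };
}

// Renaming Jane touches one user record...
users.get('64a1f2b3c8e4d5f6a7b8c9d1').name = 'Jane Smith';

// ...and every post sees the new name the next time it is resolved.
const resolved = posts.map(resolveAuthor);
console.log(resolved.map((p) => p.author.name)); // [ 'Jane Smith', 'Jane Smith' ]
```

The cost is the extra lookup per post. With embedding there would be no lookup, but the rename would have to be written into both post documents.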

Common Mistakes

Mistake 1 — Using the same database name as a MongoDB system database

❌ Wrong — using a reserved database name:

mongoose.connect('mongodb://localhost:27017/admin');   // system database!
mongoose.connect('mongodb://localhost:27017/local');   // system database!

✅ Correct — always use a custom application database name:

mongoose.connect('mongodb://localhost:27017/blogdb'); // ✓ your app's database
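
One way to catch this mistake early is to check the connection string before connecting. The sketch below uses Node's built-in URL class and assumes a single-host connection string like the ones in this lesson; assertAppDatabase is a hypothetical helper, not part of Mongoose or the MongoDB driver.

```javascript
// The three system databases named earlier in this lesson.
const SYSTEM_DATABASES = new Set(['admin', 'local', 'config']);

// Hypothetical helper: read the database name from a connection string and
// refuse to continue if it is missing or reserved.
function assertAppDatabase(uri) {
  const dbName = new URL(uri).pathname.replace(/^\//, '');
  if (dbName === '' || SYSTEM_DATABASES.has(dbName)) {
    throw new Error(`Refusing to use database "${dbName}" for application data`);
  }
  return dbName;
}

console.log(assertAppDatabase('mongodb://localhost:27017/blogdb')); // blogdb
```

You could call such a check on the connection string right before mongoose.connect(), so a misconfigured environment variable fails loudly at startup instead of writing into a system database.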

Mistake 2 — Embedding large arrays that grow without bound

❌ Wrong — embedding all comments directly inside a post document:

{
  "title": "Popular Post",
  "comments": [ ...10,000 comments embedded... ]
  // MongoDB's document size limit is 16 MB
  // A popular post can grow past the limit, after which further writes to it fail
}

✅ Correct — store comments in a separate collection with a reference to the post:

// comments collection
{ "_id": "...", "postId": "64a1f2b3...", "body": "Great post!", "author": "..." }
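
A back-of-the-envelope calculation shows how quickly embedded comments approach the limit. This sketch measures the JSON size of one illustrative comment; BSON encodes values differently, so the numbers are an approximation, and the 2,000-character body is an assumption chosen for the example.

```javascript
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB per-document limit

// Rough size estimate: the UTF-8 length of the JSON form. BSON encodes values
// differently, so treat the result as an approximation only.
function approximateBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

// Illustrative comment with a body of 2,000 characters (an assumed size).
const sampleComment = { author: 'Jane', body: 'x'.repeat(2000) };
const bytesPerComment = approximateBytes(sampleComment);

// How many such comments fit before an embedding post reaches the limit?
const commentsUntilLimit = Math.floor(MAX_DOCUMENT_BYTES / bytesPerComment);
console.log(bytesPerComment, commentsUntilLimit);
```

Under these assumptions fewer than 10,000 comments fit in a single post document, whereas a separate comments collection has no such per-post ceiling.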

Mistake 3 — Forgetting that the _id field is an ObjectId, not a string

❌ Wrong — comparing an ObjectId to a plain string in a query:

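db.collection('posts').find({ author: '64a1f2b3c8e4d5f6a7b8c9d1' }); // raw driver: a string never equals a stored ObjectId, so nothing matches

✅ Correct: Mongoose casts the string automatically when the schema field is declared as an ObjectId type, so the same filter works with Post.find(). In raw MongoDB driver queries, convert explicitly:

const { ObjectId } = require('mongodb');
// db is assumed to be a connected Db instance from the mongodb driver
db.collection('posts').find({ author: new ObjectId('64a1f2b3c8e4d5f6a7b8c9d1') }); // ✓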

Quick Reference

| Task | mongosh command |
|---|---|
| List all databases | show dbs |
| Switch to a database | use blogdb |
| List collections | show collections |
| Count documents | db.posts.countDocuments() |
| View one document | db.posts.findOne() |
| Drop a collection | db.posts.drop() |
| Drop a database | db.dropDatabase() |

🧠 Test Yourself

A blog post has many comments. Comments are only ever displayed on the post’s detail page and will never be queried independently. The post will never have more than 50 comments. Should you embed comments in the post document or store them in a separate collection?