Database normalisation is the process of organising tables to minimise data redundancy and prevent anomalies — situations where inserting, updating, or deleting data produces incorrect results. The Normal Forms are a set of rules that guide this organisation. In practice, following normalisation principles produces schemas that are easier to maintain, less prone to bugs, and easier to query correctly. Understanding normalisation helps you design the blog application schema used throughout this series and explains why certain column arrangements are correct while others lead to subtle data consistency problems.
The Problems Normalisation Solves
-- ── UNNORMALISED: denormalised posts table ────────────────────────────────────
-- Problem: author information duplicated in every post row
CREATE TABLE posts_bad (
id BIGSERIAL PRIMARY KEY,
title TEXT,
body TEXT,
author_name TEXT, -- ← duplicated in every post by this author
author_email TEXT, -- ← if email changes, must update every post row!
author_role TEXT,
tag1 TEXT, -- ← fixed number of tags — breaks with 4 tags
tag2 TEXT,
tag3 TEXT
);
-- Insert anomaly: cannot add an author without a post
-- Update anomaly: changing author email requires updating N rows
-- Delete anomaly: deleting last post by an author loses their info
-- Tag limitation: arbitrary column count, hard to query, max 3 tags
-- ── NORMALISED: separate tables ───────────────────────────────────────────────
CREATE TABLE users (id BIGSERIAL PRIMARY KEY, name TEXT, email TEXT UNIQUE, role TEXT);
CREATE TABLE posts (id BIGSERIAL PRIMARY KEY, author_id BIGINT REFERENCES users(id), title TEXT, body TEXT);
CREATE TABLE tags (id BIGSERIAL PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE post_tags (post_id BIGINT REFERENCES posts(id), tag_id BIGINT REFERENCES tags(id), PRIMARY KEY(post_id, tag_id));
-- Each fact stored once — no duplication, no anomalies, unlimited tags
users table. If a fact appears in multiple rows, that is a signal it belongs in its own table with a foreign key relationship. The rule of thumb: if updating one real-world thing requires updating multiple rows in the database, the schema is not normalised enough.Normal Forms in Practice
-- ── First Normal Form (1NF) ───────────────────────────────────────────────────
-- Rule: no repeating groups, atomic values only
-- Violates 1NF: tags stored as comma-separated string (not atomic)
CREATE TABLE posts_1nf_bad (
id BIGSERIAL PRIMARY KEY,
title TEXT,
tags TEXT -- "python,fastapi,postgresql" — not atomic!
);
-- Satisfies 1NF: separate junction table
-- See post_tags table above ✓
-- ── Second Normal Form (2NF) ─────────────────────────────────────────────────
-- Rule: 1NF + no partial dependencies (applies when PK is composite)
-- Violates 2NF: order_items with partial dependency
CREATE TABLE order_items_bad (
order_id BIGINT,
product_id BIGINT,
quantity INTEGER,
product_name TEXT, -- depends on product_id alone, not (order_id, product_id)!
unit_price NUMERIC, -- same issue
PRIMARY KEY (order_id, product_id)
);
-- Fix: move product_name and unit_price to products table; store price_at_purchase
-- in order_items (that IS specific to the order)
-- ── Third Normal Form (3NF) ──────────────────────────────────────────────────
-- Rule: 2NF + no transitive dependencies
-- Violates 3NF: zip_code → city → state (transitive dependency)
CREATE TABLE users_3nf_bad (
id BIGSERIAL PRIMARY KEY,
name TEXT,
zip_code TEXT,
city TEXT, -- depends on zip_code, not directly on id
state TEXT -- depends on zip_code, not directly on id
);
-- Fix: separate addresses table with zip_code as PK,
-- users references addresses by zip_code
Practical Denormalisation Patterns
-- ── Counter caches — store aggregate for fast reads ──────────────────────────
ALTER TABLE posts ADD COLUMN comment_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE posts ADD COLUMN like_count INTEGER NOT NULL DEFAULT 0;
-- Update the cache when a comment is added (trigger or application logic)
UPDATE posts SET comment_count = comment_count + 1 WHERE id = :post_id;
-- Reading: SELECT comment_count FROM posts WHERE id = ? (no JOIN needed)
-- Writing: must keep in sync (risk of inconsistency if not managed carefully)
-- ── Materialised author name — avoid join for display ────────────────────────
ALTER TABLE posts ADD COLUMN author_name_cache TEXT;
-- Set on insert, update when author name changes (via trigger)
-- Trade-off: faster reads, but author name changes require cache update
-- ── When denormalisation is appropriate ──────────────────────────────────────
-- ✓ Counts that are read thousands of times per second (like count, view count)
-- ✓ Data that rarely changes (country name, static configuration)
-- ✓ Reporting tables that aggregate from transactional tables
-- ✗ Data that changes frequently (user email, post body)
-- ✗ Data where consistency is critical (financial balances)
Common Mistakes
Mistake 1 — Storing multiple values in a single column
❌ Wrong — comma-separated tags in one column:
tags TEXT DEFAULT '' -- "python,fastapi" — cannot query individual tags efficiently
✅ Correct — separate junction table or PostgreSQL array:
-- Option A: junction table (recommended for relational schemas)
-- Option B: ARRAY type (covered in Lesson 4)
tags TEXT[] DEFAULT '{}' -- PostgreSQL native array ✓
Mistake 2 — Duplicating data that changes
❌ Wrong — author email stored in each post:
author_email TEXT -- when email changes: UPDATE every post by that author!
✅ Correct — store foreign key, join when needed:
author_id BIGINT REFERENCES users(id) -- ✓ one update when email changes
Mistake 3 — Over-normalising lookup tables
❌ Wrong — separate table for a static, bounded list of statuses:
CREATE TABLE post_statuses (id SERIAL, name TEXT); -- 3 rows, never changes
post_status_id INTEGER REFERENCES post_statuses(id) -- unnecessary JOIN for a static list
✅ Correct — use a CHECK constraint or ENUM type:
status TEXT CHECK (status IN ('draft', 'published', 'archived')) -- ✓
Quick Reference — Normal Forms
| Form | Rule | Common Violation |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Comma-separated lists, tag1/tag2/tag3 |
| 2NF | 1NF + no partial dependencies | Non-key column depends on part of composite PK |
| 3NF | 2NF + no transitive dependencies | A→B→C: city depends on zip, not on user id |
| Denormalise | Intentional redundancy for performance | Counter caches, materialised aggregates |