JSONB, Arrays and Custom Enum Types

PostgreSQL supports several column types that go beyond the standard SQL primitives: JSONB for semi-structured document data, native arrays for ordered lists of primitive values, and custom ENUM types for constrained string fields. These types can eliminate the need for separate tables in specific cases — storing a post’s metadata as JSONB instead of a separate metadata table, storing a list of IP addresses as a TEXT[] instead of a separate table. Understanding when these types save complexity vs when they cause problems is one of the more nuanced schema design decisions.

JSONB — Semi-Structured Storage

-- JSONB stores JSON as binary — faster to query than JSON (text)
-- JSON stores exact text — preserves key order, whitespace

CREATE TABLE posts (
    id       BIGSERIAL PRIMARY KEY,
    title    TEXT NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}'
);

-- Insert JSON
INSERT INTO posts (title, metadata)
VALUES (
    'FastAPI Guide',
    '{"reading_time": 5, "difficulty": "intermediate", "series": "fastapi-tutorial"}'
);

-- Read top-level key: -> returns JSONB, ->> returns TEXT
SELECT metadata->'reading_time'    AS reading_time_json,    -- "5" (JSONB)
       metadata->>'reading_time'   AS reading_time_text     -- "5"  (TEXT)
FROM posts WHERE id = 1;

-- Nested access: ->> on nested path
SELECT metadata->'author'->>'name' AS author_name FROM posts;

-- Containment: @> checks if left contains right
SELECT * FROM posts WHERE metadata @> '{"difficulty": "intermediate"}';

-- Key existence: ? operator
SELECT * FROM posts WHERE metadata ? 'reading_time';

-- Update a single key (without replacing the whole document)
UPDATE posts
SET metadata = metadata || '{"last_updated": "2025-08-06"}'
WHERE id = 1;

-- GIN index for fast JSONB queries
CREATE INDEX idx_posts_metadata ON posts USING GIN (metadata);
CREATE INDEX idx_posts_meta_diff ON posts USING GIN (metadata jsonb_path_ops);  -- smaller, for @>
Note: Use JSONB (not JSON) for almost all use cases. JSONB stores data in a parsed binary format: it deduplicates keys, doesn’t preserve key order or extra whitespace, and supports GIN indexes for fast querying. JSON stores the exact text — useful only when you need to preserve key insertion order or whitespace (rarely needed). The performance difference for reads and indexed queries is significant in favour of JSONB.
Tip: JSONB is appropriate when different rows legitimately have different structures — user preferences, product attributes, extensible metadata, webhook payloads. If all rows have the same structure, normalise into regular columns (easier to index, join, and validate). A good heuristic: if you query a JSONB field in WHERE, ORDER BY, or need to aggregate over it frequently, consider promoting it to a regular column with a proper type and index.
Warning: JSONB does not enforce schema — any valid JSON is accepted regardless of what keys the application expects. A column defined as metadata JSONB will happily store {"unexpected_key": [1,2,3]} even if your application expects {"reading_time": int}. For schema enforcement on JSONB, use CHECK constraints with jsonb_typeof() or move to Pydantic validation in FastAPI before inserting. Do not rely on JSONB for data that has a fixed, critical structure.

PostgreSQL Arrays

-- Native array columns — hold ordered lists of a single type
CREATE TABLE posts (
    id    BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    tags  TEXT[] NOT NULL DEFAULT '{}',    -- array of strings
    scores INTEGER[] DEFAULT '{}'          -- array of integers
);

-- Insert with array literal
INSERT INTO posts (title, tags) VALUES ('Python Guide', '{"python","tutorial","beginner"}');
-- Or using ARRAY constructor:
INSERT INTO posts (title, tags) VALUES ('FastAPI Intro', ARRAY['fastapi', 'python', 'api']);

-- Access elements (1-indexed!)
SELECT tags[1] AS first_tag FROM posts;    -- first element
SELECT tags[1:3] AS first_three FROM posts; -- slice

-- Array functions
SELECT ARRAY_LENGTH(tags, 1) AS tag_count FROM posts;
SELECT UNNEST(tags) AS tag FROM posts WHERE id = 1;  -- expand to rows
SELECT tags || ARRAY['new-tag'] AS tags_with_new FROM posts WHERE id = 1;  -- append

-- Array containment operators
SELECT * FROM posts WHERE tags @> ARRAY['python'];   -- contains 'python'
SELECT * FROM posts WHERE tags && ARRAY['python', 'fastapi'];  -- overlaps

-- GIN index for fast array queries
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);

Custom ENUM Types

-- Create a custom ENUM type
CREATE TYPE post_status AS ENUM ('draft', 'published', 'archived');
CREATE TYPE user_role    AS ENUM ('user', 'editor', 'admin');

-- Use in table definition
CREATE TABLE posts (
    id     BIGSERIAL PRIMARY KEY,
    title  TEXT NOT NULL,
    status post_status NOT NULL DEFAULT 'draft'
);

-- ENUM values are stored as integers internally — small, fast comparisons
-- But they look like strings in queries:
INSERT INTO posts (title, status) VALUES ('Hello', 'published');
SELECT * FROM posts WHERE status = 'published';
SELECT * FROM posts WHERE status IN ('draft', 'published');  -- valid
SELECT * FROM posts WHERE status > 'draft';  -- compares by definition order

-- Inspect ENUM values
SELECT enum_range(NULL::post_status);   -- {draft,published,archived}

-- Add a new value to an existing ENUM (PostgreSQL 9.1+)
ALTER TYPE post_status ADD VALUE 'scheduled' BEFORE 'published';
-- Note: cannot remove values from an ENUM without dropping and recreating it

-- Order of values: as defined in CREATE TYPE
-- 'draft' < 'published' < 'archived'
Need Best Choice Why
Fixed set of string values ENUM or CHECK Enforces values, compact storage
Simple list of strings (tags) TEXT[] or junction table Array if querying by tag; junction for rich metadata
Different structure per row JSONB Schema-flexible, queryable with GIN index
Fixed structure, all rows same Separate columns Typed, indexed, queryable, validates structure
Many-to-many with metadata Junction table Can store relationship metadata, proper FK constraints

Common Mistakes

Mistake 1 — Using JSONB for data that should be regular columns

❌ Wrong — querying structured data from JSONB is inefficient and fragile:

SELECT metadata->>'author_name' FROM posts;   -- should be a proper column with FK

✅ Correct — use JSONB only for genuinely variable-structure data.

Mistake 2 — Forgetting GIN index on JSONB/array columns used in WHERE

❌ Wrong — full table scan for every @> query:

SELECT * FROM posts WHERE metadata @> '{"difficulty": "hard"}';  -- sequential scan!

✅ Correct — add GIN index:

CREATE INDEX idx_posts_metadata ON posts USING GIN (metadata);   -- ✓

Mistake 3 — Using ENUM for values that may change

❌ Wrong — ENUM values are hard to change later:

CREATE TYPE status AS ENUM ('open', 'closed');
-- Need to add 'pending'? ALTER TYPE works. Need to remove 'closed'? Drop and recreate!

✅ Correct — use TEXT + CHECK for mutable value sets:

status TEXT CHECK (status IN ('open', 'closed', 'pending'))   -- ✓ easy to modify

Quick Reference

Type Query Operator Index Type
JSONB key access col->'key' (JSONB), col->>'key' (TEXT) GIN
JSONB containment col @> '{"k":"v"}' GIN
JSONB key exists col ? 'key' GIN
Array element col[1] (1-indexed) GIN
Array contains col @> ARRAY['val'] GIN
Array overlaps col && ARRAY['v1','v2'] GIN
ENUM comparison col = 'value' B-tree (default)

🧠 Test Yourself

A post’s metadata JSONB column stores {"reading_time": 5, "difficulty": "beginner"}. How do you query all posts where difficulty is "intermediate" using the GIN index efficiently?