CTEs — Common Table Expressions and Recursive Queries

A Common Table Expression (CTE) is a named temporary result set defined with the WITH keyword before a query. CTEs make complex queries more readable by breaking them into named, sequential steps — each CTE builds on the previous ones, and the final SELECT uses the CTEs as if they were tables. They also enable recursive queries for hierarchical data like threaded comments, category trees, and organisational charts — patterns that cannot be expressed with a simple non-recursive query. Recursive CTEs are one of PostgreSQL’s most powerful features for modelling tree-structured data.

Basic CTEs

-- CTE syntax: WITH name AS (SELECT ...) SELECT ... FROM name
-- Can define multiple CTEs, separated by commas

WITH published_posts AS (
    SELECT id, title, author_id, view_count, created_at
    FROM posts
    WHERE status = 'published'
      AND deleted_at IS NULL
),
author_stats AS (
    SELECT
        author_id,
        COUNT(*)         AS post_count,
        SUM(view_count)  AS total_views
    FROM published_posts
    GROUP BY author_id
)
SELECT
    u.name              AS author_name,
    u.email,
    s.post_count,
    s.total_views,
    ROUND(s.total_views::NUMERIC / NULLIF(s.post_count, 0), 0) AS avg_views
FROM author_stats s
JOIN users u ON s.author_id = u.id
WHERE s.post_count >= 3
ORDER BY s.total_views DESC;
Note: In PostgreSQL 12 and later, CTEs are NOT always materialised — the planner can inline them into the parent query if it would be more efficient. In PostgreSQL 11 and earlier, CTEs were always materialised as a temporary table (a “CTE fence”), which could be slower or faster than an inline subquery depending on the situation. Use WITH cte AS MATERIALIZED (...) to force materialisation or WITH cte AS NOT MATERIALIZED (...) to force inlining when you need specific behaviour.
Tip: CTEs are primarily a readability tool, not a performance tool. The same query written with a CTE or a derived table in the FROM clause will often have identical performance in PostgreSQL 12+. Use CTEs when they make a complex query significantly easier to understand — especially when you need to reference the same intermediate result multiple times, or when building up a query in logical steps helps the reader follow the logic.
Warning: Recursive CTEs are powerful but easy to get wrong — a missing or incorrect termination condition will cause an infinite loop. PostgreSQL will eventually terminate the recursion and raise an error when the recursion limit (max_recursion_depth) is reached, but this uses significant memory and CPU. Always test recursive CTEs on small datasets first, and always include a depth counter or visited-nodes check to prevent cycles in graph data (as opposed to pure tree data).

Chaining CTEs

-- Multiple CTEs, each building on the previous
WITH
-- Step 1: find tags that appear on at least 5 published posts
popular_tags AS (
    SELECT t.id, t.name, COUNT(pt.post_id) AS usage_count
    FROM tags t
    JOIN post_tags pt ON t.id = pt.tag_id
    JOIN posts p      ON pt.post_id = p.id AND p.status = 'published'
    GROUP BY t.id, t.name
    HAVING COUNT(pt.post_id) >= 5
),
-- Step 2: find all published posts that have at least one popular tag
posts_with_popular_tags AS (
    SELECT DISTINCT p.id, p.title, p.view_count, p.author_id
    FROM posts p
    JOIN post_tags pt ON p.id = pt.post_id
    JOIN popular_tags tg ON pt.tag_id = tg.id
    WHERE p.status = 'published'
),
-- Step 3: rank them by view count
ranked_posts AS (
    SELECT
        *,
        RANK() OVER (ORDER BY view_count DESC) AS overall_rank
    FROM posts_with_popular_tags
)
SELECT r.*, u.name AS author_name
FROM ranked_posts r
JOIN users u ON r.author_id = u.id
WHERE overall_rank <= 10
ORDER BY overall_rank;

Recursive CTEs

-- Recursive CTE structure:
-- WITH RECURSIVE name AS (
--     anchor query (non-recursive, base case)
--     UNION ALL
--     recursive query (references name itself)
-- )

-- Example: category tree (categories with parent_id)
WITH RECURSIVE category_tree AS (
    -- Anchor: start from top-level categories (no parent)
    SELECT
        id,
        name,
        parent_id,
        0           AS depth,
        name        AS path,
        ARRAY[id]   AS visited
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive: find children of each node in the previous iteration
    SELECT
        c.id,
        c.name,
        c.parent_id,
        ct.depth + 1,
        ct.path || ' > ' || c.name,
        ct.visited || c.id
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.id
    WHERE c.id != ALL(ct.visited)   -- prevent cycles
      AND ct.depth < 10              -- max depth safety limit
)
SELECT id, name, depth, path
FROM category_tree
ORDER BY path;

-- Result:
-- id=1, name="Technology", depth=0, path="Technology"
-- id=2, name="Python",     depth=1, path="Technology > Python"
-- id=3, name="FastAPI",    depth=2, path="Technology > Python > FastAPI"

-- ── Threaded comments (find all replies to a root comment) ────────────────────
WITH RECURSIVE comment_thread AS (
    -- Anchor: the root comment
    SELECT id, body, author_id, parent_comment_id, 0 AS depth
    FROM comments
    WHERE id = :root_comment_id

    UNION ALL

    -- Recursive: direct children of each comment in the current set
    SELECT c.id, c.body, c.author_id, c.parent_comment_id, ct.depth + 1
    FROM comments c
    JOIN comment_thread ct ON c.parent_comment_id = ct.id
    WHERE ct.depth < 5   -- limit nesting depth
)
SELECT ct.*, u.name AS author_name
FROM comment_thread ct
JOIN users u ON ct.author_id = u.id
ORDER BY depth, id;

CTE as a Writable CTE (Data-Modifying)

-- CTEs can contain INSERT, UPDATE, DELETE (with RETURNING)
-- Useful for multi-step operations in a single query

-- Archive old posts and log them in one atomic operation
WITH archived AS (
    UPDATE posts
    SET status = 'archived', updated_at = NOW()
    WHERE status = 'published'
      AND created_at < NOW() - INTERVAL '2 years'
    RETURNING id, title, author_id, created_at
)
INSERT INTO post_archive_log (post_id, title, author_id, archived_at)
SELECT id, title, author_id, NOW()
FROM archived;

-- This runs atomically: either both the UPDATE and INSERT succeed,
-- or neither does (they share the same transaction)

Common Mistakes

Mistake 1 — Infinite recursion (missing termination condition)

❌ Wrong — cycles in data cause infinite recursion:

WITH RECURSIVE tree AS (
    SELECT id, parent_id FROM nodes WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, n.parent_id FROM nodes n JOIN tree t ON n.parent_id = t.id
)   -- if nodes have circular references: infinite loop!

✅ Correct — add a depth limit and cycle detection:

WHERE t.depth < 20 AND n.id != ALL(t.visited)   -- ✓

Mistake 2 — Using UNION instead of UNION ALL in recursive CTE

❌ Wrong — UNION tries to deduplicate, which is very slow for recursive CTEs:

WITH RECURSIVE tree AS (
    ... UNION ...   -- deduplication on each recursion level: very slow!

✅ Correct — UNION ALL (dedup by cycle detection, not UNION):

WITH RECURSIVE tree AS (
    ... UNION ALL ...   -- ✓ fast; use visited array for cycle detection

Mistake 3 — Referencing a CTE more than once when it is not materialised

❌ Wrong — non-materialised CTE re-executes for each reference (PostgreSQL 12+):

WITH expensive AS (SELECT expensive_computation() AS result)
SELECT * FROM expensive e1 JOIN expensive e2 ON e1.id = e2.id;
-- expensive runs TWICE if not materialised

✅ Correct — force materialisation when re-using:

WITH expensive AS MATERIALIZED (SELECT expensive_computation() AS result)
SELECT * FROM expensive e1 JOIN expensive e2 ON e1.id = e2.id;   -- ✓ runs once

Quick Reference

Pattern SQL
Single CTE WITH name AS (SELECT ...) SELECT ...
Multiple CTEs WITH a AS (...), b AS (...) SELECT ...
Recursive CTE WITH RECURSIVE name AS (anchor UNION ALL recursive)
Force materialise WITH name AS MATERIALIZED (...)
Prevent inline WITH name AS NOT MATERIALIZED (...)
Writable CTE CTE with INSERT/UPDATE/DELETE + RETURNING

🧠 Test Yourself

You need to find all subcategories at any depth below category id=5, including their depth level. Which SQL approach is correct?