Advanced Query Patterns — Lateral Joins, RETURNING and Upserts

PostgreSQL has several advanced query features that solve specific problems more elegantly than the standard patterns: LATERAL joins for correlated subqueries in FROM clauses, UPDATE … FROM for batch modifications using another table’s data, writable CTEs for multi-step atomic operations, and MERGE for conditional insert-or-update logic. These patterns appear in production FastAPI applications for tasks like updating many rows based on computed values, performing idempotent writes, and implementing complex multi-table operations atomically.

LATERAL Joins

-- LATERAL: a subquery in FROM that can reference columns from earlier in the FROM clause
-- Like a correlated subquery, but appears in FROM and can return multiple rows

-- Get each author's 3 most recent posts (lateral join approach)
SELECT
    u.id   AS author_id,
    u.name AS author_name,
    latest.id    AS post_id,
    latest.title AS post_title,
    latest.created_at
FROM users u
CROSS JOIN LATERAL (
    SELECT id, title, created_at
    FROM posts p
    WHERE p.author_id = u.id          -- references u.id from outer FROM
      AND p.status = 'published'
    ORDER BY created_at DESC
    LIMIT 3
) AS latest
ORDER BY u.name, latest.created_at DESC;

-- Why LATERAL over a subquery?
-- A regular subquery in FROM cannot reference columns from other FROM tables
-- LATERAL allows the inner query to be re-evaluated for each outer row

-- Alternative: LATERAL with LEFT JOIN (includes users with no posts)
SELECT u.name, latest.*
FROM users u
LEFT JOIN LATERAL (
    SELECT id, title FROM posts WHERE author_id = u.id
    ORDER BY created_at DESC LIMIT 1
) AS latest ON TRUE   -- ON TRUE with LEFT JOIN includes users with no posts
Note: CROSS JOIN LATERAL behaves like an INNER JOIN — users with no matching posts are excluded. LEFT JOIN LATERAL ... ON TRUE keeps all users even if the lateral subquery returns no rows (the lateral columns are NULL). The ON TRUE is a syntactic requirement for LEFT JOIN LATERAL — it means “always join, let the lateral subquery determine which rows.”
Tip: LATERAL joins are the cleanest SQL solution for “top-N per group” queries — the top 3 posts per author, the last 5 comments per post, the most recent order per customer. They are more readable than the ROW_NUMBER window function approach when you need only the top-N (not the full ranking), and often more efficient because the inner query uses LIMIT before the join is applied.
Warning: LATERAL joins are re-evaluated for each row of the outer query — similar to a correlated subquery. For an outer query that returns 1000 rows, the inner LATERAL query runs 1000 times. This is fine when each inner query is cheap (uses an index on author_id), but can be slow if the inner query requires a sequential scan. Always verify with EXPLAIN ANALYZE that the inner query uses an appropriate index.

UPDATE … FROM — Batch Updates

-- UPDATE ... FROM: update rows in one table based on values from another table
-- More efficient than a correlated subquery in SET or WHERE

-- Update post view counts from an analytics batch table
UPDATE posts
SET
    view_count = a.new_view_count,
    updated_at = NOW()
FROM analytics_batch a
WHERE posts.id = a.post_id
  AND a.batch_date = CURRENT_DATE
RETURNING posts.id, posts.view_count;

-- Bulk set tags based on another table
UPDATE posts p
SET status = 'archived'
FROM (
    SELECT id FROM posts
    WHERE created_at < NOW() - INTERVAL '2 years'
      AND status = 'published'
      AND view_count < 10
) AS old_posts
WHERE p.id = old_posts.id
RETURNING p.id, p.title;

-- Use UPDATE ... FROM for set operations:
-- Mark all posts as reviewed if their author passed a verification check
UPDATE posts p
SET reviewed = TRUE
FROM users u
WHERE p.author_id = u.id
  AND u.verified_at IS NOT NULL
  AND p.reviewed = FALSE
RETURNING p.id;

Writable CTEs for Multi-Step Operations

-- Chain INSERT/UPDATE/DELETE operations that depend on each other
-- All operations run in the same transaction atomically

-- Create a post and its initial tags in a single query
WITH new_post AS (
    INSERT INTO posts (author_id, title, slug, body, status)
    VALUES (:author_id, :title, :slug, :body, 'draft')
    RETURNING id, title, created_at
),
tag_ids AS (
    -- Get or create tags
    INSERT INTO tags (name, slug)
    VALUES (:tag1, :tag1_slug), (:tag2, :tag2_slug)
    ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name   -- no-op update to get RETURNING
    RETURNING id
)
INSERT INTO post_tags (post_id, tag_id)
SELECT new_post.id, tag_ids.id
FROM new_post CROSS JOIN tag_ids
RETURNING post_id, tag_id;

-- Move a post between authors and log it
WITH moved AS (
    UPDATE posts
    SET author_id = :new_author_id, updated_at = NOW()
    WHERE id = :post_id AND author_id = :old_author_id
    RETURNING id, author_id AS new_author, :old_author_id::BIGINT AS old_author
)
INSERT INTO post_transfer_log (post_id, from_author, to_author, transferred_at)
SELECT id, old_author, new_author, NOW()
FROM moved;

MERGE — Conditional Insert-or-Update

-- MERGE (PostgreSQL 15+): insert if not exists, update if exists
-- SQL standard syntax — cleaner than INSERT ... ON CONFLICT for complex cases

MERGE INTO post_stats AS target
USING (VALUES (:post_id::BIGINT, :views::INTEGER, :date::DATE)) AS src(post_id, views, stat_date)
ON target.post_id = src.post_id AND target.stat_date = src.stat_date
WHEN MATCHED THEN
    UPDATE SET
        view_count = target.view_count + src.views,
        updated_at = NOW()
WHEN NOT MATCHED THEN
    INSERT (post_id, view_count, stat_date)
    VALUES (src.post_id, src.views, src.stat_date);

-- MERGE with conditional logic
MERGE INTO users AS target
USING import_data AS src ON target.email = src.email
WHEN MATCHED AND src.is_deleted = TRUE THEN
    DELETE
WHEN MATCHED AND src.is_deleted = FALSE THEN
    UPDATE SET name = src.name, role = src.role, updated_at = NOW()
WHEN NOT MATCHED AND src.is_deleted = FALSE THEN
    INSERT (email, name, role) VALUES (src.email, src.name, src.role);

Common Mistakes

Mistake 1 — CROSS JOIN LATERAL without index on the lateral’s WHERE column

❌ Wrong — lateral query scans all posts for each user:

CROSS JOIN LATERAL (
    SELECT * FROM posts WHERE author_id = u.id LIMIT 3
) AS p   -- sequential scan if no index on author_id!

✅ Correct — ensure the FK is indexed:

CREATE INDEX idx_posts_author_id ON posts(author_id);   -- ✓

Mistake 2 — Using MERGE on PostgreSQL < 15

❌ Wrong — MERGE not available before PostgreSQL 15:

MERGE INTO ...   -- ERROR: syntax error on PostgreSQL 14 and earlier

✅ Correct — use INSERT … ON CONFLICT for older versions:

INSERT INTO post_stats (post_id, view_count, stat_date)
VALUES (:post_id, :views, :date)
ON CONFLICT (post_id, stat_date)
DO UPDATE SET view_count = post_stats.view_count + EXCLUDED.view_count;   -- ✓

Mistake 3 — Writable CTE operations not running in order

❌ Wrong — assuming CTE steps run in sequence when they can run in parallel:

WITH del AS (DELETE FROM old_posts RETURNING id),
     ins AS (INSERT INTO archive SELECT * FROM old_posts WHERE id IN (SELECT id FROM del))
-- del and ins may run in any order!

✅ Correct — make later CTEs depend on earlier ones to establish order:

WITH del AS (DELETE FROM old_posts RETURNING id, title, body),
     ins AS (INSERT INTO archive SELECT id, title, body FROM del RETURNING id)
SELECT count(*) FROM ins;   -- ✓ ins explicitly depends on del

Quick Reference

Pattern Use For
CROSS JOIN LATERAL Top-N per group, correlated subquery returning rows
LEFT JOIN LATERAL … ON TRUE Top-N per group, include nulls
UPDATE … FROM Batch update from another table or subquery
Writable CTE Multi-step INSERT/UPDATE/DELETE atomically
MERGE (PG15+) Conditional insert/update/delete in one statement
INSERT … ON CONFLICT Upsert (PG < 15 or simpler cases)

🧠 Test Yourself

You want each user’s 3 most recent posts in one query, keeping all users even if they have no posts. Which SQL approach is correct?