Set operations combine the results of two or more SELECT statements into a single result set. Unlike JOINs that combine columns from different tables horizontally, set operations stack result sets vertically — stacking rows on top of each other. UNION combines and deduplicates, UNION ALL combines without deduplication, INTERSECT returns only rows present in both results, and EXCEPT returns rows in the first result that are not in the second. These operations are used less frequently than JOINs but are the cleanest solution for specific problems: combining search results from different tables, auditing data consistency, and building reports that aggregate from different sources.
UNION and UNION ALL
-- Requirements for set operations:
-- Both SELECT statements must have the same number of columns
-- Corresponding columns must have compatible data types
-- Column names come from the FIRST SELECT statement
-- UNION: combine results and remove duplicates
SELECT id, name, 'user' AS type FROM users WHERE role = 'admin'
UNION
SELECT id, name, 'moderator' AS type FROM moderators;
-- Duplicate rows (same id, name, type) are removed
-- UNION ALL: combine results keeping all duplicates (faster — no dedup step)
SELECT id, title, 'post' AS source FROM posts WHERE status = 'published'
UNION ALL
SELECT id, title, 'draft' AS source FROM posts WHERE status = 'draft'
ORDER BY title;
-- ORDER BY applies to the combined result, not individual SELECTs
-- Practical: combine search results from two tables
SELECT id, title, 'post' AS content_type, created_at
FROM posts
WHERE title ILIKE '%fastapi%' AND status = 'published'
UNION ALL
SELECT id, name AS title, 'tag' AS content_type, created_at
FROM tags
WHERE name ILIKE '%fastapi%'
ORDER BY created_at DESC
LIMIT 20;
UNION ALL is almost always preferred over UNION when you know duplicates will not occur or duplicates are acceptable. UNION must sort or hash the combined result to find and remove duplicates — a significant overhead for large result sets. Only use UNION (without ALL) when deduplication is genuinely required, such as combining two lists that may share entries.SELECT * FROM (SELECT ...) AS first ORDER BY ... nested inside the UNION, though this is an advanced and rarely needed pattern.CAST(col AS type) or col::type when mixing types across SELECT statements in a set operation.INTERSECT and EXCEPT
-- INTERSECT: rows that appear in BOTH result sets (intersection)
SELECT author_id FROM posts WHERE status = 'published'
INTERSECT
SELECT author_id FROM posts WHERE status = 'draft';
-- Authors who have BOTH published and draft posts
-- Practical: users who have both posted AND commented
SELECT id FROM users
WHERE id IN (SELECT DISTINCT author_id FROM posts)
INTERSECT
SELECT id FROM users
WHERE id IN (SELECT DISTINCT author_id FROM comments);
-- EXCEPT: rows in first result that are NOT in second result
SELECT author_id FROM posts
EXCEPT
SELECT id FROM users WHERE is_active = FALSE;
-- Post authors who are NOT inactive users
-- Practical: find tags used on posts but not in the allowed list
SELECT name FROM tags
EXCEPT
SELECT name FROM allowed_tags;
-- INTERSECT ALL and EXCEPT ALL: preserve duplicates
-- EXCEPT ALL preserves count differences:
-- If 'python' appears 3 times in first and 1 time in second: keeps 2 occurrences
SELECT tag FROM post_tags_log
EXCEPT ALL
SELECT tag FROM archived_post_tags;
Using Set Operations in FastAPI Queries
-- ── Combined search across posts and users ────────────────────────────────────
-- GET /api/search?q=alice
SELECT
'user' AS result_type,
id::TEXT AS result_id,
name AS title,
email AS subtitle,
created_at
FROM users
WHERE name ILIKE '%alice%' OR email ILIKE '%alice%'
UNION ALL
SELECT
'post' AS result_type,
id::TEXT AS result_id,
title,
LEFT(body, 100) AS subtitle,
created_at
FROM posts
WHERE (title ILIKE '%alice%' OR body ILIKE '%alice%')
AND status = 'published'
ORDER BY created_at DESC
LIMIT 20;
-- ── Data consistency check ───────────────────────────────────────────────────
-- Posts that reference tags which no longer exist
SELECT DISTINCT pt.tag_id
FROM post_tags pt
EXCEPT
SELECT id FROM tags;
-- Should return empty — if not, there is referential integrity violation
Common Mistakes
Mistake 1 — Mismatched column count between SELECT statements
❌ Wrong — different number of columns:
SELECT id, name, email FROM users
UNION
SELECT id, title FROM posts; -- ERROR: 3 columns vs 2 columns
✅ Correct — same column count, pad with NULLs or literals if needed:
SELECT id, name, email FROM users
UNION ALL
SELECT id, title, NULL AS email FROM posts; -- ✓ 3 columns each
Mistake 2 — Using UNION when UNION ALL is sufficient
❌ Wrong — unnecessary deduplication overhead:
SELECT post_id FROM comments WHERE is_approved = TRUE
UNION
SELECT post_id FROM post_reactions WHERE type = 'like';
-- If you just need a list (duplicates don't matter), UNION ALL is faster
✅ Correct — use UNION ALL and handle dedup in application if needed:
SELECT DISTINCT post_id FROM (
SELECT post_id FROM comments WHERE is_approved = TRUE
UNION ALL
SELECT post_id FROM post_reactions WHERE type = 'like'
) AS activity; -- ✓ explicit dedup only when needed
Mistake 3 — Trying to ORDER BY inside individual UNION parts
❌ Wrong — ORDER BY inside a UNION part:
SELECT id, title FROM posts ORDER BY title -- error or ignored
UNION ALL
SELECT id, name FROM users;
✅ Correct — ORDER BY after the final SELECT:
SELECT id, title FROM posts
UNION ALL
SELECT id, name FROM users
ORDER BY title; -- ✓ applies to combined result
Quick Reference
| Operation | Returns | Duplicates |
|---|---|---|
UNION |
All rows from both, combined | Removed |
UNION ALL |
All rows from both, combined | Kept (faster) |
INTERSECT |
Only rows in both | Removed |
INTERSECT ALL |
Only rows in both | Kept (min count) |
EXCEPT |
Rows in first, not in second | Removed |
EXCEPT ALL |
Rows in first, not in second | Count difference |