A subquery is a SELECT statement nested inside another SQL statement — in a WHERE clause, a FROM clause, or even as a column expression. Subqueries solve problems that are difficult to express cleanly as joins: finding rows that match a condition derived from another table, testing for existence, or using an aggregated result as a filter. The choice between a subquery and a join is often stylistic, but understanding both gives you the flexibility to choose whichever produces clearer, more maintainable SQL for each specific situation.
Subqueries in WHERE
-- Scalar subquery: returns a single value
SELECT id, title, view_count
FROM posts
WHERE view_count > (SELECT AVG(view_count) FROM posts)
ORDER BY view_count DESC;
-- The inner query computes the average once, then the outer WHERE filters by it
-- IN subquery: find posts by admin users
SELECT id, title, author_id
FROM posts
WHERE author_id IN (
SELECT id FROM users WHERE role = 'admin'
)
ORDER BY created_at DESC;
-- NOT IN subquery: posts with no comments yet
SELECT id, title
FROM posts
WHERE id NOT IN (
SELECT DISTINCT post_id FROM comments
)
AND status = 'published';
-- Note: NOT IN with NULL values behaves unexpectedly — prefer NOT EXISTS
NOT IN has a dangerous edge case with NULL values. If the subquery returns even one NULL value, NOT IN returns no rows at all — because x NOT IN (1, 2, NULL) evaluates to x <> 1 AND x <> 2 AND x <> NULL, and x <> NULL is always NULL (unknown), making the entire condition NULL (unknown). Use NOT EXISTS instead of NOT IN when the subquery might return NULLs — it is safer and often faster.EXISTS and NOT EXISTS for existence checks — they are typically faster than IN/NOT IN for large subqueries because they short-circuit as soon as one matching row is found. EXISTS (SELECT 1 FROM ...) does not need to retrieve actual column values — it just checks whether any row satisfies the condition. The convention SELECT 1 (rather than SELECT *) makes the intent explicit: we only care about existence, not values.EXISTS and NOT EXISTS
-- EXISTS: true if the subquery returns at least one row
SELECT id, title
FROM posts p
WHERE EXISTS (
SELECT 1 FROM comments c
WHERE c.post_id = p.id -- correlated: references outer query's p.id
AND c.is_approved = TRUE
)
AND p.status = 'published';
-- Returns published posts that have at least one approved comment
-- NOT EXISTS: true if subquery returns zero rows (safe with NULLs, unlike NOT IN)
SELECT id, title
FROM posts p
WHERE NOT EXISTS (
SELECT 1 FROM post_tags pt
WHERE pt.post_id = p.id
);
-- Returns posts with no tags at all
Subqueries in FROM (Derived Tables)
-- A subquery in FROM creates a temporary "virtual table"
-- Must be given an alias
-- Top 3 authors by published post count, with their most recent post date
SELECT
top_authors.author_id,
u.name,
top_authors.pub_count,
top_authors.latest_post
FROM (
SELECT
author_id,
COUNT(*) AS pub_count,
MAX(created_at) AS latest_post
FROM posts
WHERE status = 'published'
GROUP BY author_id
ORDER BY pub_count DESC
LIMIT 3
) AS top_authors
JOIN users u ON top_authors.author_id = u.id
ORDER BY top_authors.pub_count DESC;
-- Pagination total count alongside data (two queries in one)
SELECT
p.*,
total.count AS total_results
FROM posts p
CROSS JOIN (
SELECT COUNT(*) AS count FROM posts WHERE status = 'published'
) AS total
WHERE p.status = 'published'
ORDER BY p.created_at DESC
LIMIT 10;
Scalar Subqueries in SELECT
-- A scalar subquery in SELECT returns one value per row
-- Equivalent to a correlated subquery — can be slow for large result sets
SELECT
p.id,
p.title,
(SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) AS comment_count,
(SELECT COUNT(*) FROM post_tags pt WHERE pt.post_id = p.id) AS tag_count
FROM posts p
WHERE p.status = 'published'
ORDER BY p.created_at DESC
LIMIT 20;
-- Better alternative: use LEFT JOIN + GROUP BY (single pass instead of per-row queries)
SELECT
p.id,
p.title,
COUNT(DISTINCT c.id) AS comment_count,
COUNT(DISTINCT pt.tag_id) AS tag_count
FROM posts p
LEFT JOIN comments c ON p.id = c.post_id
LEFT JOIN post_tags pt ON p.id = pt.post_id
WHERE p.status = 'published'
GROUP BY p.id, p.title
ORDER BY p.created_at DESC
LIMIT 20;
Common Mistakes
Mistake 1 — NOT IN with potential NULLs in subquery
❌ Wrong — silently returns no rows if subquery has NULLs:
WHERE post_id NOT IN (SELECT post_id FROM comments)
-- If any comment has NULL post_id: returns zero rows!
✅ Correct — use NOT EXISTS:
WHERE NOT EXISTS (SELECT 1 FROM comments c WHERE c.post_id = p.id) -- ✓ safe
Mistake 2 — Correlated subquery in SELECT for large result sets
❌ Wrong — runs once per result row (N queries):
SELECT p.title, (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) AS cnt
FROM posts p; -- runs N comment-count queries for N posts!
✅ Correct — single JOIN + GROUP BY:
SELECT p.title, COUNT(c.id) AS cnt
FROM posts p LEFT JOIN comments c ON p.id = c.post_id
GROUP BY p.id, p.title; -- ✓ single pass
Mistake 3 — Forgetting to alias a derived table
❌ Wrong — subquery in FROM needs an alias:
SELECT * FROM (SELECT id FROM posts LIMIT 5);
-- ERROR: subquery in FROM must have an alias
✅ Correct:
SELECT * FROM (SELECT id FROM posts LIMIT 5) AS recent_posts; -- ✓
Quick Reference
| Pattern | Use For |
|---|---|
WHERE col = (SELECT ...) |
Compare to computed single value |
WHERE col IN (SELECT ...) |
Match any value from a set |
WHERE EXISTS (SELECT 1 FROM ... WHERE ...) |
Check if related row exists |
WHERE NOT EXISTS (...) |
Check if no related row exists (safe with NULLs) |
FROM (SELECT ...) AS alias |
Use aggregated/filtered result as a table |
SELECT (SELECT ... WHERE id = outer.id) |
Correlated scalar value per row (avoid for large sets) |