Subqueries — Queries Within Queries

A subquery is a SELECT statement nested inside another SQL statement — in a WHERE clause, a FROM clause, or even as a column expression. Subqueries solve problems that are difficult to express cleanly as joins: finding rows that match a condition derived from another table, testing for existence, or using an aggregated result as a filter. The choice between a subquery and a join is often stylistic, but understanding both gives you the flexibility to choose whichever produces clearer, more maintainable SQL for each specific situation.

Subqueries in WHERE

-- Scalar subquery: returns a single value
SELECT id, title, view_count
FROM posts
WHERE view_count > (SELECT AVG(view_count) FROM posts)
ORDER BY view_count DESC;
-- The inner query computes the average once, then the outer WHERE filters by it

-- IN subquery: find posts by admin users
SELECT id, title, author_id
FROM posts
WHERE author_id IN (
    SELECT id FROM users WHERE role = 'admin'
)
ORDER BY created_at DESC;

-- NOT IN subquery: posts with no comments yet
SELECT id, title
FROM posts
WHERE id NOT IN (
    SELECT DISTINCT post_id FROM comments
)
AND status = 'published';
-- Note: NOT IN with NULL values behaves unexpectedly — prefer NOT EXISTS
Note: NOT IN has a dangerous edge case with NULL values. If the subquery returns even one NULL value, NOT IN returns no rows at all — because x NOT IN (1, 2, NULL) evaluates to x <> 1 AND x <> 2 AND x <> NULL, and x <> NULL is always NULL (unknown), making the entire condition NULL (unknown). Use NOT EXISTS instead of NOT IN when the subquery might return NULLs — it is safer and often faster.
Tip: Use EXISTS and NOT EXISTS for existence checks — they are typically faster than IN/NOT IN for large subqueries because they short-circuit as soon as one matching row is found. EXISTS (SELECT 1 FROM ...) does not need to retrieve actual column values — it just checks whether any row satisfies the condition. The convention SELECT 1 (rather than SELECT *) makes the intent explicit: we only care about existence, not values.
Warning: Correlated subqueries — where the inner query references columns from the outer query — are re-evaluated for every row of the outer query. For a table with 10,000 rows, a correlated subquery in SELECT runs 10,000 times. This is almost always slower than the equivalent JOIN or window function. Use correlated subqueries in WHERE and EXISTS checks (where they often short-circuit early), but avoid them in SELECT column lists for large result sets — use a JOIN or window function instead.

EXISTS and NOT EXISTS

-- EXISTS: true if the subquery returns at least one row
SELECT id, title
FROM posts p
WHERE EXISTS (
    SELECT 1 FROM comments c
    WHERE c.post_id = p.id     -- correlated: references outer query's p.id
      AND c.is_approved = TRUE
)
AND p.status = 'published';
-- Returns published posts that have at least one approved comment

-- NOT EXISTS: true if subquery returns zero rows (safe with NULLs, unlike NOT IN)
SELECT id, title
FROM posts p
WHERE NOT EXISTS (
    SELECT 1 FROM post_tags pt
    WHERE pt.post_id = p.id
);
-- Returns posts with no tags at all

Subqueries in FROM (Derived Tables)

-- A subquery in FROM creates a temporary "virtual table"
-- Must be given an alias

-- Top 3 authors by published post count, with their most recent post date
SELECT
    top_authors.author_id,
    u.name,
    top_authors.pub_count,
    top_authors.latest_post
FROM (
    SELECT
        author_id,
        COUNT(*)     AS pub_count,
        MAX(created_at) AS latest_post
    FROM posts
    WHERE status = 'published'
    GROUP BY author_id
    ORDER BY pub_count DESC
    LIMIT 3
) AS top_authors
JOIN users u ON top_authors.author_id = u.id
ORDER BY top_authors.pub_count DESC;

-- Pagination total count alongside data (two queries in one)
SELECT
    p.*,
    total.count AS total_results
FROM posts p
CROSS JOIN (
    SELECT COUNT(*) AS count FROM posts WHERE status = 'published'
) AS total
WHERE p.status = 'published'
ORDER BY p.created_at DESC
LIMIT 10;

Scalar Subqueries in SELECT

-- A scalar subquery in SELECT returns one value per row
-- Equivalent to a correlated subquery — can be slow for large result sets

SELECT
    p.id,
    p.title,
    (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) AS comment_count,
    (SELECT COUNT(*) FROM post_tags pt WHERE pt.post_id = p.id) AS tag_count
FROM posts p
WHERE p.status = 'published'
ORDER BY p.created_at DESC
LIMIT 20;

-- Better alternative: use LEFT JOIN + GROUP BY (single pass instead of per-row queries)
SELECT
    p.id,
    p.title,
    COUNT(DISTINCT c.id)  AS comment_count,
    COUNT(DISTINCT pt.tag_id) AS tag_count
FROM posts p
LEFT JOIN comments c   ON p.id = c.post_id
LEFT JOIN post_tags pt ON p.id = pt.post_id
WHERE p.status = 'published'
GROUP BY p.id, p.title
ORDER BY p.created_at DESC
LIMIT 20;

Common Mistakes

Mistake 1 — NOT IN with potential NULLs in subquery

❌ Wrong — silently returns no rows if subquery has NULLs:

WHERE post_id NOT IN (SELECT post_id FROM comments)
-- If any comment has NULL post_id: returns zero rows!

✅ Correct — use NOT EXISTS:

WHERE NOT EXISTS (SELECT 1 FROM comments c WHERE c.post_id = p.id)   -- ✓ safe

Mistake 2 — Correlated subquery in SELECT for large result sets

❌ Wrong — runs once per result row (N queries):

SELECT p.title, (SELECT COUNT(*) FROM comments c WHERE c.post_id = p.id) AS cnt
FROM posts p;   -- runs N comment-count queries for N posts!

✅ Correct — single JOIN + GROUP BY:

SELECT p.title, COUNT(c.id) AS cnt
FROM posts p LEFT JOIN comments c ON p.id = c.post_id
GROUP BY p.id, p.title;   -- ✓ single pass

Mistake 3 — Forgetting to alias a derived table

❌ Wrong — subquery in FROM needs an alias:

SELECT * FROM (SELECT id FROM posts LIMIT 5);
-- ERROR: subquery in FROM must have an alias

✅ Correct:

SELECT * FROM (SELECT id FROM posts LIMIT 5) AS recent_posts;   -- ✓

Quick Reference

Pattern Use For
WHERE col = (SELECT ...) Compare to computed single value
WHERE col IN (SELECT ...) Match any value from a set
WHERE EXISTS (SELECT 1 FROM ... WHERE ...) Check if related row exists
WHERE NOT EXISTS (...) Check if no related row exists (safe with NULLs)
FROM (SELECT ...) AS alias Use aggregated/filtered result as a table
SELECT (SELECT ... WHERE id = outer.id) Correlated scalar value per row (avoid for large sets)

🧠 Test Yourself

You want to find all posts that have NO approved comments. Which is the safest and most correct approach?