Subqueries embed one SELECT inside another — in WHERE (for filtering), in FROM (as a derived table), or in SELECT (as a scalar value). EXISTS and NOT EXISTS are the most efficient way to check for related data existence — they stop scanning as soon as the first match is found and handle NULLs correctly, unlike IN with a subquery. Correlated subqueries reference the outer query but execute once per outer row — powerful but potentially slow on large tables.
Subqueries and EXISTS
-- ── Subquery in WHERE with IN ─────────────────────────────────────────────
-- Posts by authors who have more than 10 published posts
SELECT Id, Title
FROM dbo.Posts
WHERE AuthorId IN (
SELECT AuthorId
FROM dbo.Posts
WHERE IsPublished = 1
GROUP BY AuthorId
HAVING COUNT(*) > 10
);
-- ── EXISTS — check existence without retrieving rows ──────────────────────
-- Posts that have at least one comment (EXISTS is more efficient than IN)
SELECT p.Id, p.Title
FROM dbo.Posts p
WHERE EXISTS (
SELECT 1 -- the SELECT list doesn't matter in EXISTS — only existence checked
FROM dbo.Comments c
WHERE c.PostId = p.Id -- correlated: references outer query's p.Id
);
-- ── NOT EXISTS — posts with NO comments ──────────────────────────────────
SELECT p.Id, p.Title, p.PublishedAt
FROM dbo.Posts p
WHERE p.IsPublished = 1
AND NOT EXISTS (
SELECT 1
FROM dbo.Comments c
WHERE c.PostId = p.Id
);
-- ── Scalar subquery — single value in SELECT ──────────────────────────────
SELECT
p.Id,
p.Title,
(SELECT COUNT(*)
FROM dbo.Comments c
WHERE c.PostId = p.Id
AND c.IsApproved = 1) AS ApprovedCommentCount -- runs once per post row
FROM dbo.Posts p
WHERE p.IsPublished = 1;
-- ⚠ Correlated scalar subquery: runs N times for N posts — use JOIN + GROUP BY instead
-- ── Derived table — subquery in FROM ──────────────────────────────────────
-- Top-viewed posts per category (correlated subquery would be slow)
SELECT c.Name AS CategoryName, top_posts.Title, top_posts.ViewCount
FROM dbo.Categories c
JOIN (
SELECT CategoryId, Title, ViewCount,
ROW_NUMBER() OVER (PARTITION BY CategoryId ORDER BY ViewCount DESC) AS rn
FROM dbo.Posts
WHERE IsPublished = 1
) AS top_posts ON top_posts.CategoryId = c.Id
AND top_posts.rn = 1; -- only the top post per category
-- ── IN vs EXISTS NULL behaviour ───────────────────────────────────────────
-- If the subquery returns any NULLs, IN produces unexpected results:
-- x IN (1, NULL, 3) → TRUE if x=1 or x=3, but NOT FALSE if x=5
-- It evaluates as NULL (UNKNOWN) for non-matching values
-- NOT IN (1, NULL, 3) → Always NULL/UNKNOWN (NEVER returns TRUE)
-- EXISTS never has this problem — always returns TRUE or FALSE
-- ✅ Prefer EXISTS/NOT EXISTS over IN/NOT IN when NULLs are possible
NOT IN with a subquery is dangerous when the subquery can return NULLs. If any row in the subquery result has a NULL value, NOT IN returns UNKNOWN (not TRUE or FALSE) for every comparison — meaning the outer query returns zero rows even when you expect results. This is one of the most surprising SQL gotchas. Always prefer NOT EXISTS over NOT IN when the subquery involves columns that can be NULL. EXISTS and NOT EXISTS are NULL-safe and always return a clear TRUE or FALSE.SELECT COUNT(*) FROM Comments WHERE PostId = p.Id in the SELECT list) executes once per outer row — for 1,000 posts, that is 1,000 separate subquery executions. A JOIN with GROUP BY executes once. The exception: when the subquery returns different types of data for different conditions and a JOIN would be complex — sometimes a correlated subquery is more readable even if slightly slower for small datasets.Common Mistakes
Mistake 1 — NOT IN when subquery can return NULLs (returns no rows)
❌ Wrong — WHERE Id NOT IN (SELECT ParentId FROM Categories); ParentId has NULLs; returns zero rows.
✅ Correct — WHERE NOT EXISTS (SELECT 1 FROM Categories c WHERE c.ParentId = x.Id).
Mistake 2 — Correlated scalar subquery per row instead of JOIN (N+1 query in SQL)
❌ Wrong — SELECT (SELECT COUNT(*) FROM Comments WHERE PostId = p.Id) FROM Posts; executes N subqueries.
✅ Correct — JOIN (SELECT PostId, COUNT(*) AS cnt FROM Comments GROUP BY PostId) c ON c.PostId = p.Id.