Aggregate Functions and GROUP BY — Counting, Summing and Grouping

Aggregate functions collapse multiple rows into a single summary value — essential for analytics, dashboards, and reporting. The GROUP BY clause divides rows into groups before aggregation. A critical SQL rule: any column in the SELECT list that is not an aggregate function must appear in the GROUP BY clause. HAVING filters groups after aggregation (like WHERE filters rows before aggregation). Understanding this distinction prevents one of the most common SQL errors.

Aggregate Functions and GROUP BY

-- ── COUNT variants ────────────────────────────────────────────────────────
SELECT COUNT(*)          AS TotalPosts,      -- counts all rows including NULLs
       COUNT(PublishedAt) AS PublishedPosts,  -- counts non-NULL PublishedAt values
       COUNT(DISTINCT AuthorId) AS UniqueAuthors   -- counts distinct non-NULL values
FROM   dbo.Posts;

-- ── SUM, AVG, MIN, MAX ────────────────────────────────────────────────────
SELECT
    AVG(ViewCount * 1.0) AS AvgViews,   -- * 1.0 forces decimal division
    SUM(ViewCount)       AS TotalViews,
    MIN(ViewCount)       AS MinViews,
    MAX(ViewCount)       AS MaxViews
FROM dbo.Posts
WHERE IsPublished = 1;

-- ── GROUP BY — post count and avg views by category ──────────────────────
SELECT
    c.Name          AS CategoryName,
    COUNT(p.Id)     AS PostCount,
    AVG(p.ViewCount * 1.0) AS AvgViews,
    SUM(p.ViewCount)       AS TotalViews
FROM dbo.Categories c
LEFT JOIN dbo.Posts p ON p.CategoryId = c.Id
                     AND p.IsPublished = 1
GROUP BY c.Id, c.Name          -- all non-aggregate SELECT columns must be here
ORDER BY PostCount DESC;

-- ── HAVING — filter groups (not rows) ────────────────────────────────────
-- Authors with more than 5 published posts
SELECT
    u.DisplayName,
    COUNT(p.Id) AS PublishedPostCount,
    SUM(p.ViewCount) AS TotalViews
FROM   dbo.Users u
JOIN   dbo.Posts p ON p.AuthorId = u.Id
WHERE  p.IsPublished = 1          -- WHERE filters rows BEFORE grouping
GROUP  BY u.Id, u.DisplayName
HAVING COUNT(p.Id) > 5            -- HAVING filters groups AFTER aggregation
ORDER  BY PublishedPostCount DESC;

-- ── Most popular tags (aggregate over junction table) ─────────────────────
SELECT
    t.Name         AS TagName,
    t.Slug         AS TagSlug,
    COUNT(pt.PostId) AS PostCount
FROM dbo.Tags t
LEFT JOIN dbo.PostTags pt ON pt.TagId = t.Id
LEFT JOIN dbo.Posts    p  ON p.Id     = pt.PostId AND p.IsPublished = 1
GROUP BY t.Id, t.Name, t.Slug
HAVING COUNT(pt.PostId) > 0
ORDER BY PostCount DESC;

-- ── STRING_AGG — aggregate strings into a comma-separated list ────────────
-- Get all tags for each post in one query
SELECT
    p.Id,
    p.Title,
    STRING_AGG(t.Name, ', ') WITHIN GROUP (ORDER BY t.Name) AS Tags
FROM dbo.Posts p
LEFT JOIN dbo.PostTags pt ON pt.PostId = p.Id
LEFT JOIN dbo.Tags     t  ON t.Id      = pt.TagId
WHERE p.IsPublished = 1
GROUP BY p.Id, p.Title
ORDER BY p.PublishedAt DESC;
Note: AVG(ViewCount) on an integer column performs integer division — if the average is 7.6, the result is 7. To get decimal precision, cast to a decimal type first: AVG(ViewCount * 1.0) or AVG(CAST(ViewCount AS DECIMAL(10,2))). This is a silent precision loss bug that affects any aggregate on integer columns where the fractional part matters. Always check whether your aggregate needs fractional precision and cast appropriately.
Tip: STRING_AGG(column, separator) WITHIN GROUP (ORDER BY ...) (SQL Server 2017+) is the clean way to aggregate multiple values into a delimited string — replacing the old FOR XML PATH('') hack. Use it to return a post’s tags as a comma-separated string in a single query without a separate join per tag. The WITHIN GROUP (ORDER BY t.Name) sorts the aggregated values, producing consistent output like “angular, csharp, dotnet” rather than random order.
Warning: The difference between WHERE and HAVING matters for performance. WHERE filters rows before grouping — SQL Server reads fewer rows, potentially using indexes. HAVING filters after grouping — SQL Server must aggregate all groups before applying the filter. Put as much filtering in WHERE as possible; use HAVING only for conditions on aggregate values (like HAVING COUNT(*) > 5) that cannot be expressed in WHERE.

Common Mistakes

Mistake 1 — Non-aggregate column in SELECT not in GROUP BY (syntax error)

❌ Wrong — SELECT AuthorId, Email, COUNT(*) FROM Posts GROUP BY AuthorId; Email not in GROUP BY.

✅ Correct — GROUP BY AuthorId, Email or use a subquery/JOIN to get Email separately.

Mistake 2 — Using WHERE on aggregate values (should be HAVING)

❌ Wrong — WHERE COUNT(*) > 5; aggregate functions are not allowed in WHERE clauses.

✅ Correct — HAVING COUNT(*) > 5; HAVING filters after aggregation.

🧠 Test Yourself

A query uses WHERE IsPublished = 1 and HAVING COUNT(*) > 3. In what order does SQL Server apply these filters?