GROUP BY and HAVING — Grouping and Filtering Aggregates

GROUP BY divides a table’s rows into groups based on shared values in one or more columns, and then applies aggregate functions to each group independently. Instead of one aggregate across all rows, you get one aggregate per group — the count of posts per author, the average rating per category, the total revenue per month. HAVING filters those groups after aggregation, which is distinct from WHERE which filters individual rows before aggregation. These two clauses together power all the per-entity summary queries that appear in dashboards, admin interfaces, and reporting endpoints.

GROUP BY

-- Count posts per author
SELECT author_id, COUNT(*) AS post_count
FROM posts
GROUP BY author_id
ORDER BY post_count DESC;

-- Average view count per status
SELECT status, ROUND(AVG(view_count), 0) AS avg_views
FROM posts
GROUP BY status;

-- Multiple GROUP BY columns
SELECT
    author_id,
    status,
    COUNT(*)            AS count,
    SUM(view_count)     AS total_views
FROM posts
GROUP BY author_id, status
ORDER BY author_id, status;

-- GROUP BY with JOIN (get author name alongside stats)
SELECT
    u.name          AS author_name,
    COUNT(p.id)     AS post_count,
    SUM(p.view_count) AS total_views
FROM users u
LEFT JOIN posts p ON u.id = p.author_id
GROUP BY u.id, u.name
ORDER BY post_count DESC;

-- Count posts per month
SELECT
    DATE_TRUNC('month', created_at) AS month,
    COUNT(*) AS posts_this_month
FROM posts
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;
Note: Every column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause. If you write SELECT author_id, title, COUNT(*) with GROUP BY author_id, PostgreSQL raises an error — title is not in GROUP BY and not inside an aggregate, so it is ambiguous which title to return for each author group. The rule: SELECT columns are either aggregated (inside COUNT, SUM, etc.) or grouped (in GROUP BY).
Tip: When joining tables before grouping, be careful about which table you are counting. COUNT(*) counts joined rows, which can produce unexpected results with LEFT JOINs. If you LEFT JOIN users to posts and a user has no posts, the join produces one row with NULL post values — COUNT(*) counts this as 1. Use COUNT(p.id) to count only rows where there is an actual post — NULL post IDs are excluded, giving you 0 for users with no posts.
Warning: GROUP BY uses all values of the grouping column, including NULL. All rows with NULL in the GROUP BY column are grouped together into a single NULL group. This means if some posts have a NULL author_id, they form their own group in the results. Decide whether you want to include this NULL group in your results or filter it out with WHERE author_id IS NOT NULL before grouping.

HAVING — Filter Groups

-- HAVING filters GROUPS (after aggregation)
-- WHERE filters ROWS (before aggregation)

-- Authors with more than 5 posts
SELECT author_id, COUNT(*) AS post_count
FROM posts
GROUP BY author_id
HAVING COUNT(*) > 5
ORDER BY post_count DESC;

-- Tags used on at least 10 posts
SELECT t.name, COUNT(pt.post_id) AS usage_count
FROM tags t
JOIN post_tags pt ON t.id = pt.tag_id
GROUP BY t.id, t.name
HAVING COUNT(pt.post_id) >= 10
ORDER BY usage_count DESC;

-- WHERE + HAVING together (WHERE filters rows, HAVING filters groups)
SELECT author_id, COUNT(*) AS published_count
FROM posts
WHERE status = 'published'      -- filter rows BEFORE grouping
GROUP BY author_id
HAVING COUNT(*) > 3             -- filter groups AFTER aggregation
ORDER BY published_count DESC;

-- Practical: find posts above average view count
SELECT id, title, view_count
FROM posts
WHERE view_count > (SELECT AVG(view_count) FROM posts)
ORDER BY view_count DESC;

ROLLUP and CUBE — Subtotals

-- ROLLUP: generates subtotals and a grand total
SELECT
    COALESCE(status, 'ALL STATUSES') AS status,
    COALESCE(author_id::TEXT, 'ALL AUTHORS') AS author,
    COUNT(*) AS count
FROM posts
GROUP BY ROLLUP(status, author_id)
ORDER BY status NULLS LAST, author_id NULLS LAST;
-- Produces rows for: each (status, author) pair
--                    each status total (author_id = NULL → all authors)
--                    grand total (status = NULL, author_id = NULL)

-- CUBE: generates all possible combinations of subtotals
-- For reports that need both row and column totals

Common Mistakes

Mistake 1 — SELECT column not in GROUP BY and not aggregated

❌ Wrong — PostgreSQL raises an error:

SELECT author_id, title, COUNT(*)   -- title is not in GROUP BY!
FROM posts
GROUP BY author_id;
-- ERROR: column "posts.title" must appear in the GROUP BY clause

✅ Correct — aggregate title or add it to GROUP BY:

SELECT author_id, COUNT(*), MAX(title) AS latest_title
FROM posts
GROUP BY author_id;   -- ✓ title is inside MAX aggregate

Mistake 2 — Using WHERE instead of HAVING to filter aggregates

❌ Wrong — cannot reference aggregate in WHERE:

SELECT author_id, COUNT(*) AS cnt
FROM posts
WHERE COUNT(*) > 5   -- ERROR: aggregate not allowed in WHERE clause
GROUP BY author_id;

✅ Correct — use HAVING for aggregate conditions:

SELECT author_id, COUNT(*) AS cnt
FROM posts
GROUP BY author_id
HAVING COUNT(*) > 5;   -- ✓

Mistake 3 — COUNT(*) vs COUNT(column) in LEFT JOINs

❌ Wrong — counts NULL rows from LEFT JOIN as 1:

SELECT u.name, COUNT(*) FROM users u LEFT JOIN posts p ON u.id = p.author_id
GROUP BY u.id;   -- user with 0 posts shows 1, not 0!

✅ Correct — COUNT the joined table’s column:

SELECT u.name, COUNT(p.id) FROM users u LEFT JOIN posts p ON u.id = p.author_id
GROUP BY u.id;   -- ✓ user with 0 posts shows 0

Quick Reference

Clause Filters When evaluated
WHERE Individual rows Before GROUP BY
GROUP BY col Groups rows by col value After WHERE
HAVING Groups (aggregated) After GROUP BY
ORDER BY After HAVING/SELECT

🧠 Test Yourself

You want authors who have published at least 3 posts. The posts table has a status column. Which query is correct?