Window Functions — Ranking, Running Totals and Peer Comparisons

Window functions compute a value for each row based on a group of related rows — the window — without collapsing those rows into a single aggregate result. Unlike GROUP BY which turns N rows into 1, a window function keeps all rows and adds a computed column. This enables powerful analytics that are difficult or impossible with GROUP BY: ranking posts by views within each category, computing running totals, finding the difference from the previous row, and calculating moving averages. In FastAPI, window functions are most useful for dashboard endpoints that return ranked or time-series data.

The OVER() Clause

-- Window functions use OVER() to define the window
-- OVER() with no arguments: the entire result set is the window

-- Rank all posts by view count (no grouping — all rows kept)
SELECT
    id,
    title,
    view_count,
    RANK()       OVER (ORDER BY view_count DESC)  AS rank,
    DENSE_RANK() OVER (ORDER BY view_count DESC)  AS dense_rank,
    ROW_NUMBER() OVER (ORDER BY view_count DESC)  AS row_num,
    PERCENT_RANK() OVER (ORDER BY view_count DESC) AS percentile
FROM posts
WHERE status = 'published';

-- RANK vs DENSE_RANK vs ROW_NUMBER:
-- If two posts have the same view_count (say 1000):
-- RANK():       1, 2, 2, 4  (gap after ties)
-- DENSE_RANK(): 1, 2, 2, 3  (no gap after ties)
-- ROW_NUMBER(): 1, 2, 3, 4  (no ties, arbitrary order for equals)
Note: Window functions are evaluated after WHERE, GROUP BY, and HAVING but before ORDER BY and LIMIT. This means you cannot use a window function in a WHERE clause — you cannot write WHERE RANK() OVER (...) <= 3. To filter by a window function result, wrap the query in a subquery or CTE: SELECT * FROM (SELECT ..., RANK() OVER (...) AS r FROM ...) sub WHERE r <= 3. This is one of the most common beginner mistakes with window functions.
Tip: PARTITION BY inside OVER() divides the window into groups — like GROUP BY but without collapsing rows. Ranking by view count within each author’s posts: RANK() OVER (PARTITION BY author_id ORDER BY view_count DESC) — each author gets their own ranking from 1. Without PARTITION BY, all rows compete in the same global ranking. Think of PARTITION BY as “reset the window for each group.”
Warning: Window functions can be expensive on large tables without appropriate indexes, because they often require sorting the full result set. PostgreSQL can use an index to avoid the sort step for window functions that use ORDER BY on an indexed column — but only if the query is structured to allow it. Always check EXPLAIN ANALYZE output when adding window functions to endpoints that serve high traffic.

PARTITION BY — Per-Group Windows

-- PARTITION BY: reset the window for each unique value
-- Rank posts within each author's own post list
SELECT
    p.id,
    p.title,
    p.view_count,
    u.name       AS author_name,
    RANK()       OVER (PARTITION BY p.author_id ORDER BY p.view_count DESC) AS author_rank,
    COUNT(*)     OVER (PARTITION BY p.author_id)  AS author_total_posts,
    SUM(p.view_count) OVER (PARTITION BY p.author_id) AS author_total_views
FROM posts p
JOIN users u ON p.author_id = u.id
WHERE p.status = 'published'
ORDER BY u.name, author_rank;

-- Get top-1 post per author (most-viewed)
SELECT *
FROM (
    SELECT
        p.id, p.title, p.view_count, p.author_id,
        ROW_NUMBER() OVER (PARTITION BY p.author_id ORDER BY p.view_count DESC) AS rn
    FROM posts p WHERE p.status = 'published'
) ranked
WHERE rn = 1   -- only the top post per author
ORDER BY view_count DESC;

Aggregate Window Functions

-- Running total of views (sorted by date)
SELECT
    id,
    title,
    view_count,
    created_at,
    SUM(view_count) OVER (ORDER BY created_at)           AS running_total_views,
    AVG(view_count) OVER (ORDER BY created_at
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
                                                          AS moving_avg_7_posts,
    view_count - LAG(view_count)  OVER (ORDER BY created_at) AS change_from_prev,
    view_count - LEAD(view_count) OVER (ORDER BY created_at) AS change_to_next
FROM posts
WHERE status = 'published'
ORDER BY created_at;

-- LAG / LEAD: access values from previous/next rows
SELECT
    DATE_TRUNC('month', created_at)  AS month,
    COUNT(*)                          AS post_count,
    LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at))
                                      AS prev_month_count,
    COUNT(*) - LAG(COUNT(*)) OVER (ORDER BY DATE_TRUNC('month', created_at))
                                      AS month_over_month_change
FROM posts
WHERE status = 'published'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

Window Frame Specification

-- ROWS BETWEEN defines which rows are included in the frame
-- Default frame when ORDER BY present: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- Running total from start to current row
SUM(view_count) OVER (ORDER BY created_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Moving average of last 7 rows (sliding window)
AVG(view_count) OVER (ORDER BY created_at ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

-- Centered moving average (3 rows before, current, 3 rows after)
AVG(view_count) OVER (ORDER BY created_at ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)

-- Entire partition (same result as aggregate without GROUP BY)
SUM(view_count) OVER (PARTITION BY author_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

-- NTILE: divide rows into N equal buckets
NTILE(4) OVER (ORDER BY view_count DESC) AS quartile
-- 1=top 25%, 2=next 25%, 3=next 25%, 4=bottom 25%

Common Mistakes

Mistake 1 — Using window function in WHERE clause

❌ Wrong — cannot filter on window function result directly:

SELECT * FROM posts WHERE RANK() OVER (ORDER BY view_count DESC) <= 5;
-- ERROR: window functions not allowed in WHERE

✅ Correct — wrap in subquery or CTE:

SELECT * FROM (
    SELECT *, RANK() OVER (ORDER BY view_count DESC) AS r FROM posts
) sub WHERE r <= 5;   -- ✓

Mistake 2 — Confusing RANK with ROW_NUMBER for top-N queries

❌ Wrong — RANK skips numbers after ties, so <= 3 may return 4+ rows:

WHERE RANK() OVER (PARTITION BY author_id ORDER BY views DESC) <= 1
-- If two posts tie for first: returns 2 rows per author (both have RANK=1)

✅ Correct — use ROW_NUMBER for strictly one row per partition:

WHERE ROW_NUMBER() OVER (PARTITION BY author_id ORDER BY views DESC) = 1   -- ✓

Mistake 3 — LAG/LEAD returning NULL at boundaries

❌ Wrong — first row’s LAG is NULL, causing NULL in change calculation:

view_count - LAG(view_count) OVER (ORDER BY created_at)   -- NULL for first row

✅ Correct — use LAG with a default:

view_count - LAG(view_count, 1, 0) OVER (ORDER BY created_at)   -- ✓ 0 for first row

Quick Reference

Function Description
ROW_NUMBER() Unique row number, no ties
RANK() Rank with gaps after ties
DENSE_RANK() Rank without gaps after ties
NTILE(n) Divide into n equal buckets
LAG(col, n, default) Value from n rows before
LEAD(col, n, default) Value from n rows ahead
SUM() OVER (...) Running/windowed total
AVG() OVER (...) Running/windowed average
FIRST_VALUE(col) First value in window frame
LAST_VALUE(col) Last value in window frame

🧠 Test Yourself

You want the single most-viewed post per author. Two authors each have two posts with identical view counts. Which window function gives exactly one row per author?