The collections Module and Pythonic Data Patterns

Python’s collections module provides specialised container types that extend the built-in dict, list, and tuple with additional functionality. These are not niche tools — defaultdict, Counter, deque, and namedtuple appear regularly in production FastAPI code for grouping query results, counting occurrences, implementing queues, and representing lightweight data records. Understanding nested data structure patterns — lists of dicts, dicts of lists, and the transformations between them — is also essential for shaping the complex JSON responses that modern APIs return.

defaultdict — Never Get a KeyError on Missing Keys

from collections import defaultdict

# defaultdict(factory) — creates default value on first access
# The factory is called with no arguments to produce the default

# Default value: list (for grouping)
posts_by_tag = defaultdict(list)

posts = [
    {"id": 1, "title": "Intro",    "tags": ["python", "fastapi"]},
    {"id": 2, "title": "Models",   "tags": ["python", "sqlalchemy"]},
    {"id": 3, "title": "Routing",  "tags": ["fastapi"]},
]

for post in posts:
    for tag in post["tags"]:
        posts_by_tag[tag].append(post["id"])
        # First time a tag is seen: defaultdict creates [] automatically
        # No KeyError, no manual "if tag not in d: d[tag] = []"

print(dict(posts_by_tag))
# {"python": [1, 2], "fastapi": [1, 3], "sqlalchemy": [2]}

# Default value: int (for counting — though Counter is better)
word_count = defaultdict(int)   # default is 0
for word in "the cat sat on the mat".split():
    word_count[word] += 1       # first access initialises to 0
print(dict(word_count))
# {"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}

# Default value: set (for unique grouping)
user_roles = defaultdict(set)
user_roles["alice"].add("editor")
user_roles["alice"].add("admin")
user_roles["bob"].add("user")
print(dict(user_roles))
# {"alice": {"editor", "admin"}, "bob": {"user"}}
Note: defaultdict is a subclass of dict, so all the usual methods work (get, update, items, etc.). The only difference is what happens when you look up a missing key with d[key]: a regular dict raises KeyError, while a defaultdict calls its factory function, stores the result under that key, and returns it. You can convert a defaultdict back to a regular dict with dict(my_defaultdict) for clean serialisation.
Tip: defaultdict eliminates the “check then insert” pattern that clutters grouping code. Instead of if key not in d: d[key] = []; d[key].append(item), you write just d[key].append(item). This is particularly clean in FastAPI when you need to group database results by a foreign key — for example, grouping comments by their post ID from a flat list of comment rows.
Warning: A defaultdict creates an entry whenever you index a missing key with my_defaultdict[key], even if you only intended to read the value. Membership tests (key in my_defaultdict) and my_defaultdict.get(key) never call the factory, so use them for read-only access. Note that get() returns None, or the fallback you pass as its second argument, rather than the factory's default.
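
The grouping case from the Tip and the read-safety point from the Warning can be sketched together. This is a minimal illustration, not a prescribed pattern: the comment rows and their field names (id, post_id, body) are made up for the example.

```python
from collections import defaultdict

# Hypothetical comment rows, shaped like a flat database query result.
comments = [
    {"id": 101, "post_id": 1, "body": "Great intro"},
    {"id": 102, "post_id": 2, "body": "Very clear"},
    {"id": 103, "post_id": 1, "body": "Thanks!"},
]

# Group rows by their foreign key; the first access per key creates [].
comments_by_post = defaultdict(list)
for comment in comments:
    comments_by_post[comment["post_id"]].append(comment)

# Read-only lookups: get() and "in" do not call the factory,
# so post 99 is not added as a side effect.
missing = comments_by_post.get(99, [])
print(missing)                    # []
print(99 in comments_by_post)     # False

# Convert to a plain dict before handing the data on, so a later
# lookup of a missing key raises KeyError instead of adding a key.
result = dict(comments_by_post)
print({post_id: [c["id"] for c in rows] for post_id, rows in result.items()})
# {1: [101, 103], 2: [102]}
```

Converting to a plain dict at the boundary is a design choice rather than a requirement; it simply makes accidental key creation impossible further down the call chain.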

Counter — Count Occurrences

from collections import Counter

# Count items in an iterable
tags = ["python", "fastapi", "python", "react", "fastapi", "python"]
tag_counts = Counter(tags)
print(tag_counts)
# Counter({"python": 3, "fastapi": 2, "react": 1})

# Most common
tag_counts.most_common(2)   # [("python", 3), ("fastapi", 2)]
tag_counts.most_common(1)   # [("python", 3)] — top 1

# Access like a dict — returns 0 for missing (not KeyError)
tag_counts["python"]         # 3
tag_counts["unknown"]        # 0 — no KeyError

# Arithmetic on counters
a = Counter({"a": 3, "b": 2, "c": 1})
b = Counter({"a": 1, "b": 4, "d": 2})
print(a + b)   # Counter({"b": 6, "a": 4, "d": 2, "c": 1}) — sum counts
print(a - b)   # Counter({"a": 2, "c": 1}) — subtract, drop negatives
print(a & b)   # Counter({"b": 2, "a": 1}): min of each count
print(a | b)   # Counter({"b": 4, "a": 3, "d": 2, "c": 1}) — max of each

# FastAPI use: trending tags over last 7 days
def get_trending_tags(recent_posts: list, top_n: int = 10) -> list:
    all_tags = [tag for post in recent_posts for tag in post["tags"]]
    return [tag for tag, _ in Counter(all_tags).most_common(top_n)]
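
Two further Counter behaviours are worth seeing next to the arithmetic above: update() adds to existing counts, and subtract() keeps zero and negative counts where the - operator drops them. The sketch below assumes hypothetical daily batches of posts and a made-up last_week baseline.

```python
from collections import Counter

# Hypothetical batches of posts, for example one list per day.
monday = [{"tags": ["python", "fastapi"]}, {"tags": ["python"]}]
tuesday = [{"tags": ["react", "fastapi"]}, {"tags": ["python"]}]

# Build the counts incrementally; update() adds to existing counts.
tag_counts = Counter()
for batch in (monday, tuesday):
    for post in batch:
        tag_counts.update(post["tags"])

print(tag_counts.most_common(2))
# [('python', 3), ('fastapi', 2)]

# Compare against an assumed baseline from the previous week.
last_week = Counter({"python": 1, "fastapi": 4})

growth = Counter(tag_counts)      # copy, so tag_counts is untouched
growth.subtract(last_week)        # in place; keeps negative counts
print(growth["fastapi"])          # -2

# The "-" operator drops non-positive counts, so the tag disappears
# and indexing it falls back to 0.
print((tag_counts - last_week)["fastapi"])   # 0
```

If you need to see which tags declined, subtract() is the one to reach for; the - operator is better suited to "what is left over" questions.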

deque — Efficient Double-Ended Queue

from collections import deque

# deque supports O(1) append/pop from BOTH ends
# Lists: O(1) append/pop at end, O(n) at front
queue = deque()

# Add to right (standard queue behaviour)
queue.append("task_1")
queue.append("task_2")
queue.append("task_3")

# Remove from left (FIFO — first in, first out)
first = queue.popleft()   # "task_1"

# Add to left
queue.appendleft("priority_task")

# deque as a fixed-size sliding window (maxlen)
recent = deque(maxlen=5)   # automatically drops oldest when full
for i in range(10):
    recent.append(i)
print(list(recent))   # [5, 6, 7, 8, 9] — keeps last 5

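# FastAPI use: rate limiting (last N request timestamps)
# Note: this is a single window shared by all callers, and maxlen must be
# larger than max_per_minute or the limit can never be reached.
import time
request_times = deque(maxlen=100)   # keep last 100 requests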

def is_rate_limited(max_per_minute: int = 60) -> bool:
    now = time.time()
    request_times.append(now)
    one_minute_ago = now - 60
    recent_count = sum(1 for t in request_times if t > one_minute_ago)
    return recent_count > max_per_minute

Nested Data Structure Patterns

# ── List of dicts — the standard DB query result shape ────────────────────────
posts = [
    {"id": 1, "title": "A", "author_id": 10, "tags": ["python"]},
    {"id": 2, "title": "B", "author_id": 20, "tags": ["fastapi", "python"]},
    {"id": 3, "title": "C", "author_id": 10, "tags": ["react"]},
]

# Group by author_id
by_author = defaultdict(list)
for post in posts:
    by_author[post["author_id"]].append(post)

# Filter and transform (rows without a "published" key count as published)
published_titles = [p["title"] for p in posts if p.get("published", True)]

# Build an ID → post lookup dict
post_by_id = {p["id"]: p for p in posts}
print(post_by_id[2]["title"])   # "B"

# ── Dict of lists — configuration and grouping ─────────────────────────────────
ROLE_PERMISSIONS = {
    "admin":  ["read", "write", "delete", "manage_users"],
    "editor": ["read", "write"],
    "user":   ["read"],
}

def get_permissions(role: str) -> list:
    return ROLE_PERMISSIONS.get(role, [])

# ── Merge nested structures ────────────────────────────────────────────────────
base_config = {"db": {"host": "localhost", "port": 5432}}
env_config  = {"db": {"host": "prod-db.example.com"}}

# Naive merge loses nested keys:
wrong = {**base_config, **env_config}
# {"db": {"host": "prod-db.example.com"}} — port is lost!

# Deep merge (recursive, so it handles any nesting depth):
def merge_config(base: dict, override: dict) -> dict:
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge_config(result[key], value)
        else:
            result[key] = value
    return result

merged = merge_config(base_config, env_config)
# {"db": {"host": "prod-db.example.com", "port": 5432}} ✓
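
One caveat with merge_config: base.copy() is a shallow copy, so any nested dict that the override does not touch is shared between the input and the result. Where that matters, a copying variant can be sketched as follows. The name deep_merge and the extra cache key are assumptions made for this example.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a deep copy of base."""
    result = copy.deepcopy(base)   # nothing in result aliases base
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            # Copy the override value too, so result does not alias it.
            result[key] = copy.deepcopy(value)
    return result


base_config = {
    "db": {"host": "localhost", "port": 5432},
    "cache": {"ttl": 60},          # hypothetical extra section
}
env_config = {"db": {"host": "prod-db.example.com"}}

merged = deep_merge(base_config, env_config)
print(merged["db"])
# {'host': 'prod-db.example.com', 'port': 5432}

# Mutating the result leaves the inputs untouched.
merged["cache"]["ttl"] = 5
print(base_config["cache"]["ttl"])   # 60
```

The extra copying costs time and memory on large structures, so the shallow version remains reasonable when the merged result is treated as read-only.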

Common Mistakes

Mistake 1 — Accessing defaultdict for reads creates entries

❌ Wrong — checking a value creates an empty entry:

d = defaultdict(list)
print(d["missing"])   # [] — but now "missing" key EXISTS in d!
print("missing" in d)  # True — unexpected!

✅ Correct — use .get() for reads that should not create entries:

d = defaultdict(list)     # fresh dict, so "missing" has never been indexed
print(d.get("missing"))   # None (key NOT created) ✓
print("missing" in d)     # False ✓

Mistake 2 — Using list.insert(0, x) for a queue instead of deque

❌ Wrong — O(n) insertion at the front of a list:

queue = []
queue.insert(0, item)   # O(n) — shifts all elements

✅ Correct — use deque for O(1) both-end operations:

queue = deque()
queue.appendleft(item)   # O(1) ✓

Mistake 3 — Shallow merge losing nested dict keys

❌ Wrong — second dict overwrites entire nested dict:

config = {**base, **override}   # nested dicts are replaced, not merged

✅ Correct — use a recursive merge function for nested dicts (shown above).

Quick Reference

| Tool | Import | Best For |
|---|---|---|
| defaultdict(list) | from collections import defaultdict | Grouping items by key |
| defaultdict(int) | same | Counting (simple) |
| Counter(iterable) | from collections import Counter | Frequency counting, most common |
| deque(maxlen=N) | from collections import deque | Sliding window, queues |
| namedtuple | from collections import namedtuple | Readable lightweight records |
| {item["id"]: item for item in items} | built-in | ID → object lookup dict |
| defaultdict(list) + loop | from collections import defaultdict | Group-by query result |
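
The quick reference lists namedtuple for lightweight records. A minimal sketch of that use follows; the Post type and its field names are hypothetical, chosen to match the sample data used earlier in this section.

```python
from collections import namedtuple

# Post and its fields are illustrative names, not part of any library.
Post = namedtuple("Post", ["id", "title", "author_id"])

row = (2, "B", 20)        # e.g. a raw tuple returned by a database cursor
post = Post(*row)

print(post.title)         # B
print(post[0])            # 2 (still indexable like a plain tuple)

# namedtuples are immutable; _replace() returns a modified copy.
updated = post._replace(title="B (edited)")
print(updated.title)      # B (edited)
print(post.title)         # B

# _asdict() gives a mapping that is convenient for JSON responses.
print(dict(post._asdict()))
# {'id': 2, 'title': 'B', 'author_id': 20}
```

Attribute access makes the code easier to read than positional indexing, while the object stays as small and hashable as the tuple it wraps.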

🧠 Test Yourself

You have a list of comment dicts, each with a post_id field. You want to build a dict where each key is a post_id and each value is a list of all comments for that post. What is the most Pythonic approach?