The collections Module and Pythonic Data Patterns

📚 React + FastAPI 📂 Chapter 4: Collections — Lists, Tuples, Dicts and Sets 📄 Lesson 4050 Beginner 🕒 December 17, 2025

Python’s collections module provides specialised container types that extend the built-in dict, list, and tuple with additional functionality. These are not niche tools — defaultdict, Counter, deque, and namedtuple appear regularly in production FastAPI code for grouping query results, counting occurrences, implementing queues, and representing lightweight data records. Understanding nested data structure patterns — lists of dicts, dicts of lists, and the transformations between them — is also essential for shaping the complex JSON responses that modern APIs return.

defaultdict — Never Get a KeyError on Missing Keys

from collections import defaultdict

# defaultdict(factory) — creates default value on first access
# The factory is called with no arguments to produce the default

# Default value: list (for grouping)
posts_by_tag = defaultdict(list)

posts = [
    {"id": 1, "title": "Intro",    "tags": ["python", "fastapi"]},
    {"id": 2, "title": "Models",   "tags": ["python", "sqlalchemy"]},
    {"id": 3, "title": "Routing",  "tags": ["fastapi"]},
]

for post in posts:
    for tag in post["tags"]:
        posts_by_tag[tag].append(post["id"])
        # First time a tag is seen: defaultdict creates [] automatically
        # No KeyError, no manual "if tag not in d: d[tag] = []"

print(dict(posts_by_tag))
# {"python": [1, 2], "fastapi": [1, 3], "sqlalchemy": [2]}

# Default value: int (for counting — though Counter is better)
word_count = defaultdict(int)   # default is 0
for word in "the cat sat on the mat".split():
    word_count[word] += 1       # first access initialises to 0
print(dict(word_count))
# {"the": 2, "cat": 1, "sat": 1, "on": 1, "mat": 1}

# Default value: set (for unique grouping)
user_roles = defaultdict(set)
user_roles["alice"].add("editor")
user_roles["alice"].add("admin")
user_roles["bob"].add("user")
print(dict(user_roles))
# {"alice": {"editor", "admin"}, "bob": {"user"}}

Note: A defaultdict behaves exactly like a regular dict — all the same methods work (get, update, items, etc.). The only difference is what happens when you access a key that does not exist: a regular dict raises KeyError, while a defaultdict calls its factory function, stores the result, and returns it. You can convert a defaultdict back to a regular dict with dict(my_defaultdict) for clean serialisation.

Tip: defaultdict eliminates the “check then insert” pattern that clutters grouping code. Instead of if key not in d: d[key] = []; d[key].append(item), you write just d[key].append(item). This is particularly clean in FastAPI when you need to group database results by a foreign key — for example, grouping comments by their post ID from a flat list of comment rows.
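As a rough illustration of the tip above, the sketch below groups a flat list of comment rows by post ID, first with the manual check and then with defaultdict. The comments list and its field names are invented for the example and do not come from a real schema.

```python
from collections import defaultdict

# Hypothetical comment rows, shaped like a flat database query result
comments = [
    {"id": 101, "post_id": 1, "body": "Great intro"},
    {"id": 102, "post_id": 2, "body": "Very clear"},
    {"id": 103, "post_id": 1, "body": "Thanks!"},
]

# Manual "check then insert" version
manual = {}
for comment in comments:
    if comment["post_id"] not in manual:
        manual[comment["post_id"]] = []
    manual[comment["post_id"]].append(comment["body"])

# defaultdict version: the empty list is created on first access
comments_by_post = defaultdict(list)
for comment in comments:
    comments_by_post[comment["post_id"]].append(comment["body"])

print(dict(comments_by_post))
# {1: ['Great intro', 'Thanks!'], 2: ['Very clear']}
```

Both loops build the same mapping; the second simply has no branch to get wrong.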

Warning: defaultdict creates an entry whenever you look up a missing key with square brackets, even if you only meant to read it. A membership test such as key in my_defaultdict is safe, but my_defaultdict[nonexistent_key] inserts that key with the default value. Use my_defaultdict.get(key) for read-only access that does not create entries, or check with in first.
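The behaviour in the warning can be checked directly. The sketch below uses an invented scores mapping and also shows one possible safeguard: setting the default_factory attribute to None once the data is built, after which missing keys raise KeyError again.

```python
from collections import defaultdict

scores = defaultdict(list)
scores["alice"].append(90)

# Neither of these runs the factory, so no key is created
print("bob" in scores)        # False
print(scores.get("bob"))      # None
print(len(scores))            # 1

# Subscript access does run the factory and stores the result
scores["bob"]
print(len(scores))            # 2

# Setting default_factory to None turns off auto-creation:
# missing keys raise KeyError again, like a regular dict
scores.default_factory = None
try:
    scores["carol"]
except KeyError:
    print("KeyError for carol")
```

Whether to freeze the mapping this way or convert it with dict() is a matter of taste; both stop accidental key creation in later code.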

Counter — Count Occurrences

from collections import Counter

# Count items in an iterable
tags = ["python", "fastapi", "python", "react", "fastapi", "python"]
tag_counts = Counter(tags)
print(tag_counts)
# Counter({"python": 3, "fastapi": 2, "react": 1})

# Most common
tag_counts.most_common(2)   # [("python", 3), ("fastapi", 2)]
tag_counts.most_common(1)   # [("python", 3)] — top 1

# Access like a dict — returns 0 for missing (not KeyError)
tag_counts["python"]         # 3
tag_counts["unknown"]        # 0 — no KeyError

# Arithmetic on counters
a = Counter({"a": 3, "b": 2, "c": 1})
b = Counter({"a": 1, "b": 4, "d": 2})
print(a + b)   # sum of counts: Counter({"b": 6, "a": 4, "d": 2, "c": 1})
print(a - b)   # subtract, dropping zero and negative results: Counter({"a": 2, "c": 1})
print(a & b)   # minimum of each count: Counter({"b": 2, "a": 1})
print(a | b)   # maximum of each count: Counter({"b": 4, "a": 3, "d": 2, "c": 1})

# FastAPI use: trending tags over last 7 days
def get_trending_tags(recent_posts: list, top_n: int = 10) -> list:
    all_tags = [tag for post in recent_posts for tag in post["tags"]]
    return [tag for tag, _ in Counter(all_tags).most_common(top_n)]
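Counts do not have to be built from a single iterable. As a small sketch, assuming tags arrive in several batches (the batches list is invented for illustration), Counter.update adds to the existing counts instead of replacing them:

```python
from collections import Counter

tag_counts = Counter()

# Hypothetical batches, e.g. one list of tags per page of results
batches = [
    ["python", "fastapi"],
    ["python", "react"],
    ["python", "fastapi", "sqlalchemy"],
]

for batch in batches:
    tag_counts.update(batch)   # adds to existing counts, does not replace them

print(tag_counts.most_common(2))
# [('python', 3), ('fastapi', 2)]

# Counter is a dict subclass, so it converts to a plain dict for a JSON response
print(dict(tag_counts))
# {'python': 3, 'fastapi': 2, 'react': 1, 'sqlalchemy': 1}
```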

deque — Efficient Double-Ended Queue

from collections import deque

# deque supports O(1) append/pop from BOTH ends
# Lists: O(1) append/pop at end, O(n) at front
queue = deque()

# Add to right (standard queue behaviour)
queue.append("task_1")
queue.append("task_2")
queue.append("task_3")

# Remove from left (FIFO — first in, first out)
first = queue.popleft()   # "task_1"

# Add to left
queue.appendleft("priority_task")

# deque as a fixed-size sliding window (maxlen)
recent = deque(maxlen=5)   # automatically drops oldest when full
for i in range(10):
    recent.append(i)
print(list(recent))   # [5, 6, 7, 8, 9] — keeps last 5

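# FastAPI use: rate limiting (last N request timestamps)
# Note: this single deque is shared by every caller; a real limiter
# would usually keep one window per client.
import time
request_times = deque(maxlen=100)   # keep last 100 requests; must be larger than max_per_minute

def is_rate_limited(max_per_minute: int = 60) -> bool:
    now = time.time()
    request_times.append(now)       # the current request is recorded before counting
    one_minute_ago = now - 60
    recent_count = sum(1 for t in request_times if t > one_minute_ago)
    return recent_count > max_per_minute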
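The limiter above tracks one global window. A possible per-client variant is sketched below, under stated assumptions: client_id is a hypothetical key such as an API key, state lives in memory in a single process, and the now parameter exists only so the example can be run with fixed timestamps. It is an illustration of combining defaultdict with deque, not a production rate limiter.

```python
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 60

# One deque of timestamps per client (client_id is a hypothetical key,
# for example an API key or an IP address)
request_log = defaultdict(deque)

def allow_request(client_id, max_per_window=60, now=None):
    """Return True if the request is allowed, recording it if so."""
    if now is None:
        now = time.monotonic()
    timestamps = request_log[client_id]

    # Evict timestamps that have left the window: O(1) per popleft
    while timestamps and timestamps[0] <= now - WINDOW_SECONDS:
        timestamps.popleft()

    if len(timestamps) >= max_per_window:
        return False
    timestamps.append(now)
    return True

# Usage with explicit timestamps so the result is deterministic
print(allow_request("alice", max_per_window=2, now=0.0))    # True
print(allow_request("alice", max_per_window=2, now=1.0))    # True
print(allow_request("alice", max_per_window=2, now=2.0))    # False
print(allow_request("alice", max_per_window=2, now=61.5))   # True
```

Evicting from the left keeps each deque no longer than the limit, so no maxlen is needed, and rejected requests are not recorded.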

Nested Data Structure Patterns

from collections import defaultdict

# ── List of dicts: the standard DB query result shape ─────────────────────────
posts = [
    {"id": 1, "title": "A", "author_id": 10, "tags": ["python"]},
    {"id": 2, "title": "B", "author_id": 20, "tags": ["fastapi", "python"]},
    {"id": 3, "title": "C", "author_id": 10, "tags": ["react"]},
]

# Group by author_id
by_author = defaultdict(list)
for post in posts:
    by_author[post["author_id"]].append(post)

# Filter and transform (rows without a "published" key are treated as published)
published_titles = [p["title"] for p in posts if p.get("published", True)]

# Build an ID → post lookup dict
post_by_id = {p["id"]: p for p in posts}
print(post_by_id[2]["title"])   # "B"

# ── Dict of lists — configuration and grouping ─────────────────────────────────
ROLE_PERMISSIONS = {
    "admin":  ["read", "write", "delete", "manage_users"],
    "editor": ["read", "write"],
    "user":   ["read"],
}

def get_permissions(role: str) -> list:
    return ROLE_PERMISSIONS.get(role, [])

# ── Merge nested structures ────────────────────────────────────────────────────
base_config = {"db": {"host": "localhost", "port": 5432}}
env_config  = {"db": {"host": "prod-db.example.com"}}

# Naive merge loses nested keys:
wrong = {**base_config, **env_config}
# {"db": {"host": "prod-db.example.com"}} — port is lost!

# Deep merge (recursive, so it works at any nesting depth):
def merge_config(base: dict, override: dict) -> dict:
    result = base.copy()   # shallow copy: nested dicts that are not overridden stay shared with base
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge_config(result[key], value)
        else:
            result[key] = value
    return result

merged = merge_config(base_config, env_config)
# {"db": {"host": "prod-db.example.com", "port": 5432}} ✓
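The list-of-dicts and dict-of-lists shapes used in this section can also be converted into one another. The sketch below is one way to do it, assuming every row has the same keys and every column has the same length; the rows data is invented for the example.

```python
# Hypothetical rows, shaped like the posts list above
rows = [
    {"id": 1, "title": "A"},
    {"id": 2, "title": "B"},
    {"id": 3, "title": "C"},
]

# List of dicts -> dict of lists (assumes every row has the same keys)
columns = {key: [row[key] for row in rows] for key in rows[0]}
print(columns)
# {'id': [1, 2, 3], 'title': ['A', 'B', 'C']}

# Dict of lists -> list of dicts (assumes all lists have equal length)
rebuilt = [dict(zip(columns, values)) for values in zip(*columns.values())]
print(rebuilt == rows)   # True
```

The column-oriented shape suits charting or tabular clients, while the row-oriented shape is what most JSON list endpoints return.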

Common Mistakes

Mistake 1 — Accessing defaultdict for reads creates entries

❌ Wrong — checking a value creates an empty entry:

d = defaultdict(list)
print(d["missing"])   # [] — but now "missing" key EXISTS in d!
print("missing" in d)  # True — unexpected!

✅ Correct — use .get() for reads that should not create entries:

d = defaultdict(list)
print(d.get("missing"))   # None, key NOT created ✓
print("missing" in d)     # False ✓

Mistake 2 — Using list.insert(0, x) for a queue instead of deque

❌ Wrong — O(n) insertion at the front of a list:

queue = []
queue.insert(0, item)   # O(n) — shifts all elements

✅ Correct — use deque for O(1) both-end operations:

queue = deque()
queue.appendleft(item)   # O(1) ✓

Mistake 3 — Shallow merge losing nested dict keys

❌ Wrong — second dict overwrites entire nested dict:

config = {**base, **override}   # nested dicts are replaced, not merged

✅ Correct — use a recursive merge function for nested dicts (shown above).
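One caveat with a merge built on dict.copy(), as in merge_config earlier, is that nested dicts the override does not touch remain shared with the base. If that matters, a variant along the following lines copies everything first. The name deep_merge and the cache key are made up for this sketch.

```python
import copy

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a deep copy of base."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

base = {"db": {"host": "localhost", "port": 5432}, "cache": {"ttl": 60}}
override = {"db": {"host": "prod-db.example.com"}}

merged = deep_merge(base, override)
print(merged)
# {'db': {'host': 'prod-db.example.com', 'port': 5432}, 'cache': {'ttl': 60}}

# Changing the merged result leaves the inputs untouched
merged["cache"]["ttl"] = 300
print(base["cache"]["ttl"])   # 60
```

The extra copying costs some time on large configurations, so the shallow version remains reasonable when the inputs are never mutated afterwards.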

Quick Reference

Tool	Import	Best For
`defaultdict(list)`	`from collections import defaultdict`	Grouping items by key
`defaultdict(int)`	same	Counting (simple)
`Counter(iterable)`	`from collections import Counter`	Frequency counting, most common
`deque(maxlen=N)`	`from collections import deque`	Sliding window, queues
`namedtuple`	`from collections import namedtuple`	Readable lightweight records
`{id: item for item in items}`	built-in	ID → object lookup dict
`defaultdict(list)` + loop	`from collections import defaultdict`	Group-by query result
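The table lists namedtuple, which the sections above do not demonstrate. A minimal sketch follows, with PostSummary as a hypothetical record type; the dict output shown assumes Python 3.8 or later, where _asdict() returns a regular dict.

```python
from collections import namedtuple

# Hypothetical record type for illustration
PostSummary = namedtuple("PostSummary", ["id", "title", "author_id"])

post = PostSummary(id=1, title="Intro", author_id=10)

print(post.title)        # Intro
print(post[0])           # 1
post_id, title, author_id = post   # unpacks like a tuple

# _asdict() gives a plain dict, convenient for a JSON response
print(post._asdict())
# {'id': 1, 'title': 'Intro', 'author_id': 10}

# Fields are read-only; _replace returns a modified copy
updated = post._replace(title="Introduction")
print(updated.title)     # Introduction
print(post.title)        # Intro
```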

🧠 Test Yourself

You have a list of comment dicts, each with a post_id field. You want to build a dict where each key is a post_id and each value is a list of all comments for that post. What is the most Pythonic approach?

Loop over comments; for each, check if post_id is in the dict; if not, create an empty list; then append
from collections import defaultdict; by_post = defaultdict(list); then for c in comments: by_post[c["post_id"]].append(c). defaultdict(list) automatically initialises an empty list on first access for each new post_id, eliminating the manual “if key not in dict” check. A plain for loop is preferred over a list comprehension here, because a comprehension run only for its side effects builds a throwaway list. The result is a clean grouping without boilerplate
Use itertools.groupby() after sorting comments by post_id
{c["post_id"]: c for c in comments} — this only keeps the last comment per post_id, not all of them
