Dict, Set and Generator Comprehensions

The comprehension pattern extends beyond lists to three other powerful constructs: dict comprehensions for building dictionaries from iterables, set comprehensions for creating sets with automatic deduplication, and generator expressions for producing values one at a time without building a collection in memory. Dict comprehensions are particularly valuable in FastAPI for transforming database query results into response-ready dictionaries, set comprehensions for extracting unique tags or roles, and generator expressions for processing large datasets efficiently without loading everything into RAM.

Dict Comprehensions

# Syntax: {key_expr: value_expr for item in iterable if condition}

# ── Build a dict from two lists ─────────────────────────────────────────
ids    = [1, 2, 3]
names  = ["Alice", "Bob", "Charlie"]

user_map = {uid: name for uid, name in zip(ids, names)}
# {1: "Alice", 2: "Bob", 3: "Charlie"}
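Because zip() already yields (key, value) pairs, the same mapping can also come from the dict() constructor; a comprehension earns its place once the pairs need transforming or filtering. zip() also stops at the shorter input, so a missing name silently drops an id (Python 3.10+ accepts zip(..., strict=True) to raise instead). A minimal sketch:

```python
ids   = [1, 2, 3]
names = ["Alice", "Bob", "Charlie"]

# Equivalent to the comprehension: dict() consumes the (key, value) pairs directly
user_map = dict(zip(ids, names))

# A comprehension is worth it when the pairs need transforming
upper_map = {uid: name.upper() for uid, name in zip(ids, names)}

# zip() truncates to the shorter iterable: id 3 is silently dropped here
short_map = {uid: name for uid, name in zip(ids, names[:2])}
print(short_map)   # {1: 'Alice', 2: 'Bob'}
```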

# ── Transform a dict ────────────────────────────────────────────────────
prices = {"apple": 1.20, "banana": 0.50, "cherry": 2.00}

# Apply 10% discount
discounted = {item: round(price * 0.9, 2) for item, price in prices.items()}
# {"apple": 1.08, "banana": 0.45, "cherry": 1.80}

# ── Filter a dict: keep only expensive items ────────────────────────────
expensive = {item: price for item, price in prices.items() if price >= 1.00}
# {"apple": 1.20, "cherry": 2.00}
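The trailing if filters entries out; it cannot supply an alternative value. To keep every key but change some values, put a conditional expression in the value position instead. A short sketch reusing the prices dict (the "premium"/"standard" labels are illustrative):

```python
prices = {"apple": 1.20, "banana": 0.50, "cherry": 2.00}

# Filter: the trailing `if` drops entries that fail the test
expensive = {item: price for item, price in prices.items() if price >= 1.00}

# Transform: a conditional expression in the value keeps every key
tiers = {item: ("premium" if price >= 1.00 else "standard")
         for item, price in prices.items()}
print(tiers)   # {'apple': 'premium', 'banana': 'standard', 'cherry': 'premium'}
```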

# ── Invert a dict (swap keys and values) ────────────────────────────────
code_to_name = {"PY": "Python", "JS": "JavaScript", "TS": "TypeScript"}
name_to_code = {name: code for code, name in code_to_name.items()}
# {"Python": "PY", "JavaScript": "JS", "TypeScript": "TS"}
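Inversion only round-trips cleanly when the values are unique and hashable. If two keys share a value, the later pair overwrites the earlier one and a key disappears without any error. A small sketch with hypothetical data; grouping into lists keeps every key when collisions are expected:

```python
from collections import defaultdict

# Hypothetical mapping where two codes share the same value
code_to_lang = {"PY": "Python", "PY3": "Python", "JS": "JavaScript"}

# Naive inversion: "PY3" overwrites "PY" because it comes later
lang_to_code = {name: code for code, name in code_to_lang.items()}
print(lang_to_code)   # {'Python': 'PY3', 'JavaScript': 'JS'}

# Grouping keeps every original key
lang_to_codes = defaultdict(list)
for code, name in code_to_lang.items():
    lang_to_codes[name].append(code)
print(dict(lang_to_codes))   # {'Python': ['PY', 'PY3'], 'JavaScript': ['JS']}
```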

# ── FastAPI: shape database results into API response ───────────────────
db_rows = [
    {"user_id": 1, "full_name": "Alice Smith",  "email": "alice@example.com"},
    {"user_id": 2, "full_name": "Bob Jones",    "email": "bob@example.com"},
]

# Create a lookup dict: user_id → name (for quick access)
id_to_name = {row["user_id"]: row["full_name"] for row in db_rows}
# {1: "Alice Smith", 2: "Bob Jones"}

# Rename and select fields for the response
response_users = [
    {"id": row["user_id"], "name": row["full_name"], "email": row["email"]}
    for row in db_rows
]
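When the response should expose a fixed set of columns unchanged, a dict comprehension over an allowlist of field names avoids spelling out every key. A sketch; PUBLIC_FIELDS and the password_hash column are hypothetical, and in a real FastAPI app a Pydantic response_model can also do this filtering:

```python
db_rows = [
    {"user_id": 1, "full_name": "Alice Smith", "email": "alice@example.com", "password_hash": "x1"},
    {"user_id": 2, "full_name": "Bob Jones",   "email": "bob@example.com",   "password_hash": "x2"},
]

# Hypothetical allowlist of columns that are safe to return
PUBLIC_FIELDS = ("user_id", "full_name")

# Outer list comprehension over rows, inner dict comprehension over fields
public_users = [{field: row[field] for field in PUBLIC_FIELDS} for row in db_rows]
print(public_users)
# [{'user_id': 1, 'full_name': 'Alice Smith'}, {'user_id': 2, 'full_name': 'Bob Jones'}]
```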
Note: Dict comprehensions require a colon separating the key and value expressions: {key: value for ...}. If the same key is produced more than once, the last value wins; later entries overwrite earlier ones with the same key. This is equivalent to building the dict with a for loop and assigning d[key] = value. Deliberately using this overwrite behaviour is a clean way to deduplicate by a specific field.
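Both claims in the note can be checked directly: the comprehension produces the same dict as the explicit loop, and keying rows by a field keeps one row per value. Each email keeps the position where it first appeared but takes the value from its last row. A sketch with made-up rows:

```python
rows = [
    {"id": 1, "email": "alice@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 3, "email": "alice@example.com"},   # duplicate email, later row
]

# Comprehension form: later rows overwrite earlier ones with the same key
by_email = {row["email"]: row for row in rows}

# Equivalent explicit loop
by_email_loop = {}
for row in rows:
    by_email_loop[row["email"]] = row

assert by_email == by_email_loop
deduped = list(by_email.values())
print([r["id"] for r in deduped])   # [3, 2]
```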
Tip: The {**existing_dict, "new_key": "new_value"} syntax (dictionary unpacking) is often more readable than a comprehension when you want to add or override a single field in a copy of a dict. Use comprehensions for systematic transformations of all keys/values, and unpacking for targeted additions or overrides to specific dicts. Both patterns appear frequently in FastAPI response shaping.
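A quick side-by-side with a hypothetical user dict: unpacking for a targeted override, a comprehension for a rule applied to every key. On Python 3.9+ the | operator produces the same merged copy as {**a, **b}.

```python
user = {"id": 1, "name": "Alice", "password_hash": "x1", "is_active": False}

# Targeted override on a copy: later keys win, the original is untouched
activated = {**user, "is_active": True}

# Systematic rule applied to every key: drop anything that looks secret
public = {k: v for k, v in user.items() if "password" not in k}

print(activated["is_active"], user["is_active"])   # True False
print(public)   # {'id': 1, 'name': 'Alice', 'is_active': False}
```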
Warning: A generator expression uses parentheses (), not square brackets []. The difference is critical: a list comprehension [x**2 for x in range(1000000)] creates a list of a million numbers in memory immediately. A generator expression (x**2 for x in range(1000000)) creates an object that produces numbers one at a time on demand. When the result goes straight into a function that consumes an iterable (such as sum(), max(), any() or str.join()), prefer the generator expression form. One caveat: in CPython, str.join() converts its argument to a list internally, so there the benefit is readability rather than memory.
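sys.getsizeof() gives a rough way to see the difference. It reports only the container's own size (for the list, its pointer array, not the int objects it references), and exact numbers vary by Python version and platform, so treat the output as indicative:

```python
import sys

squares_list = [x ** 2 for x in range(1_000_000)]
squares_gen = (x ** 2 for x in range(1_000_000))

list_bytes = sys.getsizeof(squares_list)   # pointer array for a million items
gen_bytes = sys.getsizeof(squares_gen)     # small, independent of the range size

print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# Both give the same total; only the list kept every value around
assert sum(squares_list) == sum(squares_gen)
```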

Set Comprehensions

# Syntax: {expression for item in iterable if condition}
# Same as list comprehension but uses {} and produces a set (unique values)

posts = [
    {"id": 1, "tags": ["python", "fastapi"]},
    {"id": 2, "tags": ["python", "postgresql"]},
    {"id": 3, "tags": ["fastapi", "react"]},
]

# All unique tags across all posts
all_tags = {tag for post in posts for tag in post["tags"]}
# {"python", "fastapi", "postgresql", "react"}: no duplicates (display order may vary)
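The for clauses in a nested comprehension read left to right in the same order as the equivalent nested loops, outer loop first. This sketch spells out the loop version of all_tags to make that order visible:

```python
posts = [
    {"id": 1, "tags": ["python", "fastapi"]},
    {"id": 2, "tags": ["python", "postgresql"]},
    {"id": 3, "tags": ["fastapi", "react"]},
]

all_tags = {tag for post in posts for tag in post["tags"]}

# Same result with explicit loops: outer `for post`, then inner `for tag`
all_tags_loop = set()
for post in posts:
    for tag in post["tags"]:
        all_tags_loop.add(tag)

assert all_tags == all_tags_loop == {"python", "fastapi", "postgresql", "react"}
```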

# Unique author IDs from comments
comments = [
    {"author_id": 1, "text": "Great!"},
    {"author_id": 2, "text": "Thanks!"},
    {"author_id": 1, "text": "Agreed!"},
]
unique_author_ids = {c["author_id"] for c in comments}
# {1, 2}: automatically deduplicated

# Sets support fast membership testing: O(1) on average, vs O(n) for lists
allowed_roles = {"user", "editor", "admin"}

def is_valid_role(role: str) -> bool:
    return role in allowed_roles   # average O(1) hash lookup in a set
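Set comprehensions pair naturally with set operators when validating a batch of inputs, for example a list of roles sent in a request. A minimal sketch; requested is hypothetical input and the normalisation rule (strip and lowercase) is an assumption:

```python
allowed_roles = {"user", "editor", "admin"}

# Hypothetical raw input from a request body
requested = [" Editor", "admin", "superuser", "ADMIN"]

# Normalise and deduplicate in one step
normalised = {role.strip().lower() for role in requested}

invalid = normalised - allowed_roles    # set difference: roles not permitted
valid = normalised & allowed_roles      # intersection: roles we can grant

print(sorted(invalid))   # ['superuser']
print(sorted(valid))     # ['admin', 'editor']
```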

# Deduplication pattern: list → set → list (does not preserve order)
raw_tags = ["python", "fastapi", "python", "react", "fastapi"]
unique_tags = list({tag.lower() for tag in raw_tags})
# e.g. ["python", "fastapi", "react"], in any order because sets are unordered
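When order matters (for example, tags shown in the order the user typed them), dict.fromkeys() deduplicates while keeping first-seen order, because dicts preserve insertion order from Python 3.7:

```python
raw_tags = ["python", "fastapi", "python", "react", "fastapi"]

# dict keys are unique and keep insertion order, so first occurrences win
unique_ordered = list(dict.fromkeys(tag.lower() for tag in raw_tags))
print(unique_ordered)   # ['python', 'fastapi', 'react']
```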

Generator Expressions

# Syntax: (expression for item in iterable if condition)
# Like a list comprehension but lazy: produces values one at a time

# List comprehension: builds ALL results in memory first
squares_list = [x ** 2 for x in range(1_000_000)]   # ~8 MB for the list's pointers alone, plus the int objects

# Generator expression: produces one value at a time (small, constant memory)
squares_gen  = (x ** 2 for x in range(1_000_000))   # tiny object
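The trade-off for that laziness is that a generator can be iterated only once; once exhausted it yields nothing, and no error signals the mistake. A short sketch:

```python
squares_gen = (x ** 2 for x in range(5))

first = next(squares_gen)      # pulls one value: 0
rest_total = sum(squares_gen)  # consumes the remaining values: 1 + 4 + 9 + 16
again = sum(squares_gen)       # already exhausted, so this is 0

print(first, rest_total, again)   # 0 30 0
```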

# Use generators with functions that consume iterables
total   = sum(x ** 2 for x in range(100))         # sum without a list
maximum = max(len(name) for name in ["Alice", "Bob", "Charlie"])  # 7
joined  = ", ".join(str(x) for x in [1, 2, 3])   # "1, 2, 3"

# When a generator expression is the only argument to a call, the call's own
# parentheses are enough: sum((x for x in range(10))) is the same as:
sum(x for x in range(10))   # no double parentheses needed
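The shortcut only applies when the generator expression is the sole argument. With any other argument alongside it, Python requires the extra parentheses; without them the call is a SyntaxError:

```python
names = ["Charlie", "alice", "Bob"]

# Sole argument: the call's parentheses are enough
longest = max(len(n) for n in names)

# Extra argument: the generator expression needs its own parentheses.
# sorted(n.lower() for n in names, reverse=True) would be a SyntaxError.
ordered = sorted((n.lower() for n in names), reverse=True)

print(longest, ordered)   # 7 ['charlie', 'bob', 'alice']
```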

# Chaining iterators: the file yields lines lazily and the generator filters them,
# with no intermediate list
with open("data.txt") as f:
    line_count = sum(1 for line in f if line.strip())
# Counts non-empty lines without reading the whole file into memory at once
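A longer pipeline chains several generator expressions, each pulling from the previous one, so only one line is in flight at a time. This sketch uses io.StringIO as a stand-in for an open file so it runs anywhere; the data format (one integer per line, "#" comments) is an assumption:

```python
import io

# Stand-in for open("data.txt"); a real file object iterates the same way
f = io.StringIO("10\n\n  20 \n# comment\n30\n")

stripped = (line.strip() for line in f)                            # stage 1: clean
meaningful = (s for s in stripped if s and not s.startswith("#"))  # stage 2: filter
numbers = (int(s) for s in meaningful)                             # stage 3: parse

total = sum(numbers)   # nothing is read until sum() starts pulling values
print(total)   # 60
```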

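# FastAPI: process large query results lazily
def get_published_titles(db_cursor):
    # Generator: rows are read only as the caller iterates. This avoids loading
    # every row at once only if the cursor itself streams results, and the
    # cursor must stay open until the generator has been consumed.
    return (row["title"] for row in db_cursor if row["published"])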
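To see the on-demand behaviour without a database, the sketch below swaps the cursor for a hypothetical FakeCursor class that yields dict rows and counts how many it has produced. Real savings still depend on the driver streaming rows rather than fetching them all up front:

```python
from itertools import islice


class FakeCursor:
    """Hypothetical stand-in for a streaming DB cursor that yields dict rows."""

    def __init__(self, n):
        self.n = n
        self.fetched = 0   # how many rows have been pulled so far

    def __iter__(self):
        for i in range(self.n):
            self.fetched += 1
            yield {"title": f"Post {i}", "published": i % 2 == 0}


def get_published_titles(db_cursor):
    # Same generator expression as above: rows are read only on demand
    return (row["title"] for row in db_cursor if row["published"])


cursor = FakeCursor(1_000_000)
titles = get_published_titles(cursor)
first_three = list(islice(titles, 3))

print(first_three)      # ['Post 0', 'Post 2', 'Post 4']
print(cursor.fetched)   # 5: rows 0 to 4, not a million
```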

Choosing the Right Comprehension Type

| You need… | Use | Syntax |
|---|---|---|
| A list of transformed items | List comprehension | [expr for x in it] |
| A dict from pairs, or a transformed dict | Dict comprehension | {k: v for x in it} |
| Unique values, fast membership checks | Set comprehension | {expr for x in it} |
| Memory-efficient processing, or input to sum/max/join | Generator expression | (expr for x in it) |
| Side effects (printing, writing) or complex logic | Regular for loop | for x in it: ... |

Common Mistakes

Mistake 1: Forgetting that {} without colons makes a set, not a dict

❌ Wrong (expecting a dict from a value-only comprehension):

result = {x for x in [1, 2, 3]}
type(result)   # <class 'set'>, not a dict!

✅ Correct (dict comprehensions must have key: value):

result = {x: x**2 for x in [1, 2, 3]}
type(result)   # <class 'dict'> ✓  →  {1: 1, 2: 4, 3: 9}
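The same ambiguity applies to empty literals: {} is always an empty dict, so an empty set has to be written set():

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict).__name__, type(empty_set).__name__)   # dict set

# A set comprehension over an empty iterable is still a set, not a dict
print(type({x for x in []}).__name__)   # set
```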

Mistake 2: Using a list comprehension where a generator is appropriate

❌ Wrong (builds an entire list just to compute a sum):

total = sum([x ** 2 for x in range(1_000_000)])   # allocates a million-item list (~8 MB of pointers) first

✅ Correct (use a generator expression with sum()):

total = sum(x ** 2 for x in range(1_000_000))   # ✓ small, constant memory footprint

Mistake 3: Trying to index a generator like a list

❌ Wrong (generators do not support indexing):

gen = (x for x in range(10))
gen[0]   # TypeError: 'generator' object is not subscriptable

✅ Correct (convert to a list first if indexing is needed):

items = list(x for x in range(10))
items[0]   # 0 ✓
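Converting the whole generator is wasteful when only the first item or first few are needed. next() and itertools.islice() take values from the front without building a full list; passing None as the default to next() avoids a StopIteration on an empty generator:

```python
from itertools import islice

gen = (x * 10 for x in range(1_000_000))

first = next(gen)                  # 0, consumes one value
next_three = list(islice(gen, 3))  # [10, 20, 30], continues where next() stopped

empty = (x for x in [])
fallback = next(empty, None)       # None instead of raising StopIteration

print(first, next_three, fallback)   # 0 [10, 20, 30] None
```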

Quick Reference

| Type | Syntax | Result | Ordered? | Unique? |
|---|---|---|---|---|
| List comp | [expr for x in it] | list | Yes | No |
| Dict comp | {k: v for x in it} | dict | Yes (insertion order, 3.7+) | Keys only |
| Set comp | {expr for x in it} | set | No | Yes |
| Generator | (expr for x in it) | generator | Yes (lazy) | No |

🧠 Test Yourself

You have a list of user dicts from a database query. You want to build a lookup dict mapping user_id → email for fast access. Which is the cleanest approach?