Generator expressions are the lazy equivalent of list comprehensions: they use parentheses instead of square brackets and produce values one at a time rather than building an entire list in memory. When chained together, multiple generator expressions form a lazy pipeline where data flows through each stage without any intermediate collection being fully materialised. For single-pass processing of large datasets this keeps memory use roughly constant regardless of input size, and it is the pattern behind SQLAlchemy cursor iteration, file line processing, and FastAPI streaming responses. The gap between the list and generator approaches grows with the size of the data.
Generator Expressions
# List comprehension: creates the ENTIRE list in memory immediately
squares_list = [x ** 2 for x in range(1_000_000)]  # ~8 MB for the list alone, plus the int objects
# Generator expression: lazy, produces one value at a time
squares_gen = (x ** 2 for x in range(1_000_000))  # on the order of 100-200 bytes (just the generator object)
# Same interface: both are iterable
for sq in squares_gen:
    if sq > 100:
        break
# Works with built-in functions that accept iterables
total = sum(x ** 2 for x in range(1000)) # no list created
maximum = max(len(line) for line in open("f.txt")) # no list created
has_admin = any(u.role == "admin" for u in users) # stops at first match
all_valid = all(is_valid(item) for item in items) # stops at first failure
# Single-argument functions: outer () can be omitted
total = sum(x ** 2 for x in range(10)) # not sum((x ** 2 for x in range(10)))
result = ",".join(str(x) for x in [1,2,3]) # "1,2,3"
# Filtering with condition
evens = (x for x in range(20) if x % 2 == 0)
even_squares = (x ** 2 for x in range(20) if x % 2 == 0)
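Two behaviours of generator objects are worth seeing in isolation: a generator can be consumed only once, and names used inside the expression (other than the outermost iterable) are looked up when values are produced, not when the expression is created. A minimal sketch, with `gen`, `factor`, and `scaled` as illustrative names:

```python
# A generator is exhausted after one pass.
gen = (x * x for x in range(4))
first_pass = list(gen)     # [0, 1, 4, 9]
second_pass = list(gen)    # [] because nothing is left to yield

# Free variables are looked up lazily (late binding).
factor = 2
scaled = (x * factor for x in range(3))
factor = 10                # rebinding before iteration changes the result
scaled_values = list(scaled)   # [0, 10, 20], not [0, 2, 4]

print(first_pass, second_pass, scaled_values)
```

If the values are needed more than once, or must reflect the state at creation time, building a list at that point is the simpler choice.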
Generator expressions use parentheses () while list comprehensions use square brackets []. The resulting objects behave differently: a list comprehension evaluates everything immediately and stores it all in RAM; a generator expression creates a generator object that produces values lazily. When you pass a generator expression directly as the only argument to a function, Python allows you to drop the extra set of parentheses: sum(x for x in range(10)) is valid.

When passing data to sum(), max(), min(), any(), or all(), prefer generator expressions over list comprehensions. The function iterates the input once and discards each item as it goes; building a full list first wastes memory proportional to the collection size. ",".join() is the exception: CPython's str.join() converts a non-list argument into a sequence before joining, so ",".join(str(x) for x in items) reads cleanly but saves no memory compared with ",".join([str(x) for x in items]).

A generator can be consumed only once; if you need to iterate more than once, materialise it first: items = list(generator). Also, generator expressions capture variables by reference, not by value, so the late-binding bug from Chapter 9 applies here too. Avoid complex generator expressions with closures that capture loop variables.

Lazy Pipelines
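Before the file-based version, it can help to watch laziness happen on in-memory data. The sketch below is an assumption-laden miniature of a read, parse, filter pipeline: `io.StringIO` stands in for a file, the sample addresses are made up, and the `log` list exists only to record when each stage actually runs.

```python
import csv
import io

log = []  # records the order in which stages actually do work

def read_lines(text):
    # Stands in for a file: yields one line at a time.
    for line in io.StringIO(text):
        log.append("read")
        yield line

def parse_csv(lines):
    yield from csv.DictReader(lines)

def filter_active(rows):
    for row in rows:
        if row["active"].lower() == "true":
            log.append("keep")
            yield row

data = (
    "email,active\n"
    "a@example.com,true\n"
    "b@example.com,false\n"
    "c@example.com,true\n"
)

# Composing the stages runs none of their bodies.
pipeline = filter_active(parse_csv(read_lines(data)))
work_before_iteration = list(log)

# Asking for one item pulls only the lines needed to produce it.
first = next(pipeline)
work_after_first = list(log)

print(work_before_iteration, first["email"], work_after_first)
```

Only the header and the first data line are read to produce the first match; the remaining lines are untouched until something asks for more.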
import csv
from pathlib import Path
# -- Lazy pipeline: read -> filter -> transform -> limit --
# None of these generators actually reads or processes the file until iteration
# Stage 1: open file and yield lines (generator)
def read_lines(path: str):
    # newline="" lets the csv module handle line endings itself
    with open(path, encoding="utf-8", newline="") as f:
        yield from f
# Stage 2: parse CSV rows
def parse_csv(lines):
    reader = csv.DictReader(lines)
    yield from reader
# Stage 3: filter rows
def filter_active(rows):
    for row in rows:
        if row.get("active", "").lower() == "true":
            yield row
# Stage 4: transform rows
def extract_emails(rows):
    for row in rows:
        yield row["email"].strip().lower()
# Compose the pipeline: no I/O or computation yet
lines = read_lines("users.csv")
rows = parse_csv(lines)
active = filter_active(rows)
emails = extract_emails(active)
# ONLY NOW does the file get read, one line at a time
first_10_emails = []
for email in emails:
    first_10_emails.append(email)
    if len(first_10_emails) == 10:
        break
# The file stays open until the read_lines generator is closed or garbage-collected;
# call lines.close() to release it explicitly after stopping early
# -- Equivalent with generator expressions --
from itertools import islice
with open("users.csv", encoding="utf-8", newline="") as f:
    active_emails = (
        row["email"].strip().lower()
        for row in csv.DictReader(f)
        if row.get("active", "").lower() == "true"
    )
    # Consume inside the with block, while the file is still open
    first_10 = list(islice(active_emails, 10))
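When a pipeline stops early, the file opened inside a generator stays open until that generator is closed. The sketch below shows one way to make the release explicit with `generator.close()`; the temporary file, the `events` list, and the extra `try/finally` are illustrative scaffolding, not part of the pipeline above.

```python
import os
import tempfile

events = []  # records when the generator's cleanup runs

def read_lines(path):
    try:
        with open(path, encoding="utf-8") as f:
            yield from f
    finally:
        # Runs when the generator finishes, is closed, or is collected.
        events.append("file closed")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as out:
        out.write("one\ntwo\nthree\n")

    lines = read_lines(path)
    first = next(lines)                 # file is open, generator paused
    events_while_paused = list(events)

    lines.close()                       # raises GeneratorExit at the yield
    events_after_close = list(events)

print(first.strip(), events_while_paused, events_after_close)
```

Wrapping the generator in `contextlib.closing(...)` gives the same effect with a `with` statement, which may read better when the early exit can happen in several places.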
Memory Comparison
import sys
# -- Measure memory difference --
n = 1_000_000
# List approach: all values in RAM
squares_list = [x ** 2 for x in range(n)]
# getsizeof counts only the list's own pointer array, not the int objects it holds
print(f"List size: {sys.getsizeof(squares_list):,} bytes")  # roughly 8.4-8.7 million bytes, depending on Python version
# Generator approach: one value at a time
squares_gen = (x ** 2 for x in range(n))
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")  # on the order of 100-200 bytes, depending on Python version
# Both produce the same sum:
# sum(squares_list) == sum(squares_gen) == 333332833333500000
# -- When to use list vs generator --
# Use LIST when:
# - You need to iterate multiple times
# - You need random access by index (items[3])
# - You need len() or reversed()
# - The collection is small (under ~1000 items)
# - You need to check membership frequently
# Use GENERATOR when:
# - Processing large datasets (files, DB results)
# - You only iterate once
# - Memory is a constraint
# - You want to start processing before all data is available (streaming)
# - The sequence is infinite
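Because `sys.getsizeof` reports only the container itself, a fuller comparison can be sketched with the standard-library `tracemalloc` module, which tracks peak allocations while the work runs. The helper name `peak_bytes` and the smaller `n` are choices made for this example; exact numbers will differ by machine and Python version.

```python
import tracemalloc

def peak_bytes(make_iterable, n):
    """Return (sum, peak traced bytes) while summing make_iterable(n)."""
    tracemalloc.start()
    try:
        result = sum(make_iterable(n))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

n = 200_000
list_sum, list_peak = peak_bytes(lambda k: [x ** 2 for x in range(k)], n)
gen_sum, gen_peak = peak_bytes(lambda k: (x ** 2 for x in range(k)), n)

print(f"list peak:      {list_peak:,} bytes")
print(f"generator peak: {gen_peak:,} bytes")
```

The list version's peak includes every int object alive at once, while the generator version only ever holds a handful of objects, so the gap is typically larger than the `getsizeof` figures suggest.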
Common Mistakes
Mistake 1: Building a list just to pass to sum/max/all
Wrong (unnecessarily materialises the list):
total = sum([x ** 2 for x in large_range])  # list built, then summed, then discarded
Correct (generator expression):
total = sum(x ** 2 for x in large_range)  # no list, one-pass streaming
Mistake 2: Mixing lazy pipeline with length check
Wrong (generators have no len()):
gen = (x for x in range(10))
print(len(gen))  # TypeError: object of type 'generator' has no len()
Correct (convert to list first if length is needed):
items = list(x for x in range(10))
print(len(items))  # 10
Mistake 3: Nested generator expressions that are hard to debug
Wrong (difficult to understand and debug):
result = list(v for row in (parse(l) for l in open("f") if l.strip()) for v in row if v)
Correct (one named generator expression per step):
lines = (l for l in open("f") if l.strip())
rows = (parse(l) for l in lines)
values = (v for row in rows for v in row if v)
result = list(values)  # readable, debuggable
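For Mistake 2, when only the count is needed and not the items themselves, one alternative to building a list is to count in a single pass. This is a sketch with a hypothetical helper named `count_items`; note that counting consumes the generator.

```python
def count_items(iterable):
    """Count items in one pass without storing them (consumes the iterable)."""
    return sum(1 for _ in iterable)

raw_lines = ["a", "", "b", "  ", "c"]
non_empty = (line for line in raw_lines if line.strip())

n_lines = count_items(non_empty)   # 3
remaining = list(non_empty)        # [] because counting used it up

print(n_lines, remaining)
```

If both the count and the items are needed, materialising with `list(...)` as shown above remains the right choice.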
Quick Reference
| Pattern | Code | Memory |
|---|---|---|
| List comprehension | [expr for x in it] | O(n), all at once |
| Generator expression | (expr for x in it) | O(1), one at a time |
| Sum without list | sum(expr for x in it) | O(1) |
| Any/all early exit | any(cond for x in it) | O(1), stops early |
| Lazy filter | (x for x in it if cond) | O(1) |
| Take first N | islice(gen, N) | O(N) |
| Chain generators | yield from other_gen | O(1) |
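Two rows of the table, early exit and taking the first N items, can be checked directly. In this sketch the `tracked` wrapper and the `consumed` list are illustrative additions that record how many items were actually pulled from the source.

```python
from itertools import count, islice

consumed = []  # every item actually pulled from the source

def tracked(iterable):
    for item in iterable:
        consumed.append(item)
        yield item

# any() stops at the first truthy value instead of scanning a million items.
found = any(x > 3 for x in tracked(range(1_000_000)))
consumed_by_any = list(consumed)          # [0, 1, 2, 3, 4]

# islice makes an infinite generator safe to sample.
squares = (n * n for n in count(1))
first_five = list(islice(squares, 5))     # [1, 4, 9, 16, 25]

print(found, consumed_by_any, first_five)
```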