🐍 Expert Python Interview Questions
This lesson targets senior engineers and architects. Topics include CPython internals, the import system, Python 3.10-3.13 features, performance profiling, Cython, free-threaded (GIL-optional) builds, __new__, structural pattern matching, custom iterators, cyclic garbage collection, async context managers, exception groups, and production architecture patterns.
Questions & Answers
01 How does CPython execute Python code internally?
Internals
CPython executes Python in three stages: it parses source into an AST, compiles the AST to bytecode, and interprets the bytecode on a stack-based VM.
- Parsing: the lexer tokenises the source; the PEG parser (Python 3.9+) builds an Abstract Syntax Tree
- Compilation: the AST is compiled to bytecode, which is cached in __pycache__/*.pyc
- Execution: CPython’s eval loop interprets bytecode instructions one at a time
import dis, ast

def add(a, b):
    return a + b

dis.dis(add)
# Typical output on Python 3.11/3.12 (exact opcodes vary by version):
#   LOAD_FAST    0 (a)
#   LOAD_FAST    1 (b)
#   BINARY_OP    0 (+)
#   RETURN_VALUE

tree = ast.parse("x = 1 + 2")
print(ast.dump(tree, indent=2))

# Python 3.11+ -- specialising adaptive interpreter (PEP 659):
#   hot bytecodes are replaced with type-specialised versions (CALL_PY_EXACT_ARGS etc.)
# Python 3.13 -- experimental JIT compiler; this is a build-time option
#   (./configure --enable-experimental-jit), not a flag of the python executable
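The three stages can also be driven by hand from Python. The sketch below is a minimal illustration; the source string and the variable names are chosen for the example, and the opcode names it lists depend on the CPython version.

```python
import ast
import dis

source = "result = 6 * 7"   # illustrative source string

# Stage 1: parse the source text into an AST.
tree = ast.parse(source)

# Stage 2: compile the AST into a code object that holds bytecode.
code = compile(tree, "<demo>", "exec")

# Stage 3: let the interpreter execute the bytecode in a fresh namespace.
namespace = {}
exec(code, namespace)

# List the instructions the eval loop runs for this code object.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(namespace["result"], opnames)
```

Passing the AST (rather than the string) to compile() makes the hand-off between the parser and the compiler explicit.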
02 What is Python’s import system? How do finders and loaders work?
Import System
import sys

# When Python sees `import mymodule`:
# 1. Check sys.modules (cache) -- return immediately if found
# 2. Iterate sys.meta_path finders -- ask each to find the module
# 3. The winning finder returns a ModuleSpec that names a Loader
# 4. The module is created and added to sys.modules, then the loader executes its code
print(sys.meta_path)
# [BuiltinImporter, FrozenImporter, PathFinder]

# Custom finder that logs every import attempt
class AuditFinder:
    def find_spec(self, fullname, path, target=None):
        print(f"Importing: {fullname}")
        return None  # let the next finder try

sys.meta_path.insert(0, AuditFinder())

# Custom loader -- import from a database or network
import importlib.abc, importlib.util

class DBLoader(importlib.abc.Loader):
    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(compile(self.source, "<db>", "exec"), module.__dict__)

# Import hooks are used by: pytest, coverage.py, import guards, lazy loaders
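To see a finder and a loader cooperate end to end, here is a minimal sketch that serves a module from an in-memory dict standing in for the database. SOURCES, DictLoader, DictFinder, and the module name virtual_greeting are all hypothetical names invented for this example.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "database" mapping module names to source code.
SOURCES = {
    "virtual_greeting": "def hello(name):\n    return f'Hello, {name}!'\n",
}


class DictLoader(importlib.abc.Loader):
    """Loader that executes source text held in memory."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        code = compile(self.source, f"<dict:{module.__name__}>", "exec")
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    """Finder that claims only the names present in SOURCES."""

    def find_spec(self, fullname, path, target=None):
        source = SOURCES.get(fullname)
        if source is None:
            return None  # not ours: let the next finder try
        return importlib.util.spec_from_loader(fullname, DictLoader(source))


sys.meta_path.insert(0, DictFinder())

module = importlib.import_module("virtual_greeting")
greeting = module.hello("world")
print(greeting)
```

Returning None from find_spec for unknown names is what keeps the normal filesystem imports working alongside the custom finder.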
03 What are the key new features in Python 3.10, 3.11, 3.12, and 3.13?
New Features
3.10: Structural pattern matching (match/case), X | Y union syntax, better error messages, parenthesised context managers
3.11: 10-60% performance improvement, tomllib, asyncio.TaskGroup, asyncio.timeout(), exception groups and except*
3.12: Type parameter syntax def func[T](...) (PEP 695), inlined comprehensions, sys.monitoring, @override decorator, f-string improvements
3.13: Free-threaded mode (python3.13t, no GIL), experimental JIT, new interactive REPL, copy.replace()
# 3.10 -- structural pattern matching
def handle(event):
    match event:
        case {"type": "click", "x": x, "y": y}:
            return f"Click at ({x}, {y})"
        case {"type": "keypress", "key": key}:
            return f"Key: {key}"
        case {"type": str(t)}:
            return f"Unknown: {t}"
        case _:
            return "Not an event"

# 3.11 -- asyncio.TaskGroup
import asyncio

async def main():
    # might_fail stands for any coroutine function that may raise
    async with asyncio.TaskGroup() as tg:
        t1 = tg.create_task(might_fail("task1"))
        t2 = tg.create_task(might_fail("task2"))
    # If any task fails, the failures are raised together as an ExceptionGroup
04 How do you profile Python code and find performance bottlenecks?
Performance
import timeit, cProfile, pstats, io, tracemalloc
# 1. timeit -- micro-benchmarks
timeit.timeit("'-'.join(str(n) for n in range(100))", number=10000)
# 2. cProfile -- deterministic profiler
pr = cProfile.Profile()
pr.enable()
# ... code to profile ...
pr.disable()
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats(pstats.SortKey.CUMULATIVE)
ps.print_stats(20)
# Run from command line:
# python -m cProfile -s cumulative myscript.py
# 3. line_profiler -- per-line timing (pip install line_profiler)
# kernprof -l -v myscript.py
# 4. memory_profiler -- per-line memory (pip install memory_profiler)
# python -m memory_profiler myscript.py
# 5. py-spy -- sampling profiler, very low overhead, attaches to a running process without code changes
# py-spy top --pid 12345
# py-spy record -o profile.svg --pid 12345 # flame graph
# 6. tracemalloc -- built-in memory tracing
tracemalloc.start()
# ... run code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)
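The cProfile steps above leave the report sitting in a StringIO buffer. A minimal runnable sketch of the full round trip follows; slow_sum is a made-up workload used only to give the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Made-up workload: sum of squares computed in a Python-level loop."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
value = slow_sum(50_000)
profiler.disable()

# Render the five most expensive entries (by cumulative time) into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
report = buffer.getvalue()
print(report)
```

Capturing the report as a string makes it easy to log or attach to a bug report instead of printing it to the terminal.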
05 How do you write Python C extensions? When should you use Cython or ctypes?
Performance
- Cython: Python-like syntax compiled to C. Add type annotations for speed. Best productivity/performance balance. Used by NumPy, pandas, SciPy.
- ctypes: call functions in an existing .so/.dll without writing C. Good for wrapping C libraries.
- cffi: modern, Pythonic alternative to ctypes.
- pybind11: C++ bindings, header-only, requires C++11 or newer.
# Cython example -- compute.pyx
# cython: language_level=3
def compute_sum(int n):      # typed parameter
    cdef long total = 0      # C stack variable
    cdef int i
    for i in range(n):
        total += i
    return total

# setup.py
from setuptools import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("compute.pyx",
                            compiler_directives={"boundscheck": False}))
# python setup.py build_ext --inplace
# ctypes -- call existing C library
import ctypes
lib = ctypes.CDLL("./libmath.so")
lib.add.argtypes = [ctypes.c_int, ctypes.c_int]
lib.add.restype = ctypes.c_int
result = lib.add(3, 4) # calls C function directly
# When to write a C extension: hot numerical loops, wrapping C/C++ libraries
# First: improve the algorithm (O(n^2) to O(n log n)) -- often more impactful
06 How does Python 3.13 free-threaded mode (PEP 703) work?
GIL / Threading
PEP 703 makes the GIL optional. The free-threaded build (python3.13t) runs without the GIL, so threads can execute Python bytecode in parallel on multiple cores.
# Check if GIL is enabled (Python 3.13+)
import sys
sys._is_gil_enabled()  # False in a free-threaded build running without the GIL

# Free-threaded -- CPU work truly parallelises
import threading

def compute(n, results, idx):
    results[idx] = sum(i**2 for i in range(n))

results = [0] * 4
threads = [threading.Thread(target=compute, args=(10_000_000, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Python 3.12 (GIL): effectively sequential -- one thread runs bytecode at a time
# Python 3.13t (no GIL): threads run in parallel -- up to ~4x faster given 4 free cores

# How it replaces the GIL:
# - Immortalisation: True/False/None/small ints skip refcount updates
# - Per-object locking: list, dict use fine-grained locks
# - Biased reference counting: cheap local refcount for the owning thread, atomic updates for others
# Status: experimental in 3.13, maturing in 3.14+
# Single-threaded overhead vs the GIL build: large in 3.13 (around 40% on pyperformance
# per the 3.13 docs), reduced to roughly 5-10% in 3.14
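Code that must run on both kinds of build can probe the interpreter before choosing a threading strategy. The helper below is a sketch: gil_status is a hypothetical name, and it assumes that a missing sys._is_gil_enabled (Python 3.12 and earlier) means the GIL is active.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is set to 1 when CPython was compiled as a free-threaded build.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13; older versions always have a GIL.
    checker = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(checker()) if checker is not None else True
    return free_threaded_build, gil_enabled


free_threaded_build, gil_enabled = gil_status()
print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```

The two values can differ: a free-threaded build may still run with the GIL switched back on, for example when an extension module that does not declare free-threading support is imported.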
07 What are Python’s __new__ vs __init__? How do they work together?
OOP Internals
__new__ allocates and returns the new instance (the real constructor). __init__ receives the already-created instance and initialises it.
# Singleton using __new__
class Singleton:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

Singleton() is Singleton()  # True

# Custom __new__ for immutable types (the value is fixed before __init__ runs)
class NonZeroInt(int):
    def __new__(cls, value):
        if value == 0:
            raise ValueError("Zero not allowed")
        return super().__new__(cls, value)

NonZeroInt(5)  # 5
NonZeroInt(0)  # ValueError

# If __new__ returns an object that is not an instance of cls, __init__ is NOT called
class Factory:
    def __new__(cls, make_string=False):
        if make_string:
            return "I'm a string!"    # not a Factory, so __init__ is NOT called
        return super().__new__(cls)   # __init__ IS called

type(Factory(make_string=True))   # str
type(Factory(make_string=False))  # Factory
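The call order is easy to observe directly. The sketch below uses two illustrative classes (Tracked and ConfiguredSingleton are names invented here) to show that __new__ runs before __init__, and that a singleton built on __new__ still has __init__ re-run on every call.

```python
calls = []


class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


class ConfiguredSingleton:
    _instance = None

    def __new__(cls, value):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, value):
        # Runs on EVERY call, because __new__ returned an instance of cls.
        self.value = value


obj = Tracked(1)
first = ConfiguredSingleton("a")
second = ConfiguredSingleton("b")
print(calls, first is second, first.value)
```

The second call overwrites the state set by the first, which is a common surprise with __new__-based singletons; guarding __init__ or using a module-level instance avoids it.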
08 What is structural pattern matching (Python 3.10)?
Python 3.10+
# Sequence patterns mixing literals and captures
def classify(point):
    match point:
        case (0, 0): return "Origin"
        case (x, 0): return f"X-axis at {x}"
        case (0, y): return f"Y-axis at {y}"
        case (x, y): return f"Point at ({x}, {y})"
        case _: return "Not a point"

# Mapping patterns
def handle_event(event):
    match event:
        case {"action": "buy", "item": item, "qty": int(qty)}:
            return f"Buying {qty} x {item}"
        case {"action": "sell", **rest}:
            return f"Selling: {rest}"

# Sequence patterns with *rest
def parse_cmd(cmd):
    match cmd.split():
        case ["quit"]: return "quit"
        case ["go", direction]: return f"go {direction}"
        case ["get", obj, *rest]: return f"get {obj}"
        case _: return "unknown"

# Class patterns with __match_args__
class Point:
    __match_args__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

match Point(3, 4):
    case Point(x, y):
        print(f"at {x}, {y}")  # positional match
09 How do you implement a custom iterator and the iterable protocol?
Data Model
Iterable: has __iter__ returning an iterator. Iterator: has __iter__ (returns self) and __next__ (raises StopIteration when done).
class Fibonacci:
    """Fibonacci numbers (infinite unless a limit is given) -- its own iterator."""
    def __init__(self, limit=None):
        self.a, self.b, self.count, self.limit = 0, 1, 0, limit

    def __iter__(self):
        return self

    def __next__(self):
        if self.limit is not None and self.count >= self.limit:
            raise StopIteration
        result = self.a
        self.a, self.b = self.b, self.a + self.b
        self.count += 1
        return result

list(Fibonacci(10))  # [0,1,1,2,3,5,8,13,21,34]

# Reusable iterable -- creates a fresh iterator each time
class Range:
    def __init__(self, start, stop, step=1):
        self.start, self.stop, self.step = start, stop, step

    def __iter__(self):       # generator function: each call returns a new iterator
        cur = self.start
        while cur < self.stop:
            yield cur
            cur += self.step

r = Range(0, 10, 2)
list(r)  # [0,2,4,6,8]
list(r)  # [0,2,4,6,8] -- reusable, new iterator each time
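What a for loop does with these two protocols can be written out by hand. The sketch below is illustrative only: manual_for, Countdown, and CountdownIterator are names invented for the example.

```python
def manual_for(iterable):
    """Roughly what `for item in iterable` does under the hood."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # the loop ends silently
        collected.append(item)
    return collected


class Countdown:
    """Iterable: hands out a fresh iterator on every __iter__ call."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: keeps the position and is exhausted after one pass."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


countdown = Countdown(3)
first_pass = manual_for(countdown)
second_pass = manual_for(countdown)      # works again: new iterator
iterator = iter(countdown)
consumed = manual_for(iterator)
leftover = manual_for(iterator)          # same iterator: already exhausted
```

Separating the iterable from its iterator is what makes repeated passes safe; an object that is its own iterator can be consumed only once.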
10 How does Python’s cyclic garbage collector work?
Memory
import gc, weakref

# Reference cycles -- refcounting alone cannot free these
class Node:
    def __init__(self, data):
        self.data = data
        self.other = None

a = Node("A"); b = Node("B")
a.other = b; b.other = a   # a and b reference each other
del a, b       # refcounts drop to 1 (not 0) -- would leak without the cyclic GC
gc.collect()   # cyclic GC detects and frees the cycle

# Generational GC -- three generations (young to old)
gc.get_threshold()  # (700, 10, 10) by default on CPython 3.12; other versions may differ
# Gen 0: new objects, collected most often
# Gen 1: survived one Gen 0 collection
# Gen 2: long-lived objects, collected least often

# WeakValueDictionary -- track objects without preventing GC
cache = weakref.WeakValueDictionary()
obj = ExpensiveObject()   # placeholder for any class whose instances support weak references
cache["key"] = obj        # won't prevent obj from being GC'd
del obj
cache["key"]              # KeyError -- on CPython the object is freed once its last strong reference is gone

# weakref.finalize -- cleaner than __del__
def cleanup(resource):
    resource.close()

obj = ExpensiveObject()   # a live object is needed; the one above was deleted
ref = weakref.finalize(obj, cleanup, obj._resource)  # pass the resource, never obj itself
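A weak reference makes the collector's work observable. This is a minimal sketch in which automatic collection is switched off temporarily so that only the explicit gc.collect() call can free the cycle; the variable names are illustrative.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                      # keep automatic collections out of the demo
try:
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a                   # reference cycle: a <-> b

    probe = weakref.ref(a)        # observes a without keeping it alive
    del a, b                      # no names left, but the cycle keeps both alive

    alive_before = probe() is not None
    gc.collect()                  # the cycle detector finds and frees the pair
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after)
```

Reference counting alone leaves the pair alive after `del`, which is why the probe still resolves until the explicit collection runs.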
11 What are async context managers and async iterators?
Async
import asyncio
from contextlib import asynccontextmanager

# Async context manager -- __aenter__ / __aexit__
class AsyncDB:
    async def __aenter__(self):
        self.conn = await open_connection()  # placeholder for a driver's connect coroutine
        return self.conn

    async def __aexit__(self, *args):
        await self.conn.close()
        return False  # do not suppress exceptions

async def use():
    async with AsyncDB() as conn:
        await conn.execute("SELECT 1")

# @asynccontextmanager -- simpler
@asynccontextmanager
async def db_transaction(conn):
    async with conn.transaction():
        try:
            yield conn
        except Exception:
            await conn.rollback()
            raise

# Async iterator -- __aiter__ / __anext__
class AsyncCounter:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.current >= self.limit:
            raise StopAsyncIteration
        await asyncio.sleep(0)
        self.current += 1
        return self.current

# Async generator
async def async_range(n):
    for i in range(n):
        await asyncio.sleep(0)
        yield i

async def collect():
    result = [i async for i in async_range(5)]  # async list comprehension
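The snippets above depend on a database driver, so here is a self-contained sketch that can be run as is. managed_resource and ticker are hypothetical stand-ins for a real connection and a real data stream.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


@asynccontextmanager
async def managed_resource(name):
    """Stand-in for a connection: records when it is opened and closed."""
    events.append(f"open {name}")
    try:
        yield name.upper()
    finally:
        events.append(f"close {name}")   # runs even if the body raises


async def ticker(n):
    """Async generator: yields control to the event loop between items."""
    for i in range(n):
        await asyncio.sleep(0)
        yield i


async def main():
    async with managed_resource("db") as handle:
        values = [i async for i in ticker(3)]
        events.append(f"use {handle}")
    return values


values = asyncio.run(main())
print(values, events)
```

The try/finally around the yield is what guarantees cleanup; without it, an exception inside the `async with` body would skip the close step.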
12 What are exception groups and except* syntax (Python 3.11+)?
Python 3.11+
Exception Groups (PEP 654) allow multiple unrelated exceptions to be raised and handled together, which asyncio.TaskGroup relies on to report failures.
import asyncio

# except* -- handle specific exceptions from a group
try:
    raise ExceptionGroup("errors", [
        ValueError("bad value"),
        TypeError("wrong type"),
        RuntimeError("runtime problem"),
    ])
except* ValueError as eg:
    print(f"Handled {len(eg.exceptions)} ValueError(s)")
except* (TypeError, RuntimeError) as eg:
    print(f"Handled {len(eg.exceptions)} other errors")

# asyncio.TaskGroup -- structured concurrency (Python 3.11+)
async def main():
    try:
        async with asyncio.TaskGroup() as tg:
            t1 = tg.create_task(might_fail("task1"))
            t2 = tg.create_task(might_fail("task2"))
            t3 = tg.create_task(succeed())
        # If t1 and t2 both fail, both exceptions are collected.
        # t3 keeps its result if it already finished; if it is still running
        # when a sibling fails, it is cancelled.
    except* ValueError as eg:
        for exc in eg.exceptions:
            print(f"Task failed: {exc}")

# Advantages over asyncio.gather(return_exceptions=True):
# - Cleaner syntax
# - Remaining tasks are cancelled on first failure
# - Failures surface as one ExceptionGroup instead of being mixed into the result list
13 What is the PEP 695 generic syntax introduced in Python 3.12?
Python 3.12+
PEP 695 introduces a cleaner, built-in syntax for generic functions, classes, and type aliases, replacing the verbose TypeVar boilerplate.
# Python 3.11 and earlier -- verbose
from typing import TypeVar, Generic
T = TypeVar("T")

def first_old(items: list[T]) -> T:
    return items[0]

class Stack_old(Generic[T]):
    def push(self, item: T) -> None: ...

# Python 3.12 -- new concise syntax (PEP 695)
def first[T](items: list[T]) -> T:   # T is a type parameter
    return items[0]

class Stack[T]:                      # generic class
    def __init__(self):
        self._items: list[T] = []
    def push(self, item: T):
        self._items.append(item)
    def pop(self) -> T:
        return self._items.pop()

# Constraints: T must be exactly int or float (an upper bound is written T: SomeType)
def add_numbers[T: (int, float)](a: T, b: T) -> T:
    return a + b

# Type aliases with the 'type' statement
from collections.abc import Callable
type Vector = list[float]
type Matrix = list[Vector]
type Callback[T] = Callable[[T], None]   # generic type alias
14 How do you architect a large Python application for maintainability?
Architecture
my_project/
    src/
        my_project/
            __init__.py
            main.py               # entry point, wiring
            api/                  # HTTP layer -- thin, delegates to services
                v1/
                    routes/
                    schemas/      # Pydantic request/response models
            core/
                config.py         # pydantic-settings Settings
                dependencies.py   # DI factories
                exceptions.py     # custom exception hierarchy
            domain/               # pure Python -- no framework dependencies
                models/           # dataclasses or domain entities
                services/         # business logic
                interfaces/       # ABCs (ports)
            infrastructure/       # implement ABCs (adapters)
                repositories/
                cache/
                email/
            workers/              # background tasks (Celery/ARQ)
    tests/
        unit/                     # pure functions, no I/O
        integration/
    pyproject.toml
# Key principles:
# 1. Dependency inversion: domain imports ABCs, not infrastructure
# 2. Type hints everywhere: enables mypy strict analysis
# 3. No global mutable state: wire through DI
# 4. Explicit over implicit
from dataclasses import dataclass

@dataclass
class Container:
    db: Database
    cache: Cache
    email: EmailService

def create_container(s: Settings) -> Container:
    return Container(PostgreSQLDB(s.db_url), RedisCache(s.redis_url), SendGridEmail(s.key))
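Principle 1 (dependency inversion) is easiest to see with one port and one adapter. The sketch below is a reduced, hypothetical version of the layout: EmailService plays the ABC from interfaces/, InMemoryEmail is a test adapter, and SignupService is invented business logic for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class EmailService(ABC):
    """Port: the domain layer depends on this interface only."""

    @abstractmethod
    def send(self, to: str, subject: str) -> None: ...


@dataclass
class InMemoryEmail(EmailService):
    """Adapter for tests; a production adapter would call an email provider."""

    sent: list = field(default_factory=list)

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


@dataclass
class SignupService:
    """Business logic that receives its dependencies instead of creating them."""

    email: EmailService

    def register(self, address: str) -> str:
        if "@" not in address:
            raise ValueError(f"invalid address: {address!r}")
        self.email.send(address, "Welcome")
        return address.lower()


fake_email = InMemoryEmail()
service = SignupService(email=fake_email)
registered = service.register("Alice@Example.com")
print(registered, fake_email.sent)
```

Because SignupService never imports a concrete adapter, unit tests can swap in the in-memory one without patching, which is the practical payoff of wiring through a container.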
15 What is sys.monitoring (Python 3.12) and why is it better than sys.settrace?
Python 3.12+
sys.monitoring (PEP 669) provides low-overhead code instrumentation: you only pay for the events you opt into, unlike sys.settrace, whose trace function is invoked for call, line, return and exception events in every traced frame.
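import sys

TOOL_ID = 3  # IDs 0-5 are valid; 0, 1, 2 and 5 are conventionally debugger, coverage, profiler, optimizer
sys.monitoring.use_tool_id(TOOL_ID, "my_profiler")
call_count = {}

def call_handler(code, instruction_offset, callable_obj, arg0):
    # CALL callbacks receive the calling code object, the offset of the call,
    # the callable being called, and its first argument (or sys.monitoring.MISSING)
    name = getattr(callable_obj, "__qualname__", repr(callable_obj))
    call_count[name] = call_count.get(name, 0) + 1
    # Returning sys.monitoring.DISABLE here would switch the event off for this
    # call site; that suits coverage-style tools but would stop each count at 1

# Enable only CALL events (much cheaper than tracing every line)
sys.monitoring.register_callback(TOOL_ID, sys.monitoring.events.CALL, call_handler)
sys.monitoring.set_events(TOOL_ID, sys.monitoring.events.CALL)

def run_code():
    for i in range(1000):
        [x**2 for x in range(10)]

run_code()
sys.monitoring.set_events(TOOL_ID, sys.monitoring.events.NO_EVENTS)  # stop monitoring
sys.monitoring.register_callback(TOOL_ID, sys.monitoring.events.CALL, None)
sys.monitoring.free_tool_id(TOOL_ID)
print(call_count)

# Why better than sys.settrace:
# 1. Only pay for events you need (CALL, LINE, PY_RETURN, RAISE, etc.)
# 2. A callback can return sys.monitoring.DISABLE to switch an event off at one
#    code location after the first hit, which removes most of the overhead
# 3. Multiple tools coexist (coverage + profiler simultaneously)
# 4. coverage.py can use it as its measurement core on Python 3.12+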
16 What are contextlib utilities beyond @contextmanager?
Standard Library
from contextlib import (suppress, redirect_stdout, ExitStack,
                        AsyncExitStack, nullcontext, closing, chdir)
import io, os, subprocess, urllib.request

# suppress -- silently ignore specific exceptions
with suppress(FileNotFoundError, PermissionError):
    os.remove("might_not_exist.tmp")

# redirect_stdout -- capture print output
out = io.StringIO()
with redirect_stdout(out):
    print("This goes to StringIO")
captured = out.getvalue()

# ExitStack -- manage a dynamic number of context managers
def process_files(filenames):
    with ExitStack() as stack:
        files = [stack.enter_context(open(f)) for f in filenames]
        return [f.read() for f in files]  # all closed on exit

# nullcontext -- placeholder when no CM is needed
def process(data, ctx=None):
    with (ctx if ctx is not None else nullcontext()):
        return transform(data)   # transform stands for any processing function

# chdir -- temporary directory change (Python 3.11+)
with chdir("/tmp"):
    subprocess.run(["./script.sh"])
# original directory restored

# closing -- call .close() on objects without __exit__
with closing(urllib.request.urlopen(url)) as resp:   # url is supplied by the caller
    data = resp.read()
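ExitStack unwinds everything it holds in reverse order of registration, including plain callbacks. The sketch below records that order; resource and the log list are illustrative names.

```python
from contextlib import ExitStack, contextmanager

log = []


@contextmanager
def resource(name):
    """Illustrative context manager that records enter and exit."""
    log.append(f"enter {name}")
    try:
        yield name
    finally:
        log.append(f"exit {name}")


with ExitStack() as stack:
    # Enter a number of context managers that is only known at run time.
    names = [stack.enter_context(resource(n)) for n in ("a", "b", "c")]
    # Plain callables can be registered for cleanup as well.
    stack.callback(log.append, "callback")

print(log)
```

Last in, first out is the same order nested `with` statements would give, so resources that depend on earlier ones are released before the things they depend on.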
17 What is Python’s operator overloading? When is it appropriate?
Data Model
from functools import total_ordering

@total_ordering  # only need __eq__ + ONE comparison method
class Money:
    def __init__(self, amount: float, currency: str = "GBP"):
        self.amount = round(amount, 2)
        self.currency = currency

    def __repr__(self):
        return f"Money({self.amount!r}, {self.currency!r})"

    def __str__(self):
        return f"{self.currency} {self.amount:.2f}"

    def __add__(self, other):
        if isinstance(other, Money):
            if self.currency != other.currency:
                raise TypeError(f"Cannot add {self.currency} + {other.currency}")
            return Money(self.amount + other.amount, self.currency)
        return NotImplemented  # let Python try the reverse

    def __radd__(self, other):  # for sum([Money(1), Money(2)])
        if other == 0:
            return self
        return NotImplemented

    def __mul__(self, scalar):
        if isinstance(scalar, (int, float)):
            return Money(self.amount * scalar, self.currency)
        return NotImplemented

    __rmul__ = __mul__

    def __eq__(self, other):
        if isinstance(other, Money):
            return self.amount == other.amount and self.currency == other.currency
        return NotImplemented

    def __lt__(self, other):
        if isinstance(other, Money) and self.currency == other.currency:
            return self.amount < other.amount
        return NotImplemented

prices = [Money(30), Money(10), Money(20)]
total = sum(prices)                   # starts from int 0, so 0 + Money(30) falls back to __radd__
total = sum(prices, start=Money(0))   # explicit start value: only __add__ is used
max(prices)                           # uses __lt__
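The role of NotImplemented and the reflected methods can be isolated in a smaller class. Meters below is a hypothetical type written for this sketch, not part of the Money example.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented          # tells Python to try other.__radd__

    def __radd__(self, other):
        if other == 0:                 # lets sum() start from the int 0
            return self
        return NotImplemented

    def __mul__(self, factor):
        if isinstance(factor, (int, float)):
            return Meters(self.value * factor)
        return NotImplemented

    __rmul__ = __mul__                 # 2 * Meters(3) falls back to this


total = sum([Meters(1), Meters(2)])    # 0 + Meters(1) -> Meters.__radd__
doubled = 2 * Meters(3)                # int.__mul__ gives up -> Meters.__rmul__

try:
    Meters(1) + "x"                    # both sides return NotImplemented
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
```

Returning NotImplemented instead of raising lets Python produce the standard "unsupported operand" TypeError only after both operands have declined.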
18 What are common Python performance anti-patterns?
Performance
import re

# 1. String concatenation in a loop -- can degrade to O(n^2)
# BAD
result = ""
for word in words:
    result += word + " "
# GOOD
result = " ".join(words)

# 2. List for membership testing -- O(n)
# BAD
if item in some_list: ...   # O(n)
# GOOD
if item in some_set: ...    # O(1) on average

# 3. Passing the pattern string to re functions in a loop
# BAD
for item in items:
    re.findall(r"\d+", item)   # re caches compiled patterns, but every call still pays for the cache lookup
# GOOD
pattern = re.compile(r"\d+")
for item in items:
    pattern.findall(item)

# 4. Unnecessary list materialisation
total = sum(list(range(1_000_000)))   # BAD: creates list
total = sum(range(1_000_000))         # GOOD: lazy range

# 5. Repeated attribute lookups in tight loops
for i in range(1_000_000):
    mylist.append(i)   # attribute lookup each iteration
_append = mylist.append
for i in range(1_000_000):
    _append(i)         # can be faster; the gain varies by Python version, so measure

# 6. Python loops for numerical arrays
arr = [float(i) for i in range(1, 1_000_001)]
total = sum(x**2 for x in arr)   # Python-level loop: slow
import numpy as np
nparr = np.array(arr)
total = (nparr**2).sum()         # vectorised C loop: typically orders of magnitude faster
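Timings like these depend on the machine and the Python version, so it is worth measuring rather than trusting rules of thumb. A minimal sketch with timeit for the membership-test case follows; time_membership is a hypothetical helper and the sizes are arbitrary.

```python
import timeit


def time_membership(container, probe, repeats=200):
    """Return the seconds taken by `repeats` membership tests against container."""
    return timeit.timeit(lambda: probe in container, number=repeats)


values = list(range(50_000))
as_list = values
as_set = set(values)
probe = values[-1]        # worst case for the list: the last element

list_seconds = time_membership(as_list, probe)
set_seconds = time_membership(as_set, probe)
print(f"list: {list_seconds:.6f}s  set: {set_seconds:.6f}s")
```

Building the set has its own cost, so the switch pays off when the same collection is probed many times.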
19 What is __class_getitem__ and how does it enable generic aliases?
Data Model
__class_getitem__ is called when you subscript a class, e.g. list[int] or dict[str, int]. It returns a generic alias used by the type system.
# Built-in generics (Python 3.9+)
x: list[int] = [1, 2, 3]   # list.__class_getitem__(int)
y: dict[str, int] = {"a": 1}
list[int].__origin__   # list
list[int].__args__     # (int,)

# Runtime-validated generic
class TypedList:
    def __init__(self, item_type):
        self._type = item_type
        self._data = []

    def __class_getitem__(cls, item_type):
        return cls(item_type)  # returns an INSTANCE, not just a type alias

    def append(self, item):
        if not isinstance(item, self._type):
            raise TypeError(f"Expected {self._type.__name__}, got {type(item).__name__}")
        self._data.append(item)

int_list = TypedList[int]  # creates TypedList(int)
int_list.append(5)         # OK
int_list.append("hi")      # TypeError
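TypedList repurposes the hook to build instances. For comparison, the conventional use returns a types.GenericAlias, which is what list[int] produces. The sketch below shows that idiom on a hypothetical Box class.

```python
import types


class Box:
    """Hypothetical container used only to illustrate the hook."""

    def __init__(self, item):
        self.item = item

    # Box[int] calls types.GenericAlias(Box, int), the same kind of object
    # that list[int] evaluates to.
    __class_getitem__ = classmethod(types.GenericAlias)


alias = Box[int]
origin = alias.__origin__     # the class that was subscripted
args = alias.__args__         # the type arguments as a tuple
boxed = alias(5)              # calling the alias constructs a plain Box
print(alias, origin, args, type(boxed))
```

The alias carries the type arguments for static checkers and introspection but performs no runtime validation, which is the opposite trade-off from TypedList.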
20 How do you build and distribute a Python package with pyproject.toml?
Packaging
# pyproject.toml -- modern single-file configuration (PEP 517/518/621)
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "my-library"
version = "2.1.0"
description = "A helpful library"
readme = "README.md"
requires-python = ">=3.10"
authors = [{name="Alice", email="alice@example.com"}]
dependencies = ["httpx>=0.27", "pydantic>=2.0"]
[project.optional-dependencies]
dev = ["pytest", "ruff", "mypy"]
# CLI entry point
[project.scripts]
my-cli = "my_package.cli:main"
# Tool config in the same file
[tool.pytest.ini_options]
testpaths = ["tests"]
[tool.ruff]
line-length = 88
# Rule selection belongs under [tool.ruff.lint] in current Ruff versions
[tool.ruff.lint]
select = ["E","F","I","N","UP"]
[tool.mypy]
strict = true
# Build and publish
# pip install build twine
# python -m build # creates dist/*.whl and dist/*.tar.gz
# twine upload dist/* # upload to PyPI
21 What are Python’s struct and array modules? When do you use them?
Low Level
import struct, array

# struct -- pack/unpack C binary data
# Format: '<' little-endian, '>' big-endian
# 'i'=int32, 'I'=uint32, 'f'=float32, 'd'=float64, 'H'=uint16, 'B'=uint8
# Pack
data = struct.pack("<IHH", 0x12345678, 0xABCD, 0x0001)
# Unpack
value, code, flags = struct.unpack("<IHH", data)
# Reusable struct object (faster for repeated ops)
header = struct.Struct("<IHH")
buf = header.pack(1234, 5678, 1)

# Parse binary file format: 8-byte header followed by a payload
with open("data.bin", "rb") as f:
    while True:
        hdr = f.read(8)
        if len(hdr) < 8:
            break   # end of file or truncated header
        msg_type, length = struct.unpack(">IH", hdr[:6])   # header bytes 6-7 are not decoded here
        payload = f.read(length)

# array -- typed, contiguous arrays (between list and NumPy)
arr = array.array("d", [1.0, 2.0, 3.0])  # 'd' = C double
arr.append(4.0)
arr.tobytes()  # raw bytes
with open("floats.bin", "wb") as out:
    arr.tofile(out)

# When to use:
# struct: network protocol parsing, binary file formats
# array: typed homogeneous sequences (more compact in memory than list)
# NumPy: mathematics, vectorised operations (most common choice)
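Length-prefixed framing is a typical job for struct. The sketch below round-trips messages through an in-memory stream; the header layout (big-endian uint32 type plus uint16 payload length) and the helper names encode and decode_stream are assumptions made for the example, not a real protocol.

```python
import io
import struct

# Assumed header layout: uint32 message type, uint16 payload length, big-endian.
HEADER = struct.Struct(">IH")


def encode(msg_type, payload):
    """Prefix the payload with a fixed-size header."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode_stream(stream):
    """Read (msg_type, payload) pairs until the stream runs out."""
    messages = []
    while True:
        hdr = stream.read(HEADER.size)
        if len(hdr) < HEADER.size:
            break                          # end of stream or truncated header
        msg_type, length = HEADER.unpack(hdr)
        messages.append((msg_type, stream.read(length)))
    return messages


stream = io.BytesIO(encode(1, b"hello") + encode(2, b""))
messages = decode_stream(stream)
print(HEADER.size, messages)
```

Reading exactly HEADER.size bytes keeps the reader and the format string in step, so changing the layout in one place cannot silently misalign the parser.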