A high-performance FastAPI application needs more than good SQL and indexes: it needs proper connection management, regular database maintenance, and observability into what is slow. PostgreSQL connections are expensive: each one spawns an OS process consuming roughly 5–10 MB of RAM. A FastAPI application under load without connection pooling will either exhaust the connection limit or slow down drastically as connection establishment becomes a bottleneck. Connection pooling, PostgreSQL's VACUUM process, and the slow query log are the three operational pillars of production database performance.
Connection Pooling with SQLAlchemy
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# SQLAlchemy's built-in connection pool
engine = create_engine(
    "postgresql://user:pass@localhost/mydb",
    # Pool configuration
    pool_size=10,        # max persistent connections (default: 5)
    max_overflow=20,     # extra connections above pool_size when busy
    pool_timeout=30,     # seconds to wait for a connection before error
    pool_recycle=1800,   # recycle connections after 30 minutes (avoids stale conns)
    pool_pre_ping=True,  # test connection health before use (reconnects if dropped)
    # Echo for debugging (log all SQL to stdout)
    echo=False,          # set True in development to see generated SQL
)

SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

# FastAPI dependency
def get_db():
    db = SessionLocal()
    try:
        yield db
        db.commit()
    except Exception:
        db.rollback()
        raise
    finally:
        db.close()  # returns connection to pool, does not close it
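Each worker process builds its own pool, so the worst case for a deployment is workers × (pool_size + max_overflow) open connections. A small helper (hypothetical names, plain arithmetic, not a SQLAlchemy API) makes the budget check explicit:

```python
def connection_budget(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections a deployment can open against PostgreSQL.

    Each worker process has an independent pool; overflow connections
    are opened on demand under load and closed when returned.
    """
    return workers * (pool_size + max_overflow)

def fits(workers: int, pool_size: int, max_overflow: int,
         max_connections: int, reserved: int = 20) -> bool:
    """True if the worst case stays under max_connections minus admin headroom."""
    return connection_budget(workers, pool_size, max_overflow) <= max_connections - reserved

# 4 workers with the pool above: 4 * (10 + 20) = 120 connections worst case
print(connection_budget(4, 10, 20))          # 120
print(fits(4, 10, 20, max_connections=200))  # True
```

With the settings above, four workers can open at most 120 connections, which fits comfortably under max_connections = 200 with headroom reserved for admin sessions.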
Sessions do not destroy their connection on session.close(): it is returned to the pool for reuse. The pool_size setting controls how many connections each worker process keeps open at all times. For a FastAPI application with 4 worker processes and pool_size=10, PostgreSQL's max_connections must be at least 40 (4 × 10), and more once max_overflow kicks in. Add headroom for monitoring and admin connections: max_connections = 200 is a reasonable setting for typical deployments.

Avoid NullPool in production FastAPI applications: it creates and destroys a database connection for every request, making every request pay the ~5ms connection establishment overhead. NullPool is appropriate only in testing environments where connection state isolation is needed. The opposite mistake, an oversized pool, wastes PostgreSQL's connection resources and can exhaust max_connections.

VACUUM and AUTOVACUUM
-- PostgreSQL's MVCC (multi-version concurrency control) creates dead tuples on every UPDATE/DELETE
-- Dead tuples accumulate until VACUUM reclaims them
-- Unvacuumed tables consume growing amounts of disk and slow queries down
-- ── Check if tables need vacuuming ──────────────────────────────────────────
SELECT
    relname AS table_name,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1)
        AS dead_percent,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_percent DESC;
-- ── Manual VACUUM ───────────────────────────────────────────────────────────
VACUUM posts;          -- reclaim dead tuples (non-locking)
VACUUM ANALYZE posts;  -- reclaim + update statistics
VACUUM FULL posts;     -- full rewrite (exclusive lock!); avoid in production
-- ── AUTOVACUUM configuration (postgresql.conf) ──────────────────────────────
-- autovacuum = on                        # enable (default)
-- autovacuum_vacuum_threshold = 50       # trigger after 50 dead tuples
-- autovacuum_vacuum_scale_factor = 0.2   # + 20% of table size
-- autovacuum_analyze_threshold = 50      # trigger ANALYZE after 50 new rows
-- autovacuum_analyze_scale_factor = 0.1  # + 10% of table size
-- For high-churn tables, tune per-table autovacuum:
ALTER TABLE posts SET (
    autovacuum_vacuum_scale_factor = 0.05,   -- vacuum at 5% dead (more frequent)
    autovacuum_analyze_scale_factor = 0.05   -- analyze at 5% new rows
);
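Autovacuum fires when a table's dead-tuple count exceeds threshold + scale_factor × live tuples. Working the defaults through (a sketch of the formula only, not a PostgreSQL API):

```python
def autovacuum_trigger(live_tuples: int,
                       threshold: int = 50,
                       scale_factor: float = 0.2) -> int:
    """Dead-tuple count at which autovacuum kicks in for a table.

    PostgreSQL's rule: vacuum when
    n_dead_tup > autovacuum_vacuum_threshold
                 + autovacuum_vacuum_scale_factor * reltuples
    """
    return int(threshold + scale_factor * live_tuples)

# Defaults: a 1M-row table tolerates ~200,050 dead tuples before vacuuming
print(autovacuum_trigger(1_000_000))                     # 200050
# Tuned to 5% as in the ALTER TABLE above: fires 4x sooner
print(autovacuum_trigger(1_000_000, scale_factor=0.05))  # 50050
```

This is why the 20% default is too lax for large high-churn tables: a million-row table can carry two hundred thousand dead tuples before autovacuum runs at all.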
Slow Query Log
-- ── Enable slow query logging (postgresql.conf) ─────────────────────────────
-- log_min_duration_statement = 100    # log queries taking >100ms
-- log_line_prefix = '%t [%p] %u@%d '  # timestamp, pid, user, database
-- log_destination = 'csvlog'          # log to CSV for easy parsing
-- ── View current setting ────────────────────────────────────────────────────
SHOW log_min_duration_statement;
-- ── Enable for current session (testing) ────────────────────────────────────
SET log_min_duration_statement = 0;  -- log ALL queries in this session
-- (resets when session ends)
-- ── pg_stat_statements extension (install once) ─────────────────────────────
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- View top 10 slowest queries by total execution time
SELECT
    LEFT(query, 100) AS query_preview,
    calls,
    ROUND(total_exec_time::NUMERIC / NULLIF(calls, 0), 2) AS avg_ms,
    ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
    rows / NULLIF(calls, 0) AS avg_rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Reset statistics (start fresh after schema changes)
SELECT pg_stat_statements_reset();
Monitoring Queries
-- ── Active queries ──────────────────────────────────────────────────────────
SELECT
    pid,
    now() - query_start AS duration,
    state,
    LEFT(query, 100) AS query_preview
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start IS NOT NULL
ORDER BY duration DESC;
-- ── Long-running queries ────────────────────────────────────────────────────
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > INTERVAL '5 seconds'
ORDER BY duration DESC;
-- Cancel a stuck query (without terminating the connection)
SELECT pg_cancel_backend(pid);  -- pid from pg_stat_activity
-- Kill a stuck connection entirely (more forceful)
SELECT pg_terminate_backend(pid);
Common Mistakes
Mistake 1: No connection pooling (creating a connection per request)
✗ Wrong: a new PostgreSQL connection for every request:
@app.get("/posts")
async def get_posts():
    conn = psycopg2.connect(DATABASE_URL)  # ~5ms overhead per request!
    # ... query ...
    conn.close()  # waste: connection immediately destroyed
✓ Correct: reuse connections from a pool:
engine = create_engine(DATABASE_URL, pool_size=10)
# Sessions from SessionLocal() reuse pooled connections ✓
Mistake 2: Setting pool_size too high
✗ Wrong: pool_size=100 on a server with max_connections=100:
engine = create_engine(url, pool_size=100, max_overflow=50)
# With 4 worker processes: 4 × 100 = 400 connections > max_connections!
# New connections rejected: "FATAL: sorry, too many clients already"
✓ Correct: size pool_size based on max_connections:
# max_connections=200, reserve 20 for admin: 180 / 4 workers = 45 each
engine = create_engine(url, pool_size=40, max_overflow=5)  # ✓
Mistake 3: Disabling autovacuum on high-churn tables
✗ Wrong: manual vacuum only, table bloats between runs:
ALTER TABLE sessions SET (autovacuum_enabled = false);
-- Sessions table grows without bound, queries slow down
✓ Correct: tune autovacuum aggressiveness instead of disabling it:
ALTER TABLE sessions SET (autovacuum_vacuum_scale_factor = 0.01);  -- vacuum at 1% dead ✓
Quick Reference
| Task | Tool / SQL |
|---|---|
| Configure connection pool | create_engine(..., pool_size=N, max_overflow=M) |
| Reclaim dead tuples | VACUUM ANALYZE tablename |
| Find tables needing vacuum | SELECT ... FROM pg_stat_user_tables WHERE n_dead_tup > 1000 |
| Log slow queries | log_min_duration_statement = 100 |
| Find slowest queries | SELECT ... FROM pg_stat_statements ORDER BY total_exec_time DESC |
| View active queries | SELECT ... FROM pg_stat_activity WHERE state = 'active' |
| Kill stuck query | SELECT pg_cancel_backend(pid) |
| Monitor index usage | SELECT ... FROM pg_stat_user_indexes |