Database Migrations in Production — Alembic on Deploy

Database schema migrations in production require extra care compared to development. Running Alembic migrations on application startup (alembic upgrade head inside the FastAPI lifespan) creates a race condition when multiple Uvicorn workers start simultaneously: every worker process runs the lifespan, so all of them try to apply the same migrations at once. Alembic does not serialize concurrent runs by itself, so the workers either block on database locks or fail on statements that another worker has already executed. More critically, a migration that fails partway through can leave the database in a partially migrated state while the application is already serving traffic. The solution is to run migrations as a separate step, before starting the application workers.

Docker Entrypoint Script

#!/bin/sh
# entrypoint.sh — runs before the application starts
set -e   # exit immediately on any error

echo "Running database migrations..."
# Test the command directly: with set -e, a separate `$?` check placed after a
# failed command would never run, because the script has already exited.
if ! alembic upgrade head; then
    echo "Migration failed! Aborting startup." >&2
    exit 1
fi

echo "Migrations complete. Starting application..."
# exec replaces the shell with Uvicorn, so Uvicorn becomes PID 1
exec uvicorn app.main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 2

# In Dockerfile.backend, replace CMD with:
COPY --chown=app:app entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
CMD ["/app/entrypoint.sh"]
Note: The set -e flag causes the script to exit immediately if any command outside a conditional returns a non-zero exit code, so a failed setup step cannot be silently skipped. A failed migration must stop the container at this point; otherwise the application would start against an outdated schema and fail later with confusing runtime errors. The exec uvicorn ... command replaces the shell process with the Uvicorn process, which makes Uvicorn PID 1 in the container. Signals that Docker sends to PID 1 (SIGTERM, SIGINT) then reach Uvicorn directly, so it can shut down gracefully instead of being killed when the stop timeout expires.
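The effect of exec can be checked with a small experiment. The sketch below is only an illustration in Python, not part of the deployment: it assumes a POSIX system, and the names CHILD and pids_before_and_after_exec are made up for this example. A child process prints its PID, replaces itself through os.execv (the same mechanism the shell's exec uses), and the replacement prints its PID again.

```python
import os
import subprocess
import sys

# Child program: print its PID, then replace itself with a fresh Python
# process, the same way `exec uvicorn ...` replaces the shell.
# flush=True matters: buffered output is lost when the process image is replaced.
CHILD = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    "os.execv(sys.executable, "
    "[sys.executable, '-c', 'import os; print(os.getpid())'])\n"
)


def pids_before_and_after_exec():
    """Run CHILD and return the PIDs printed before and after the exec call."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = result.stdout.split()
    return int(before), int(after)


if __name__ == "__main__":
    before, after = pids_before_and_after_exec()
    print(f"PID before exec: {before}, PID after exec: {after}")
```

Both numbers are the same on a POSIX system, which is why a process started with exec from the entrypoint keeps the container's PID 1.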
Tip: Consider running migrations in a separate “init” container in docker-compose rather than in the application entrypoint. Define a migrate service that runs alembic upgrade head and exits, and make the backend service depend on it using the long form of depends_on: list migrate as a key under depends_on and set condition: service_completed_successfully on it (the short list form depends_on: [migrate] cannot carry a condition). This separates migration concerns from application concerns: the application container’s CMD stays as the pure Uvicorn command, and you can run migrations independently without starting the application.
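Warning: Zero-downtime migrations require that each schema change is backwards-compatible with the old code that is still running during deployment. The deployment flow is: (1) add the new column as nullable (old code ignores it), (2) deploy the new application code, which writes the column and tolerates NULL in existing rows, (3) backfill existing rows, (4) add the NOT NULL constraint in a separate migration once all instances run the new code and the backfill is complete. Never add a NOT NULL column without a DEFAULT to a live, populated table in a single migration: PostgreSQL rejects the statement because existing rows would violate the constraint, a database that rewrites the table instead holds a lock that blocks queries while it does so, and any old code still inserting rows without the column starts to fail.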
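Between adding the nullable column and finishing the backfill, the new application code will read rows whose word_count is still NULL. A minimal sketch of how new code might tolerate that window is shown here; the helper names count_words and effective_word_count are hypothetical and not part of the original application.

```python
from typing import Optional


def count_words(body: str) -> int:
    """Count whitespace-separated words in a post body."""
    return len(body.split())


def effective_word_count(body: str, stored: Optional[int]) -> int:
    """Return the stored word_count, or compute it for rows not yet backfilled.

    `stored` is the value read from the nullable word_count column. It is None
    for rows that were written before the new code was deployed.
    """
    if stored is not None:
        return stored
    return count_words(body)
```

Once the NOT NULL constraint is in place, the fallback branch is dead code and can be removed in a later release.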

Zero-Downtime Migration Pattern

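from alembic import op
import sqlalchemy as sa

# WRONG: adding a NOT NULL column without a default to a live table.
# Existing rows have no value for it, and old code that omits the column
# can no longer insert rows.
# op.add_column("posts", sa.Column("word_count", sa.Integer(), nullable=False))

# CORRECT: three-step zero-downtime migration.
# Each step is the upgrade() of its own Alembic revision file; the functions
# are named upgrade_step1..3 here only to show them side by side.

# Step 1 (run while the old code is still deployed): add nullable column
def upgrade_step1():
    op.add_column("posts",
        sa.Column("word_count", sa.Integer(), nullable=True))

# Step 2 (after the new code is deployed): backfill existing rows
def upgrade_step2():
    # PostgreSQL syntax. The doubled backslashes survive Python and E'' string
    # escaping, so the regex engine receives \s+
    op.execute("""
        UPDATE posts
        SET word_count = array_length(regexp_split_to_array(trim(body), E'\\\\s+'), 1)
        WHERE word_count IS NULL
    """)

# Step 3 (after the backfill, once every instance runs the new code): add NOT NULL constraint
def upgrade_step3():
    op.alter_column("posts", "word_count", nullable=False)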
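
On a large table, a single UPDATE over every row can hold row locks for a long time, so the backfill is often run in small batches instead. The following is a minimal sketch of that idea, not the migration itself: it uses an in-memory SQLite database from the standard library as a stand-in for the production database, counts words in Python rather than in SQL, and the helper names (backfill_word_count, remaining_nulls, demo) are invented for the example.

```python
import sqlite3


def count_words(body):
    """Whitespace-separated word count; an empty or missing body counts as 0."""
    return len(body.split()) if body else 0


def backfill_word_count(conn, batch_size=1000):
    """Fill NULL word_count values batch by batch; return the rows updated."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, body FROM posts WHERE word_count IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        conn.executemany(
            "UPDATE posts SET word_count = ? WHERE id = ?",
            [(count_words(body), post_id) for post_id, body in rows],
        )
        conn.commit()  # one short transaction per batch
        total += len(rows)


def remaining_nulls(conn):
    """Rows still missing a word_count; must be 0 before adding NOT NULL."""
    return conn.execute(
        "SELECT COUNT(*) FROM posts WHERE word_count IS NULL"
    ).fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (body) VALUES (?)",
        [("hello world",), ("one two  three",), ("  padded text here  ",)],
    )
    conn.commit()
    # Step 1: add the column as nullable
    conn.execute("ALTER TABLE posts ADD COLUMN word_count INTEGER")
    # Step 2: backfill in batches of 2 rows
    updated = backfill_word_count(conn, batch_size=2)
    counts = [row[0] for row in conn.execute(
        "SELECT word_count FROM posts ORDER BY id")]
    return updated, remaining_nulls(conn), counts


if __name__ == "__main__":
    print(demo())
```

Checking remaining_nulls before Step 3 is a cheap guard: if it is not zero, adding the NOT NULL constraint would fail.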

Checking Migration Status

# Check current revision
docker exec blog-backend-1 alembic current

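# Show the newest revision(s) in the migration scripts;
# migrations are pending if this differs from the output of `alembic current`
docker exec blog-backend-1 alembic heads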

# Show migration history
docker exec blog-backend-1 alembic history --verbose

# Downgrade one step (emergency rollback)
docker exec blog-backend-1 alembic downgrade -1

# Downgrade to specific revision
docker exec blog-backend-1 alembic downgrade abc123de
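
The same comparison between the database revision and the script heads can be made from Python, for example in a health check. The sketch below assumes Alembic and SQLAlchemy are installed and that alembic.ini is in the working directory; pending_heads and load_heads are hypothetical helper names. It relies on ScriptDirectory.get_heads() and MigrationContext.get_current_heads() from Alembic's Python API.

```python
def pending_heads(current_heads, script_heads):
    """Return script head revisions that the database has not reached yet."""
    return sorted(set(script_heads) - set(current_heads))


def load_heads(database_url, ini_path="alembic.ini"):
    """Read (current_heads, script_heads) through Alembic's Python API."""
    # Imported lazily so pending_heads() stays usable without Alembic installed.
    import sqlalchemy
    from alembic.config import Config
    from alembic.runtime.migration import MigrationContext
    from alembic.script import ScriptDirectory

    script = ScriptDirectory.from_config(Config(ini_path))
    engine = sqlalchemy.create_engine(database_url)
    with engine.connect() as conn:
        current = MigrationContext.configure(conn).get_current_heads()
    return tuple(current), tuple(script.get_heads())
```

An empty result from pending_heads(*load_heads(url)) means the database is at the latest revision; a non-empty result lists the heads that alembic upgrade head would still move to.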

Common Mistakes

Mistake 1 — Running migrations in lifespan with multiple workers (race condition)

❌ Wrong — 4 Uvicorn workers all run alembic upgrade head simultaneously:

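from contextlib import asynccontextmanager
from alembic import command, config

@asynccontextmanager
async def lifespan(app):
    # Runs in every worker process at startup, so all 4 workers migrate at once!
    command.upgrade(config.Config("alembic.ini"), "head")
    yield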

✅ Correct — run migrations in entrypoint before workers start.

Mistake 2 — Adding NOT NULL constraint without default on live table

❌ Wrong — locks the table, blocks all queries for minutes on large tables.

✅ Correct — use the three-step nullable → backfill → NOT NULL pattern.

🧠 Test Yourself

You need to rename the posts.body column to posts.content in a zero-downtime deployment. What is the problem with a direct column rename and how do you handle it?