Database schema migrations in production require extra care compared to development. Running Alembic migrations on application startup (`alembic upgrade head` inside the FastAPI lifespan) creates a race condition when multiple Uvicorn workers start simultaneously: every worker attempts the upgrade at once, and while the database's locks let only one proceed, the others either block or fail with errors. More critically, a migration that fails midway leaves the database in a partially-migrated state while the application is already serving traffic. The solution is to run migrations as a separate step, before starting the application workers.
Docker Entrypoint Script
```sh
#!/bin/sh
# entrypoint.sh — runs before the application starts
set -e  # exit immediately on any error

echo "Running database migrations..."
alembic upgrade head   # with set -e, a failed migration aborts the script here

echo "Migrations complete. Starting application..."
exec uvicorn app.main:app \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 2
```
```dockerfile
# In Dockerfile.backend, replace CMD with:
COPY --chown=app:app entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
CMD ["/app/entrypoint.sh"]
```
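If you would rather keep the application image's `CMD` as plain Uvicorn, migrations can instead run as a one-shot Docker Compose service. A sketch, assuming a Compose file with a `db` service that has a healthcheck (the `db` service and build context paths are assumptions, not from the original):

```yaml
services:
  migrate:
    build:
      context: .
      dockerfile: Dockerfile.backend
    command: alembic upgrade head   # runs once and exits
    depends_on:
      db:
        condition: service_healthy   # assumes a `db` service with a healthcheck

  backend:
    build:
      context: .
      dockerfile: Dockerfile.backend
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 2
    depends_on:
      migrate:
        condition: service_completed_successfully
```

Note that the condition form requires the long (mapping) syntax for `depends_on`; the short list form `depends_on: [migrate]` cannot express `service_completed_successfully`.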
The `set -e` flag causes the shell script to exit immediately if any command returns a non-zero exit code. Without it, a failed migration would be silently ignored and the application would start with an outdated schema, leading to confusing runtime errors. The `exec uvicorn ...` command replaces the shell process with the Uvicorn process, so Uvicorn becomes PID 1 in the container and directly receives the signals Docker sends (SIGTERM, SIGINT), allowing a graceful shutdown.

With Docker Compose, an alternative is a dedicated `migrate` service that runs `alembic upgrade head` and exits, with the backend service depending on it via `condition: service_completed_successfully`. This separates migration concerns from application concerns: the application container's `CMD` stays the pure Uvicorn command, and you can run migrations independently without starting the application.

Zero-Downtime Migration Pattern
```python
# Each "step" below lives in its own Alembic migration file (as that
# revision's upgrade()); they are shown together here for brevity.
from alembic import op
import sqlalchemy as sa

# WRONG: adding a NOT NULL column with no default fails outright on a table
# that already has rows (and the NOT NULL + DEFAULT variant rewrites the
# whole table under an exclusive lock on PostgreSQL < 11):
# op.add_column("posts", sa.Column("word_count", sa.Integer(), nullable=False))

# CORRECT: three-step zero-downtime migration

# Step 1 (deploy with old code): add the column as nullable
def upgrade_step1():
    op.add_column(
        "posts",
        sa.Column("word_count", sa.Integer(), nullable=True),
    )

# Step 2 (between deploys): backfill existing rows
def upgrade_step2():
    op.execute("""
        UPDATE posts
        SET word_count = array_length(regexp_split_to_array(trim(body), E'\\\\s+'), 1)
        WHERE word_count IS NULL
    """)

# Step 3 (deploy with new code): add the NOT NULL constraint
def upgrade_step3():
    op.alter_column("posts", "word_count", nullable=False)
```
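On large tables, running Step 2 as a single statement can hold row locks and a long-running transaction for the whole backfill. A common refinement is to backfill in bounded batches, committing between them. A sketch of the idea — the `count_words` helper and the `sqlite3` stand-in are mine for illustration; in production Step 2 would stay as the Postgres `UPDATE` above, looped with a `LIMIT`ed subquery:

```python
import re
import sqlite3


def count_words(body: str) -> int:
    # Mirrors the SQL expression: split the trimmed body on whitespace runs.
    return len(re.split(r"\s+", body.strip()))


def backfill_word_count(conn: sqlite3.Connection, batch_size: int) -> int:
    """Backfill posts.word_count one batch at a time, committing between
    batches so no single transaction holds locks for long."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, body FROM posts WHERE word_count IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total  # nothing left to backfill
        conn.executemany(
            "UPDATE posts SET word_count = ? WHERE id = ?",
            [(count_words(body), row_id) for row_id, body in rows],
        )
        conn.commit()
        total += len(rows)
```

Because each batch is its own short transaction, live reads and writes interleave with the backfill instead of waiting minutes for one giant `UPDATE` to finish.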
Checking Migration Status
```sh
# Show the revision currently applied to the database
docker exec blog-backend-1 alembic current

# Show the latest revision(s) defined in the migration scripts
# (if this differs from `alembic current`, migrations are pending)
docker exec blog-backend-1 alembic heads

# Show migration history
docker exec blog-backend-1 alembic history --verbose

# Downgrade one step (emergency rollback)
docker exec blog-backend-1 alembic downgrade -1

# Downgrade to a specific revision
docker exec blog-backend-1 alembic downgrade abc123de
```
Common Mistakes
Mistake 1 — Running migrations in the lifespan with multiple workers (race condition)

❌ Wrong — every Uvicorn worker runs `alembic upgrade head` simultaneously:

```python
from contextlib import asynccontextmanager

from alembic import command
from alembic.config import Config


@asynccontextmanager
async def lifespan(app):
    # Alembic's programmatic API; with --workers 4, all 4 worker
    # processes execute this at the same time!
    command.upgrade(Config("alembic.ini"), "head")
    yield
```

✅ Correct — run migrations in the entrypoint before the workers start.
Mistake 2 — Adding a NOT NULL column without a default to a live table

❌ Wrong — the migration fails outright on a table with existing rows, and the NOT NULL + DEFAULT variant rewrites the whole table under an exclusive lock on PostgreSQL < 11, blocking all queries for minutes on large tables.

✅ Correct — use the three-step nullable → backfill → NOT NULL pattern.
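The failure mode in Mistake 2 is easy to reproduce. A minimal sketch using `sqlite3` as a stand-in (PostgreSQL rejects the same DDL on a non-empty table with a "contains null values" error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO posts (body) VALUES ('hello world')")

# Mistake 2: NOT NULL, no default, on a table that already has rows.
try:
    conn.execute("ALTER TABLE posts ADD COLUMN word_count INTEGER NOT NULL")
    rejected = False
except sqlite3.OperationalError:
    rejected = True  # existing rows would violate the constraint

# Step 1 of the zero-downtime pattern always succeeds: nullable first.
conn.execute("ALTER TABLE posts ADD COLUMN word_count INTEGER")
```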