Application Lifecycle — Health Probes, Startup Sequence, and Zero-Downtime Deployment

A Node.js Express application deployed in production needs more than just node server.js. It needs environment configuration management that prevents secrets from leaking, a structured application lifecycle with clear startup and shutdown phases, zero-downtime deployment patterns that eliminate maintenance windows, and health and readiness endpoints that integrate with load balancers and orchestration platforms. This lesson builds the complete production application architecture for the MEAN Stack task manager API.

Deployment Patterns

Pattern              | Downtime | Complexity | Rollback
Restart (naive)      | Brief    | Low        | Manual
PM2 graceful reload  | Zero     | Low        | pm2 deploy <environment> revert 1 (if deployed with pm2 deploy)
Blue-Green (Docker)  | Zero     | Medium     | Switch load balancer back
Rolling (Kubernetes) | Zero     | High       | Rollout halts when new pods fail readiness; kubectl rollout undo

Health Check Types

Endpoint      | Returns                   | Used By                                     | Checks
/health/live  | 200 if process is running | Kubernetes liveness probe                   | Process alive; if it fails, restart container
/health/ready | 200 if ready for traffic  | Kubernetes readiness probe + load balancer  | DB connected, cache connected, not shutting down
/health       | 200 with full status JSON | Monitoring dashboards, humans               | All dependencies with status details

Note: Kubernetes uses two types of health checks here: liveness (is the container still alive; if not, restart it) and readiness (is the container ready to serve traffic; if not, stop routing to it). These should check different things. The liveness probe should only fail for truly unrecoverable states (infinite loop, deadlock, process frozen). The readiness probe fails when the app cannot serve requests (DB disconnected, cache unavailable, still starting up). Failing the liveness probe is expensive (container restart plus startup time); failing readiness just removes the pod from rotation.

Tip: Use dotenv-flow instead of plain dotenv for environment configuration. dotenv-flow loads files in order: .env (base), .env.local (local overrides, gitignored), .env.production (production defaults, committed), .env.production.local (production local overrides, gitignored). Variables in later files override earlier ones. This layered approach allows committed base configuration with local and environment-specific overrides without complex CI configuration.

Warning: The /health/ready endpoint must return 503 during startup (before dependencies are connected) and during shutdown (from the moment the shutdown signal arrives, before server.close() is called). If it returns 200 during startup, the load balancer routes traffic before the app is ready and requests fail. If it keeps returning 200 during shutdown, the load balancer keeps sending traffic to a draining server and new connections are refused. The isReady flag is the most critical state in the application lifecycle.
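
The layering described in the Tip can be illustrated without touching the filesystem. The sketch below uses a hypothetical helper, resolveLayers, which is not part of dotenv or dotenv-flow. It takes the already-parsed contents of each file in load order and assumes, as those libraries do by default, that variables already set in the real environment are never overwritten by a file.

```javascript
// Hypothetical helper illustrating dotenv-flow style precedence; not the library's API.
function resolveLayers(layers, processEnv = {}) {
    // Later layers override earlier ones...
    const fromFiles = Object.assign({}, ...layers);
    // ...but values already present in the real environment always win.
    return { ...fromFiles, ...processEnv };
}

const resolved = resolveLayers(
    [
        { LOG_LEVEL: 'info', PORT: '3000' },   // .env
        { LOG_LEVEL: 'debug' },                // .env.local
        { LOG_LEVEL: 'warn', PORT: '8080' },   // .env.production
        { PORT: '9090' },                      // .env.production.local
    ],
    { PORT: '5000' }                           // set by the shell or orchestrator
);

console.log(resolved); // { LOG_LEVEL: 'warn', PORT: '5000' }
```

LOG_LEVEL comes from the most specific file that defines it, while PORT comes from the process environment, which is what lets an orchestrator override any committed default.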

Complete Production Application Architecture

// src/config/env.js — Environment validation and access
const path = require('path');

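// Load .env files, most specific first (dotenv-flow pattern).
// dotenv never overwrites a variable that is already set, so the first file
// to define a key wins and real environment variables beat every file.
const nodeEnv = process.env.NODE_ENV || 'development';
[`.env.${nodeEnv}.local`, `.env.${nodeEnv}`, '.env.local', '.env']
    .forEach(file => {
        // A missing file is not fatal: dotenv returns { error } instead of throwing
        require('dotenv').config({ path: path.resolve(process.cwd(), file), override: false });
    });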

// Validate required variables
const schema = {
    NODE_ENV:       { required: true,  default: 'development' },
    PORT:           { required: false, default: '3000', type: 'number' },
    MONGO_URI:      { required: true },
    JWT_SECRET:     { required: true,  minLength: 32 },
    REFRESH_SECRET: { required: true,  minLength: 32 },
    REDIS_URL:      { required: false, default: 'redis://localhost:6379' },
    LOG_LEVEL:      { required: false, default: 'info' },
    CORS_ORIGINS:   { required: false, default: 'http://localhost:4200' },
};

const errors = [];
const env    = {};

for (const [key, rules] of Object.entries(schema)) {
    const raw = process.env[key] ?? rules.default;

    if (rules.required && !raw) {
        errors.push(`Missing required: ${key}`);
        continue;
    }

    if (rules.minLength && raw && raw.length < rules.minLength) {
        errors.push(`${key} must be at least ${rules.minLength} characters`);
    }

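    env[key] = rules.type === 'number' ? parseInt(raw, 10) : raw;

    // parseInt returns NaN for non-numeric input such as PORT=abc
    if (rules.type === 'number' && Number.isNaN(env[key])) {
        errors.push(`${key} must be a number`);
    }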
}

if (errors.length) {
    console.error('Environment configuration errors:\n' + errors.join('\n'));
    process.exit(1);
}

module.exports = env;

// ── src/server.js — production application lifecycle ─────────────────────
const mongoose   = require('mongoose');
const app        = require('./app');
const env        = require('./config/env');
const { logger } = require('./config/logger');
const { getRedisClient } = require('./config/redis');

// Application state machine
const state = {
    isStarting:    true,
    isReady:       false,
    isShuttingDown:false,
    startTime:     Date.now(),
};

// ── Health endpoints ─────────────────────────────────────────────────────
// Liveness — is the process running?
app.get('/health/live', (req, res) => {
    if (state.isShuttingDown) return res.status(503).json({ status: 'shutting_down' });
    res.json({ status: 'alive', uptime: process.uptime() });
});

// Readiness — is the process ready to serve traffic?
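app.get('/health/ready', (req, res) => {
    // Not ready while starting, while shutting down, or if MongoDB has dropped
    const mongoReady = mongoose.connection.readyState === 1;
    if (!state.isReady || state.isShuttingDown || !mongoReady) {
        return res.status(503).json({ status: 'not_ready' });
    }
    res.json({ status: 'ready', uptime: process.uptime() });
});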

// Full health — dependency status
app.get('/health', async (req, res) => {
    const mongoReady = mongoose.connection.readyState === 1;
    let   redisReady = false;

    try {
        const redis = await getRedisClient();
        await redis.ping();
        redisReady = true;
    } catch {}

    const healthy = mongoReady && redisReady;
    res.status(healthy ? 200 : 503).json({
        status:    healthy ? 'healthy' : 'degraded',
        version:   process.env.npm_package_version,
        uptime:    process.uptime(),
        timestamp: new Date().toISOString(),
        dependencies: {
            mongodb: { status: mongoReady ? 'connected' : 'disconnected' },
            redis:   { status: redisReady ? 'connected' : 'disconnected' },
        },
    });
});

// ── Graceful shutdown ────────────────────────────────────────────────────
let httpServer;
const SHUTDOWN_TIMEOUT = 10_000;

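async function shutdown(signal) {
    if (state.isShuttingDown) return;
    state.isShuttingDown = true;
    state.isReady        = false;  // readiness probe now returns 503 → LB stops routing
    logger.info(`${signal} received, starting graceful shutdown`);

    // Safety net: force exit if draining hangs. unref() stops this timer
    // from keeping the event loop alive on its own.
    setTimeout(() => {
        logger.error('Graceful shutdown timed out, forcing exit');
        process.exit(1);
    }, SHUTDOWN_TIMEOUT).unref();

    // Signal arrived before the HTTP server was created: nothing to drain
    if (!httpServer) process.exit(0);

    // Phase 1: stop accepting new connections; callback fires once existing ones end
    httpServer.close(async () => {
        logger.info('HTTP server closed');

        // Phase 2: close dependencies
        try {
            await mongoose.disconnect();
            const redis = await getRedisClient();
            await redis.quit();
            logger.info('All connections closed, exiting cleanly');
            process.exit(0);
        } catch (err) {
            logger.error('Error during shutdown:', { error: err.message });
            process.exit(1);
        }
    });

    // Close idle keep-alive sockets so close() is not held up by them (Node.js 18.2+)
    httpServer.closeIdleConnections?.();
}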

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT',  () => shutdown('SIGINT'));

// ── Startup sequence ─────────────────────────────────────────────────────
async function start() {
    logger.info('Starting application...', { env: env.NODE_ENV, pid: process.pid });

    // 1. Connect to MongoDB
    await mongoose.connect(env.MONGO_URI, { maxPoolSize: 10, serverSelectionTimeoutMS: 5000 });
    logger.info('MongoDB connected');

    // 2. Connect to Redis
    await getRedisClient();
    logger.info('Redis connected');

    // 3. Start HTTP server
    httpServer = app.listen(env.PORT, () => {
        state.isStarting = false;
        state.isReady    = true;    // ← signal readiness probe
        logger.info(`Server ready on port ${env.PORT}`, {
            pid:  process.pid,
            port: env.PORT,
        });

        // Signal PM2 that we are ready (for zero-downtime reload)
        process.send?.('ready');
    });

    httpServer.on('error', err => {
        logger.error('HTTP server error', { error: err.message });
        if (err.code === 'EADDRINUSE') process.exit(1);
    });
}

start().catch(err => {
    logger.error('Failed to start', { error: err.message, stack: err.stack });
    process.exit(1);
});

How It Works

Step 1 — Environment Validation at Startup Catches Misconfiguration Early

Validating all required environment variables before starting the server catches misconfiguration at deployment time — not at the moment a route handler first uses a missing value. The validation schema also documents all configuration options in one place and enforces minimum security requirements (JWT secrets must be at least 32 characters). Exit code 1 ensures the deployment pipeline recognises the failure.
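
To make the failure mode concrete, here is a minimal sketch of the same checks written as a pure function. The name validateEnv and the idea of returning the errors rather than exiting are assumptions made for illustration; the integer check on numeric values is likewise an addition of this sketch.

```javascript
// Sketch: schema validation as a pure function so it can be unit tested.
function validateEnv(schema, source) {
    const env = {};
    const errors = [];
    for (const [key, rules] of Object.entries(schema)) {
        const raw = source[key] ?? rules.default;
        if (rules.required && !raw) {
            errors.push(`Missing required: ${key}`);
            continue;
        }
        if (rules.minLength && raw && raw.length < rules.minLength) {
            errors.push(`${key} must be at least ${rules.minLength} characters`);
        }
        if (rules.type === 'number') {
            const n = Number(raw);
            if (!Number.isInteger(n)) errors.push(`${key} must be an integer`);
            env[key] = n;
        } else {
            env[key] = raw;
        }
    }
    return { env, errors };
}

const result = validateEnv(
    {
        PORT:       { required: false, default: '3000', type: 'number' },
        MONGO_URI:  { required: true },
        JWT_SECRET: { required: true, minLength: 32 },
    },
    { JWT_SECRET: 'too-short' }   // MONGO_URI deliberately missing
);

// Two errors are collected: the missing MONGO_URI and the short JWT_SECRET
console.log(result.errors);
console.log(result.env.PORT); // 3000
```

Collecting every error before exiting means a single failed deployment reports all misconfigured variables at once instead of one per attempt.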

Step 2 — State Machine Controls Health Probe Responses

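The application state (isStarting, isReady, isShuttingDown) drives the health endpoint responses. During startup, readiness returns 503 and the load balancer does not route. Once all dependencies are connected and the server is listening, readiness returns 200. On SIGTERM, readiness returns 503 before server.close() begins, so the load balancer can stop routing to the instance. The load balancer only notices on its next probe, though, so connection attempts can still arrive for a moment after the flag flips. In-flight requests are allowed to finish, but new connections in that window are refused, which is why deployments that need strictly zero failed requests commonly add a short delay between flipping the flag and closing the server.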
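
The transitions can be traced with a small sketch that treats the readiness status code as a pure function of the state object. The function name readinessStatus is hypothetical; the flags are the ones used in server.js.

```javascript
// Sketch: readiness status code derived purely from lifecycle state.
function readinessStatus(state) {
    return state.isReady && !state.isShuttingDown ? 200 : 503;
}

const state = { isStarting: true, isReady: false, isShuttingDown: false };
const timeline = [];

// Process has booted but dependencies are still connecting
timeline.push(['starting', readinessStatus(state)]);

// Dependencies connected and server listening
state.isStarting = false;
state.isReady = true;
timeline.push(['ready', readinessStatus(state)]);

// SIGTERM received
state.isShuttingDown = true;
state.isReady = false;
timeline.push(['shutting_down', readinessStatus(state)]);

console.log(timeline);
// [ [ 'starting', 503 ], [ 'ready', 200 ], [ 'shutting_down', 503 ] ]
```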

Step 3 — Ordered Startup Prevents Race Conditions

Starting the HTTP server before establishing database connections means requests can arrive before the application is ready to handle them. The startup sequence ensures: (1) MongoDB connects, (2) Redis connects, (3) HTTP server starts. Only after all dependencies are ready does the server begin listening. Because the port is not bound until then, this ordering protects the app even where the deployment infrastructure does not use readiness probes; where it does, the isReady flag gives the probe the same information explicitly.
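
A minimal sketch of the ordering follows. The connectMongo and connectRedis functions are hypothetical stand-ins for mongoose.connect() and getRedisClient(), and port 0 asks the operating system for any free port so the example can run anywhere.

```javascript
const http = require('node:http');

// Hypothetical stand-ins for the real dependency connections.
const order = [];
const connectMongo = async () => { order.push('mongo'); };
const connectRedis = async () => { order.push('redis'); };

// Promise wrapper around listen(): resolves once bound, rejects on bind errors.
function listen(server, port) {
    return new Promise((resolve, reject) => {
        server.once('error', reject);
        server.listen(port, () => {
            server.off('error', reject);
            resolve(server);
        });
    });
}

async function start(port = 0) {
    await connectMongo();                 // 1. database first
    await connectRedis();                 // 2. then cache
    const server = http.createServer((req, res) => res.end('ok'));
    await listen(server, port);           // 3. only now open the port
    order.push('listen');
    return server;
}

const started = start().then(server => {
    console.log(order.join(' -> ')); // mongo -> redis -> listen
    server.close();
    return order;
});
```

Wrapping listen() in a promise is a design choice of this sketch: a bind failure such as EADDRINUSE then rejects start() like any other startup error, so one catch handler covers the whole sequence.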

Step 4 — process.send('ready') Integrates with PM2

PM2's wait_ready: true setting makes PM2 wait for the process to send 'ready' before treating the new instance as online. Sending it in the server.listen() callback means PM2 marks the instance ready only after the HTTP server has bound its port. For a zero-downtime reload (pm2 reload app), PM2 starts the new process, waits for 'ready', routes traffic to it, then signals the old process to stop. By default that signal is SIGINT rather than SIGTERM, followed by SIGKILL once kill_timeout (1600 ms by default) expires, which is why the server registers handlers for both signals.

Step 5 — server.close() Drains Without Dropping

Node.js's server.close(callback) stops the server from accepting new TCP connections. Existing connections, including keep-alive HTTP connections, remain open until they end, and the callback fires when the last one is closed. Browsers typically hold keep-alive connections open for 5–120 seconds, so waiting for them can stall shutdown. Calling server.closeIdleConnections() (Node.js 18.2+) right after server.close() closes the idle ones while active requests finish; from Node.js 19, server.close() does this itself. Either way, the server drains in seconds rather than minutes.
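
The drain step can be sketched with the built-in http module alone. The drain function name is an assumption of this example; server.close() and server.closeIdleConnections() are the Node.js calls discussed above.

```javascript
const http = require('node:http');

// Sketch: stop accepting connections, drop idle keep-alive sockets, wait for the rest.
function drain(server) {
    return new Promise((resolve, reject) => {
        server.close(err => (err ? reject(err) : resolve()));
        // Available from Node.js 18.2; optional chaining keeps older runtimes working.
        server.closeIdleConnections?.();
    });
}

const server = http.createServer((req, res) => res.end('ok'));

const drained = new Promise(resolve => server.listen(0, resolve))
    .then(() => drain(server))
    .then(() => {
        console.log('drained, listening =', server.listening); // drained, listening = false
        return server.listening;
    });
```

In a real shutdown handler this promise would be raced against the force-exit timer, since a request that never completes would otherwise keep the drain pending indefinitely.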

Quick Reference

Task                   | Code
Validate env schema    | Check required keys, types, min lengths at startup
Liveness probe         | GET /health/live → 200 if not shutting down
Readiness probe        | GET /health/ready → 200 only when isReady && !isShuttingDown
Set ready after listen | server.listen(port, () => { state.isReady = true; process.send?.('ready'); })
Stop on SIGTERM        | state.isReady = false; server.close(...)
Force exit timeout     | setTimeout(() => process.exit(1), 10_000)
PM2 ready signal       | process.send?.('ready') + wait_ready: true in ecosystem.config.js

🧠 Test Yourself

During a rolling deployment, a new pod starts and passes its liveness probe (200) but its readiness probe returns 503 (still connecting to MongoDB). What does Kubernetes do with this pod?