Express Performance — Compression, Caching, and Connection Pooling

A MEAN Stack API that works correctly but responds in 2 seconds is unusable. Performance is a feature, and optimising it requires understanding where time is actually spent. The three highest-impact performance techniques for Express APIs are: compression (reducing response payload size), caching (avoiding repeated computation and database queries), and connection pooling (reusing database connections rather than creating new ones per request). Together these can reduce response times by 10–100x for typical API workloads without changing a single business logic function.

Performance Optimization Layers

| Layer          | Technique                                | Reduces                         | Impact                                |
|----------------|------------------------------------------|---------------------------------|---------------------------------------|
| Network        | gzip/Brotli compression                  | Response body size              | 50-90% smaller payloads               |
| Application    | In-memory caching (node-cache)           | CPU for repeated computation    | Microseconds vs milliseconds          |
| HTTP           | HTTP cache headers (ETag, Cache-Control) | Repeated identical requests     | Zero server work for cached responses |
| External cache | Redis caching                            | Database queries for hot data   | 1-2ms vs 20-50ms for DB               |
| Database       | Connection pooling                       | Connection overhead per request | Eliminates 50-200ms connection time   |
| Database       | Indexes                                  | Documents scanned per query     | 10-1000x faster queries               |
| Express        | Avoid sync operations                    | Event loop blocking time        | Eliminates freezes under load         |

HTTP Cache-Control Directives

| Directive                | Meaning                                      | Use For                                 |
|--------------------------|----------------------------------------------|-----------------------------------------|
| no-store                 | Never cache; fetch fresh every time          | Sensitive data, authenticated responses |
| no-cache                 | Can cache but must revalidate before use     | Frequently changing data                |
| private                  | Only the browser can cache; no CDN/proxy     | User-specific responses                 |
| public                   | CDN and browser can cache                    | Public, shared content                  |
| max-age=N                | Cache for N seconds                          | Static assets, public data              |
| s-maxage=N               | CDN cache duration (overrides max-age)       | CDN-served public content               |
| must-revalidate          | Must check server if stale                   | Critical accuracy required              |
| stale-while-revalidate=N | Serve stale while revalidating in background | Performance with eventual freshness     |
Note: Never cache authenticated, user-specific API responses in a shared cache (CDN or reverse proxy). Cache-Control: private tells CDNs not to cache the response. no-store tells the browser not to cache at all — use this for sensitive data like authentication tokens, financial records, or personal health information. Public, non-authenticated data (product listings, articles, public stats) benefits enormously from CDN caching.
Tip: Redis caching is most valuable for queries that: (1) are called frequently, (2) return data that changes infrequently, and (3) are expensive to compute. Task statistics, user profiles, configuration data, and product catalogues are good candidates. Individual task documents are not — they change frequently and are user-specific. Always set a TTL (time-to-live) on cached data and provide a mechanism to invalidate the cache when the underlying data changes.
Warning: Mongoose 5's default connection pool size was 5; Mongoose 6+ (built on MongoDB driver 4.x) defaults to maxPoolSize: 100. Whatever the default, the mechanics are the same: if every pooled connection is busy running a query, the next query waits in a queue until one completes. For APIs under significant load, set the pool size explicitly, e.g. { maxPoolSize: 20 } in the connection options. Setting it too high wastes MongoDB server resources, since each connection consumes memory on the MongoDB server. Profile your actual concurrent query count under load to choose the right value.

Complete Performance Stack

// npm install compression node-cache ioredis

// ── 1. Compression middleware ─────────────────────────────────────────────
const compression = require('compression');

app.use(compression({
    // Only compress responses larger than 1KB
    threshold: 1024,

    // Compression level: 0 = no compression, 9 = max (default 6)
    level: 6,

    // Custom filter — skip compression for streaming responses
    filter: (req, res) => {
        if (req.headers['x-no-compression']) return false;
        return compression.filter(req, res);
    },
}));

// Result: typical JSON API response shrinks from 50KB to 5KB
// Especially valuable for list endpoints with many records

// ── 2. Redis cache service ────────────────────────────────────────────────
const Redis  = require('ioredis');
const redis  = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');

class CacheService {
    constructor(client) {
        this.client = client;
        this.prefix = 'cache:';
    }

    key(...parts) {
        return this.prefix + parts.join(':');
    }

    async get(key) {
        const value = await this.client.get(key);
        return value ? JSON.parse(value) : null;
    }

    async set(key, value, ttlSeconds = 300) {
        await this.client.setex(key, ttlSeconds, JSON.stringify(value));
    }

    async del(key) {
        await this.client.del(key);
    }

    async delPattern(pattern) {
        const keys = await this.client.keys(this.prefix + pattern);
        if (keys.length) await this.client.del(...keys);
    }

    // Cache-aside pattern: check cache, fetch from DB on miss, populate cache
    async remember(key, ttlSeconds, fetchFn) {
        const cached = await this.get(key);
        if (cached !== null) {
            return { data: cached, fromCache: true };
        }
        const data = await fetchFn();
        await this.set(key, data, ttlSeconds);
        return { data, fromCache: false };
    }
}

const cache = new CacheService(redis);

// ── 3. Caching in service layer ───────────────────────────────────────────
// services/task.service.js (methods of the TaskService class; class wrapper omitted for brevity)

async getStats(userId) {
    const cacheKey = cache.key('stats', userId);
    const { data } = await cache.remember(cacheKey, 60, async () => {
        // Only runs on cache miss (first call or after TTL)
        const counts = await taskRepository.countByStatus(userId);
        return counts.reduce((acc, { _id, count }) => ({ ...acc, [_id]: count }), {});
    });
    return data;
}

// Invalidate cache when tasks change
async createTask(userId, taskData) {
    const task = await taskRepository.create({ ...taskData, user: userId });
    await cache.del(cache.key('stats', userId));   // invalidate stats cache
    return task;
}

async deleteTask(taskId, userId) {
    const task = await taskRepository.delete(taskId, userId);
    if (!task) throw new NotFoundError('Task not found');
    await cache.del(cache.key('stats', userId));
    return task;
}

// ── 4. HTTP Cache-Control headers ─────────────────────────────────────────
// Middleware for setting cache headers
function cacheControl(options = {}) {
    return (req, res, next) => {
        if (req.method !== 'GET' && req.method !== 'HEAD') {
            res.set('Cache-Control', 'no-store');
            return next();
        }

        const { maxAge = 0, private: isPrivate = true, noStore = false } = options;

        if (noStore) {
            res.set('Cache-Control', 'no-store');
        } else if (isPrivate) {
            res.set('Cache-Control', `private, max-age=${maxAge}`);
        } else {
            res.set('Cache-Control', `public, max-age=${maxAge}, s-maxage=${maxAge * 2}`);
        }
        next();
    };
}

// Apply to routes:
// Authenticated user data — private, no long-term caching
router.get('/tasks',      cacheControl({ private: true, maxAge: 0 }), controller.getAll);
// Public stats — cache for 5 minutes
router.get('/stats',      cacheControl({ private: false, maxAge: 300 }), controller.getPublicStats);
// User uploads — immutable (hash in filename)
app.use('/uploads',       cacheControl({ private: false, maxAge: 31536000 }), express.static('uploads'));

// ── 5. ETag support for conditional requests ──────────────────────────────
const crypto = require('crypto');

// Computed per request inside the handler: an ETag depends on the data just
// fetched, so it cannot be precomputed in route-level middleware.
function sendWithETag(req, res, data) {
    const etag        = '"' + crypto.createHash('md5').update(JSON.stringify(data)).digest('hex') + '"';
    const ifNoneMatch = req.headers['if-none-match'];

    if (ifNoneMatch === etag) {
        return res.status(304).end();   // Not Modified: client uses its cached copy
    }
    res.set('ETag', etag);
    res.json(data);
}

// ── 6. MongoDB connection pool configuration ──────────────────────────────
// config/database.js
const mongoose = require('mongoose');

async function connectDB() {
    const options = {
        maxPoolSize:        20,    // max 20 simultaneous queries
        minPoolSize:         5,    // keep 5 connections warm
        maxIdleTimeMS:   60000,    // close idle connections after 1 min
        serverSelectionTimeoutMS: 5000,   // fail fast if MongoDB unavailable
        socketTimeoutMS:        45000,
        connectTimeoutMS:       10000,
        heartbeatFrequencyMS:   10000,    // check connection health every 10s
        retryWrites:            true,
        writeConcern:           { w: 'majority' },
    };

    mongoose.connection.on('connected',    () => logger.info('MongoDB connected'));
    mongoose.connection.on('error',        (err) => logger.error('MongoDB error', { err }));
    mongoose.connection.on('disconnected', () => logger.warn('MongoDB disconnected'));

    await mongoose.connect(process.env.MONGODB_URI, options);
    logger.info('MongoDB pool established', { maxPoolSize: options.maxPoolSize });
}

module.exports = connectDB;

// ── 7. Response time monitoring middleware ────────────────────────────────
function responseTimeMiddleware(req, res, next) {
    const start = process.hrtime.bigint();

    // Set the header from a patched res.end: by the time 'finish' fires the
    // headers have already been flushed and setHeader would throw.
    const originalEnd = res.end;
    res.end = function (...args) {
        const duration = Number(process.hrtime.bigint() - start) / 1_000_000;  // ns to ms
        if (!res.headersSent) {
            res.setHeader('X-Response-Time', `${duration.toFixed(2)}ms`);
        }
        return originalEnd.apply(this, args);
    };

    res.on('finish', () => {
        const duration = Number(process.hrtime.bigint() - start) / 1_000_000;

        if (duration > 1000) {
            logger.warn('Slow request', {
                method:   req.method,
                path:     req.path,
                duration: `${duration.toFixed(0)}ms`,
                userId:   req.user?.id,
            });
        }
    });

    next();
}

How It Works

Step 1 — Compression Reduces Bytes Over the Wire

gzip compression transforms a 50KB JSON response into approximately 5KB by replacing repeated patterns with compact references. The client decompresses it in under 1ms. For JSON APIs, compression ratios of 10:1 are common because JSON has many repeated field names and structural characters. The CPU cost of compression (a few milliseconds) is almost always worth the network savings, especially for mobile clients or slow connections.

Step 2 — Cache-Aside Pattern: Check, Fetch, Store

The most common caching strategy: check if the data is in Redis. If it is, return it immediately (1-2ms). If not, fetch from MongoDB (20-50ms), store the result in Redis with a TTL, then return it. The TTL ensures stale data eventually expires. The invalidation on mutation (deleting the cache key when data changes) ensures freshness for write operations. This pattern is simple to implement and effective for read-heavy endpoints.
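Stripped of Redis, the pattern is just three steps. A minimal in-memory sketch (a Map standing in for Redis; this remember mirrors the CacheService method above):

```javascript
// store stands in for Redis; entries carry their own expiry timestamp
const store = new Map();  // key -> { value, expiresAt }

async function remember(key, ttlMs, fetchFn) {
    const entry = store.get(key);
    if (entry && entry.expiresAt > Date.now()) {
        return { data: entry.value, fromCache: true };   // hit: fetchFn never runs
    }
    const value = await fetchFn();                       // miss: do the expensive fetch
    store.set(key, { value, expiresAt: Date.now() + ttlMs });
    return { data: value, fromCache: false };
}

// Second call within the TTL never touches the "database"
(async () => {
    let dbCalls = 0;
    const fetchStats = async () => { dbCalls += 1; return { pending: 3 }; };

    const first  = await remember('stats:42', 60_000, fetchStats);
    const second = await remember('stats:42', 60_000, fetchStats);
    console.log(first.fromCache, second.fromCache, dbCalls);  // false true 1
})();
```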

Step 3 — ETags Enable Efficient Conditional Requests

When a client receives a response with an ETag header, it stores the tag with the cached response. On the next request for the same resource, it sends If-None-Match: "etag-value". The server computes the ETag of the current data and compares — if they match, returns 304 Not Modified with no body (a few bytes). The client uses its cached copy. This is most valuable for large, rarely-changing responses like user profiles or configuration data.

Step 4 — Connection Pooling Eliminates Per-Request Connection Overhead

Creating a new TCP connection to MongoDB for each request takes 50-200ms. A connection pool maintains a set of pre-established connections. When a request needs to query MongoDB, it borrows a connection from the pool (microseconds), executes the query, and returns the connection. The pool ensures connections are reused rather than recreated, eliminating connection establishment overhead from every request’s critical path.
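A toy pool makes the borrow/return mechanics concrete. This is an illustration only, not the MongoDB driver's implementation; the Pool class and its method names are invented here:

```javascript
// Toy pool: pre-creates connections, queues acquirers when exhausted
class Pool {
    constructor(size, createConn) {
        this.idle    = Array.from({ length: size }, (_, i) => createConn(i));
        this.waiters = [];   // resolve callbacks for queued acquire() calls
    }

    acquire() {
        if (this.idle.length) return Promise.resolve(this.idle.pop());  // borrow: ~microseconds
        return new Promise((resolve) => this.waiters.push(resolve));    // pool exhausted: queue
    }

    release(conn) {
        const next = this.waiters.shift();
        if (next) next(conn);        // hand the connection straight to a waiter
        else this.idle.push(conn);   // otherwise return it to the idle set
    }
}

// 2 connections, 3 concurrent borrowers: the 3rd waits for a release
(async () => {
    const pool = new Pool(2, (i) => ({ id: i }));
    const a = await pool.acquire();
    await pool.acquire();

    let thirdServed = false;
    pool.acquire().then(() => { thirdServed = true; });
    console.log(thirdServed);   // false: pool exhausted, request queued

    pool.release(a);
    await Promise.resolve();    // let the queued acquire's .then run
    console.log(thirdServed);   // true
})();
```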

Step 5 — Response Time Monitoring Identifies Bottlenecks

Logging response times with X-Response-Time headers and alerting on slow requests creates a feedback loop. When a new code path or query runs slowly under production load, the slow request log reveals it within minutes. Combined with MongoDB’s explain plans and Node.js profiling tools, response time monitoring is the first step in every performance investigation.
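The measurement primitive is worth isolating: process.hrtime.bigint() is a monotonic clock, so unlike Date.now() it cannot jump under NTP adjustment, which makes it the right choice for durations. A minimal sketch (time is a hypothetical helper wrapping the same logic the middleware inlines):

```javascript
// Measure how long fn takes, in milliseconds, using the monotonic clock
function time(fn) {
    const start = process.hrtime.bigint();
    fn();
    return Number(process.hrtime.bigint() - start) / 1_000_000;  // ns to ms
}

const fast = time(() => {});
const slow = time(() => { let s = 0; for (let i = 0; i < 10_000_000; i++) s += i; });

console.log(`fast: ${fast.toFixed(3)}ms, slow: ${slow.toFixed(3)}ms`);
```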

Real-World Example: Cached Stats Endpoint

// Before optimisation: 85ms average response time
// After: 3ms from cache (1.8ms from Redis), 45ms on cache miss

// GET /api/v1/tasks/stats
exports.getStats = asyncHandler(async (req, res) => {
    const userId   = req.user.id;
    const cacheKey = `stats:${userId}`;

    // Check Redis first
    const cached = await cache.get(cacheKey);
    if (cached) {
        res.set('X-Cache', 'HIT');
        return res.json({ success: true, data: cached });
    }

    // Compute stats from MongoDB (expensive aggregation)
    const [statusCounts, overdue, dueToday] = await Promise.all([
        Task.aggregate([
            // Aggregation pipelines do not auto-cast strings to ObjectId the way find() does
            { $match: { user: new mongoose.Types.ObjectId(userId), deletedAt: { $exists: false } } },
            { $group: { _id: '$status', count: { $sum: 1 } } },
        ]),
        Task.countDocuments({ user: userId, dueDate: { $lt: new Date() }, status: { $ne: 'completed' } }),
        Task.countDocuments({ user: userId, dueDate: {
            $gte: new Date(new Date().setHours(0,0,0,0)),
            $lt:  new Date(new Date().setHours(24,0,0,0)),
        }}),
    ]);

    const stats = { overdue, dueToday };
    statusCounts.forEach(({ _id, count }) => { stats[_id] = count; });

    // Cache for 60 seconds
    await cache.set(cacheKey, stats, 60);

    res.set('X-Cache', 'MISS');
    res.json({ success: true, data: stats });
});

Common Mistakes

Mistake 1 — Caching authenticated, user-specific data publicly

❌ Wrong — User A’s tasks cached and returned to User B:

const cacheKey = 'tasks:all';   // not scoped to user!
const cached   = await cache.get(cacheKey);
// User B requests tasks and receives User A's cached tasks

✅ Correct — always include userId in the cache key:

const cacheKey = `tasks:${req.user.id}:${JSON.stringify(req.query)}`;

Mistake 2 — Never invalidating the cache after mutations

❌ Wrong — cache returns stale data after a task is deleted:

async deleteTask(taskId, userId) {
    await taskRepository.delete(taskId, userId);
    // No cache invalidation — stats cache still shows the deleted task
}

✅ Correct — invalidate related cache keys after mutations:

async deleteTask(taskId, userId) {
    await taskRepository.delete(taskId, userId);
    await cache.del(`stats:${userId}`);
    await cache.delPattern(`tasks:${userId}:*`);  // del() takes exact keys; wildcards need the pattern helper
}

Mistake 3 — Setting maxPoolSize too low for concurrent load

❌ Wrong — pool of 5 bottlenecks under 50 concurrent requests:

mongoose.connect(uri, { maxPoolSize: 5 });  // small pool (the old Mongoose 5 default)
// Under load: 45 requests queue waiting for a connection
// Response times spike to 500ms+ even though each query takes only 10ms

✅ Correct — set pool size based on expected concurrent queries:

mongoose.connect(uri, { maxPoolSize: 20 });
// 20 concurrent queries — handles burst traffic without queueing

Quick Reference

| Technique          | Implementation                                    | Typical Gain                                    |
|--------------------|---------------------------------------------------|-------------------------------------------------|
| Compression        | app.use(compression())                            | 50-90% smaller responses                        |
| Redis cache        | await cache.remember(key, ttl, fetchFn)           | 1-2ms vs 20-50ms                                |
| HTTP cache headers | res.set('Cache-Control', 'private, max-age=60')   | Zero server work for cached responses           |
| ETag               | res.set('ETag', hash); 304 on If-None-Match match | Zero body transfer for unchanged data           |
| Connection pool    | mongoose.connect(uri, { maxPoolSize: 20 })        | Eliminates 50-200ms per request                 |
| Lean queries       | Model.find().lean()                               | Skips Mongoose hydration; faster, lighter reads |
| Projection         | .select('title status createdAt')                 | Smaller documents, less network transfer        |
| Parallel queries   | await Promise.all([query1, query2])               | max(t1, t2) instead of t1 + t2                  |
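The last row is worth demonstrating: two sequential awaits pay t1 + t2, while Promise.all pays max(t1, t2). A sketch with fake 50ms queries (fakeQuery is a stand-in for a real Mongoose call):

```javascript
// fakeQuery stands in for a database call that resolves after `ms` milliseconds
const fakeQuery = (ms) => new Promise((resolve) => setTimeout(() => resolve(ms), ms));

(async () => {
    const t0 = Date.now();
    await fakeQuery(50);                               // waits 50ms...
    await fakeQuery(50);                               // ...then another 50ms
    const sequential = Date.now() - t0;                // ~100ms (t1 + t2)

    const t1 = Date.now();
    await Promise.all([fakeQuery(50), fakeQuery(50)]); // both timers run together
    const parallel = Date.now() - t1;                  // ~50ms (max(t1, t2))

    console.log({ sequential, parallel });
})();
```

This only works when the queries are independent; if the second query needs the first one's result, they are inherently sequential.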

🧠 Test Yourself

The GET /api/v1/tasks/stats endpoint runs an expensive MongoDB aggregation. It is called 100 times per second. The stats only change when tasks are created or deleted. What is the most appropriate optimisation?