A MEAN Stack API that works correctly but responds in 2 seconds is unusable. Performance is a feature, and optimising it requires understanding where time is actually spent. The three highest-impact performance techniques for Express APIs are: compression (reducing response payload size), caching (avoiding repeated computation and database queries), and connection pooling (reusing database connections rather than creating new ones per request). Together these can reduce response times by 10–100x for typical API workloads without changing a single business logic function.
Performance Optimization Layers
| Layer | Technique | Reduces | Impact |
|---|---|---|---|
| Network | gzip/Brotli compression | Response body size | 50-90% smaller payloads |
| Application | In-memory caching (node-cache) | CPU for repeated computation | Microseconds vs milliseconds |
| HTTP | HTTP cache headers (ETag, Cache-Control) | Repeated identical requests | Zero server work for cached responses |
| External cache | Redis caching | Database queries for hot data | 1-2ms vs 20-50ms for DB |
| Database | Connection pooling | Connection overhead per request | Eliminates 50-200ms connection time |
| Database | Indexes | Documents scanned per query | 10-1000x faster queries |
| Express | Avoid sync operations | Event loop blocking time | Eliminates freezes under load |
HTTP Cache-Control Directives
| Directive | Meaning | Use For |
|---|---|---|
| no-store | Never cache — fetch fresh every time | Sensitive data, authenticated responses |
| no-cache | Can cache but must revalidate before use | Frequently changing data |
| private | Only browser can cache — no CDN/proxy | User-specific responses |
| public | CDN and browser can cache | Public, shared content |
| max-age=N | Cache for N seconds | Static assets, public data |
| s-maxage=N | CDN cache duration (overrides max-age) | CDN-served public content |
| must-revalidate | Must check server if stale | Critical accuracy required |
| stale-while-revalidate=N | Serve stale while revalidating in background | Performance with eventual freshness |
Cache-Control: private tells CDNs not to cache the response. no-store tells the browser not to cache at all — use this for sensitive data like authentication tokens, financial records, or personal health information. Public, non-authenticated data (product listings, articles, public stats) benefits enormously from CDN caching.

The MongoDB connection pool size is configured with { maxPoolSize: 20 } in the connection options. Setting it too high wastes MongoDB server resources — each connection consumes memory on the MongoDB server. Profile your actual concurrent query count under load to choose the right value.

Complete Performance Stack
// npm install compression node-cache ioredis
// ── 1. Compression middleware ─────────────────────────────────────────────
const compression = require('compression');
app.use(compression({
// Only compress responses larger than 1KB
threshold: 1024,
// Compression level: 0 = no compression, 9 = max (default 6)
level: 6,
// Custom filter — skip compression for streaming responses
filter: (req, res) => {
if (req.headers['x-no-compression']) return false;
return compression.filter(req, res);
},
}));
// Result: typical JSON API response shrinks from 50KB to 5KB
// Especially valuable for list endpoints with many records
// ── 2. Redis cache service ────────────────────────────────────────────────
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');
class CacheService {
constructor(client) {
this.client = client;
this.prefix = 'cache:';
}
key(...parts) {
return this.prefix + parts.join(':');
}
async get(key) {
const value = await this.client.get(key);
return value ? JSON.parse(value) : null;
}
async set(key, value, ttlSeconds = 300) {
await this.client.setex(key, ttlSeconds, JSON.stringify(value));
}
async del(key) {
await this.client.del(key);
}
async delPattern(pattern) {
const keys = await this.client.keys(this.prefix + pattern);
if (keys.length) await this.client.del(...keys);
}
// Cache-aside pattern: check cache, fetch from DB on miss, populate cache
async remember(key, ttlSeconds, fetchFn) {
const cached = await this.get(key);
if (cached !== null) {
return { data: cached, fromCache: true };
}
const data = await fetchFn();
await this.set(key, data, ttlSeconds);
return { data, fromCache: false };
}
}
const cache = new CacheService(redis);
// ── 3. Caching in service layer ───────────────────────────────────────────
// services/task.service.js (methods shown without their class/module wrapper)
async getStats(userId) {
const cacheKey = cache.key('stats', userId);
const { data } = await cache.remember(cacheKey, 60, async () => {
// Only runs on cache miss (first call or after TTL)
const counts = await taskRepository.countByStatus(userId);
return counts.reduce((acc, { _id, count }) => ({ ...acc, [_id]: count }), {});
});
return data;
}
// Invalidate cache when tasks change
async createTask(userId, taskData) {
const task = await taskRepository.create({ ...taskData, user: userId });
await cache.del(cache.key('stats', userId)); // invalidate stats cache
return task;
}
async deleteTask(taskId, userId) {
const task = await taskRepository.delete(taskId, userId);
if (!task) throw new NotFoundError('Task not found');
await cache.del(cache.key('stats', userId));
return task;
}
// ── 4. HTTP Cache-Control headers ─────────────────────────────────────────
// Middleware for setting cache headers
function cacheControl(options = {}) {
return (req, res, next) => {
if (req.method !== 'GET' && req.method !== 'HEAD') {
res.set('Cache-Control', 'no-store');
return next();
}
const { maxAge = 0, private: isPrivate = true, noStore = false } = options;
if (noStore) {
res.set('Cache-Control', 'no-store');
} else if (isPrivate) {
res.set('Cache-Control', `private, max-age=${maxAge}`);
} else {
res.set('Cache-Control', `public, max-age=${maxAge}, s-maxage=${maxAge * 2}`);
}
next();
};
}
// Apply to routes:
// Authenticated user data — private, no long-term caching
router.get('/tasks', cacheControl({ private: true, maxAge: 0 }), controller.getAll);
// Public stats — cache for 5 minutes
router.get('/stats', cacheControl({ private: false, maxAge: 300 }), controller.getPublicStats);
// User uploads — immutable (hash in filename)
app.use('/uploads', cacheControl({ private: false, maxAge: 31536000 }), express.static('uploads'));
// ── 5. ETag support for conditional requests ──────────────────────────────
const crypto = require('crypto');
// Note: data must already be available when this factory is called,
// so it suits handler chains that fetch the payload first
function addETag(data) {
return (req, res, next) => {
const etag = '"' + crypto.createHash('md5').update(JSON.stringify(data)).digest('hex') + '"';
const ifNoneMatch = req.headers['if-none-match'];
if (ifNoneMatch === etag) {
return res.status(304).end(); // Not Modified — client uses cache
}
res.set('ETag', etag);
next();
};
}
// ── 6. MongoDB connection pool configuration ──────────────────────────────
// config/database.js
const mongoose = require('mongoose');
async function connectDB() {
const options = {
maxPoolSize: 20, // at most 20 pooled connections (caps concurrent queries)
minPoolSize: 5, // keep 5 connections warm
maxIdleTimeMS: 60000, // close idle connections after 1 min
serverSelectionTimeoutMS: 5000, // fail fast if MongoDB unavailable
socketTimeoutMS: 45000,
connectTimeoutMS: 10000,
heartbeatFrequencyMS: 10000, // check connection health every 10s
retryWrites: true,
writeConcern: { w: 'majority' },
};
mongoose.connection.on('connected', () => logger.info('MongoDB connected'));
mongoose.connection.on('error', (err) => logger.error('MongoDB error', { err }));
mongoose.connection.on('disconnected', () => logger.warn('MongoDB disconnected'));
await mongoose.connect(process.env.MONGODB_URI, options);
logger.info('MongoDB pool established', { maxPoolSize: options.maxPoolSize });
}
module.exports = connectDB;
// ── 7. Response time monitoring middleware ────────────────────────────────
function responseTimeMiddleware(req, res, next) {
const start = process.hrtime.bigint();
res.on('finish', () => {
const duration = Number(process.hrtime.bigint() - start) / 1_000_000; // ns to ms
// Headers are already sent once 'finish' fires, so only log here; to expose
// an X-Response-Time header, set it before flush (e.g. via the on-headers package)
if (duration > 1000) {
logger.warn('Slow request', {
method: req.method,
path: req.path,
duration: `${duration.toFixed(0)}ms`,
userId: req.user?.id,
});
}
});
next();
}
How It Works
Step 1 — Compression Reduces Bytes Over the Wire
gzip compression transforms a 50KB JSON response into approximately 5KB by replacing repeated patterns with compact references. The client decompresses it in under 1ms. For JSON APIs, compression ratios of 10:1 are common because JSON has many repeated field names and structural characters. The CPU cost of compression (a few milliseconds) is almost always worth the network savings, especially for mobile clients or slow connections.
Step 2 — Cache-Aside Pattern: Check, Fetch, Store
The most common caching strategy: check if the data is in Redis. If it is, return it immediately (1-2ms). If not, fetch from MongoDB (20-50ms), store the result in Redis with a TTL, then return it. The TTL ensures stale data eventually expires. The invalidation on mutation (deleting the cache key when data changes) ensures freshness for write operations. This pattern is simple to implement and effective for read-heavy endpoints.
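The same pattern works with an in-process store too; the node-cache package from the install line above offers get/set with TTLs. A dependency-free sketch (TtlCache here is illustrative, not a library API):

```javascript
// Illustrative TTL cache; node-cache provides the same get/set-with-TTL surface
class TtlCache {
  constructor() { this.store = new Map(); }
  get(key) {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt < Date.now()) return null;
    return entry.value;
  }
  set(key, value, ttlSeconds) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

const statsCache = new TtlCache();
let dbCalls = 0;
// Stand-in for the MongoDB aggregation; counts how often it actually runs
const fetchStatsFromDb = async () => { dbCalls += 1; return { pending: 3, completed: 7 }; };

// Cache-aside: check the cache, fetch on miss, populate, return
async function getStats() {
  const hit = statsCache.get('stats');
  if (hit !== null) return hit;
  const data = await fetchStatsFromDb();
  statsCache.set('stats', data, 60);
  return data;
}

async function demo() {
  await getStats(); // miss: runs the fetch, dbCalls becomes 1
  await getStats(); // hit: served from the Map, dbCalls stays 1
  return dbCalls;
}
const demoRun = demo();
demoRun.then((calls) => console.log(`db calls: ${calls}`)); // db calls: 1
```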
Step 3 — ETags Enable Efficient Conditional Requests
When a client receives a response with an ETag header, it stores the tag with the cached response. On the next request for the same resource, it sends If-None-Match: "etag-value". The server computes the ETag of the current data and compares — if they match, returns 304 Not Modified with no body (a few bytes). The client uses its cached copy. This is most valuable for large, rarely-changing responses like user profiles or configuration data.
Step 4 — Connection Pooling Eliminates Per-Request Connection Overhead
Creating a new TCP connection to MongoDB for each request takes 50-200ms. A connection pool maintains a set of pre-established connections. When a request needs to query MongoDB, it borrows a connection from the pool (microseconds), executes the query, and returns the connection. The pool ensures connections are reused rather than recreated, eliminating connection establishment overhead from every request’s critical path.
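A reasonable starting point for sizing follows Little's law: concurrent connections needed ≈ queries per second × average query time in seconds. A sketch with illustrative numbers and a hypothetical estimatePoolSize helper:

```javascript
// Estimate pool size from expected load, with headroom for bursts
function estimatePoolSize(queriesPerSecond, avgQueryMs, headroom = 1.5) {
  const concurrent = queriesPerSecond * (avgQueryMs / 1000); // Little's law
  return Math.max(5, Math.ceil(concurrent * headroom));      // never below 5
}

// 500 queries/s at 20ms each: ~10 concurrent, 15 with headroom
console.log(estimatePoolSize(500, 20)); // 15
```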
Step 5 — Response Time Monitoring Identifies Bottlenecks
Logging response times with X-Response-Time headers and alerting on slow requests creates a feedback loop. When a new code path or query runs slowly under production load, the slow request log reveals it within minutes. Combined with MongoDB’s explain plans and Node.js profiling tools, response time monitoring is the first step in every performance investigation.
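Averages hide outliers, so slow-request alerting is better keyed to percentiles of the collected durations. A minimal nearest-rank percentile sketch (illustrative, not tied to any monitoring library):

```javascript
// Nearest-rank percentile: sort ascending, take index ceil(p/100 * n) - 1
function percentile(durationsMs, p) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const samples = [12, 15, 14, 13, 900, 16, 14, 12, 13, 15];
console.log(percentile(samples, 50)); // 14 (typical request)
console.log(percentile(samples, 95)); // 900 (the outlier the average hides)
```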
Real-World Example: Cached Stats Endpoint
// Before optimisation: 85ms average response time
// After: 3ms from cache (1.8ms from Redis), 45ms on cache miss
// GET /api/v1/tasks/stats
exports.getStats = asyncHandler(async (req, res) => {
const userId = req.user.id;
const cacheKey = `stats:${userId}`;
// Check Redis first
const cached = await cache.get(cacheKey);
if (cached) {
res.set('X-Cache', 'HIT');
return res.json({ success: true, data: cached });
}
// Compute stats from MongoDB (expensive aggregation)
const [statusCounts, overdue, dueToday] = await Promise.all([
Task.aggregate([
// aggregate() skips Mongoose casting, so convert the string id explicitly
{ $match: { user: new mongoose.Types.ObjectId(userId), deletedAt: { $exists: false } } },
{ $group: { _id: '$status', count: { $sum: 1 } } },
]),
Task.countDocuments({ user: userId, dueDate: { $lt: new Date() }, status: { $ne: 'completed' } }),
Task.countDocuments({ user: userId, dueDate: {
$gte: new Date(new Date().setHours(0,0,0,0)),
$lt: new Date(new Date().setHours(24,0,0,0)),
}}),
]);
const stats = { overdue, dueToday };
statusCounts.forEach(({ _id, count }) => { stats[_id] = count; });
// Cache for 60 seconds
await cache.set(cacheKey, stats, 60);
res.set('X-Cache', 'MISS');
res.json({ success: true, data: stats });
});
Common Mistakes
Mistake 1 — Caching authenticated, user-specific data publicly
❌ Wrong — User A’s tasks cached and returned to User B:
const cacheKey = 'tasks:all'; // not scoped to user!
const cached = await cache.get(cacheKey);
// User B requests tasks and receives User A's cached tasks
✅ Correct — always include userId in the cache key:
const cacheKey = `tasks:${req.user.id}:${JSON.stringify(req.query)}`;
Mistake 2 — Never invalidating the cache after mutations
❌ Wrong — cache returns stale data after a task is deleted:
async deleteTask(taskId, userId) {
await taskRepository.delete(taskId, userId);
// No cache invalidation — stats cache still shows the deleted task
}
✅ Correct — invalidate related cache keys after mutations:
async deleteTask(taskId, userId) {
await taskRepository.delete(taskId, userId);
await cache.del(`stats:${userId}`);
await cache.delPattern(`tasks:${userId}:*`); // DEL takes exact keys — use the pattern helper for wildcards
}
Mistake 3 — Setting maxPoolSize too low for concurrent load
❌ Wrong — pool of 5 bottlenecks under 50 concurrent requests:
mongoose.connect(uri, { maxPoolSize: 5 }); // pool capped at 5 (the pre-v6 Mongoose default; the modern driver defaults to 100)
// Under load: 45 requests queue waiting for a connection
// Response times spike to 500ms+ even though queries take 10ms
✅ Correct — set pool size based on expected concurrent queries:
mongoose.connect(uri, { maxPoolSize: 20 });
// 20 concurrent queries — handles burst traffic without queueing
Quick Reference
| Technique | Implementation | Typical Gain |
|---|---|---|
| Compression | app.use(compression()) | 50-90% smaller responses |
| Redis cache | await cache.remember(key, ttl, fetchFn) | 1-2ms vs 20-50ms |
| HTTP cache headers | res.set('Cache-Control', 'private, max-age=60') | Zero server work for cached responses |
| ETag | res.set('ETag', hash); if (match) res.status(304).end() | Zero body transfer for unchanged data |
| Connection pool | mongoose.connect(uri, { maxPoolSize: 20 }) | Eliminates 50-200ms per request |
| Lean queries | Model.find().lean() | Skips document hydration; faster, lighter reads |
| Projection | .select('title status createdAt') | Smaller documents, less network |
| Parallel queries | await Promise.all([query1, query2]) | max(t1,t2) vs t1+t2 |
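The last row's gain is easy to demonstrate with two simulated 50ms queries: awaited sequentially they take ~100ms, under Promise.all ~50ms. A self-contained sketch (setTimeout stands in for real Mongoose queries):

```javascript
// setTimeout stands in for two independent ~50ms MongoDB queries
const fakeQuery = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function main() {
  // Sequential awaits: total time is t1 + t2
  let start = Date.now();
  await fakeQuery(50, 'statusCounts');
  await fakeQuery(50, 'overdueCount');
  const sequentialMs = Date.now() - start;

  // Promise.all: both run concurrently, total time is max(t1, t2)
  start = Date.now();
  await Promise.all([fakeQuery(50, 'statusCounts'), fakeQuery(50, 'overdueCount')]);
  const parallelMs = Date.now() - start;

  console.log({ sequentialMs, parallelMs }); // parallel is roughly half
  return { sequentialMs, parallelMs };
}
main();
```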