A MEAN Stack API that works correctly but responds in 2 seconds is unusable. Performance is a feature, and optimising it requires understanding where time is actually spent. The three highest-impact performance techniques for Express APIs are: compression (reducing response payload size), caching (avoiding repeated computation and database queries), and connection pooling (reusing database connections rather than creating new ones per request). Together these can reduce response times by 10–100x for typical API workloads without changing a single business logic function.
Performance Optimization Layers
| Layer | Technique | Reduces | Impact |
|---|---|---|---|
| Network | gzip/Brotli compression | Response body size | 50-90% smaller payloads |
| Application | In-memory caching (node-cache) | CPU for repeated computation | Microseconds vs milliseconds |
| HTTP | HTTP cache headers (ETag, Cache-Control) | Repeated identical requests | Zero server work for cached responses |
| External cache | Redis caching | Database queries for hot data | 1-2ms vs 20-50ms for DB |
| Database | Connection pooling | Connection overhead per request | Eliminates 50-200ms connection time |
| Database | Indexes | Documents scanned per query | 10-1000x faster queries |
| Express | Avoid sync operations | Event loop blocking time | Eliminates freezes under load |
HTTP Cache-Control Directives
| Directive | Meaning | Use For |
|---|---|---|
| no-store | Never cache — fetch fresh every time | Sensitive data, authenticated responses |
| no-cache | Can cache but must revalidate before use | Frequently changing data |
| private | Only browser can cache — no CDN/proxy | User-specific responses |
| public | CDN and browser can cache | Public, shared content |
| max-age=N | Cache for N seconds | Static assets, public data |
| s-maxage=N | CDN cache duration (overrides max-age) | CDN-served public content |
| must-revalidate | Must check server if stale | Critical accuracy required |
| stale-while-revalidate=N | Serve stale while revalidating in background | Performance with eventual freshness |
Cache-Control: private tells CDNs not to cache the response. no-store tells the browser not to cache at all — use this for sensitive data like authentication tokens, financial records, or personal health information. Public, non-authenticated data (product listings, articles, public stats) benefits enormously from CDN caching.

The MongoDB connection pool size is configured with { maxPoolSize: 20 } in the connection options. Setting it too high wastes MongoDB server resources — each connection consumes memory on the MongoDB server. Profile your actual concurrent query count under load to choose the right value.

Complete Performance Stack
// npm install compression node-cache ioredis
// ── 1. Compression middleware ─────────────────────────────────────────────
const compression = require('compression');
app.use(compression({
// Only compress responses larger than 1KB
threshold: 1024,
// Compression level: 0 = no compression, 9 = max (default 6)
level: 6,
// Custom filter — skip compression for streaming responses
filter: (req, res) => {
if (req.headers['x-no-compression']) return false;
return compression.filter(req, res);
},
}));
// Result: typical JSON API response shrinks from 50KB to 5KB
// Especially valuable for list endpoints with many records
// ── 2. Redis cache service ────────────────────────────────────────────────
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');
class CacheService {
constructor(client) {
this.client = client;
this.prefix = 'cache:';
}
key(...parts) {
return this.prefix + parts.join(':');
}
async get(key) {
const value = await this.client.get(key);
return value ? JSON.parse(value) : null;
}
async set(key, value, ttlSeconds = 300) {
await this.client.setex(key, ttlSeconds, JSON.stringify(value));
}
async del(key) {
await this.client.del(key);
}
async delPattern(pattern) {
const keys = await this.client.keys(this.prefix + pattern);
if (keys.length) await this.client.del(...keys);
}
// Cache-aside pattern: check cache, fetch from DB on miss, populate cache
async remember(key, ttlSeconds, fetchFn) {
const cached = await this.get(key);
if (cached !== null) {
return { data: cached, fromCache: true };
}
const data = await fetchFn();
await this.set(key, data, ttlSeconds);
return { data, fromCache: false };
}
}
const cache = new CacheService(redis);
// ── 3. Caching in service layer ───────────────────────────────────────────
// services/task.service.js (methods shown without their class/module wrapper)
async getStats(userId) {
const cacheKey = cache.key('stats', userId);
const { data } = await cache.remember(cacheKey, 60, async () => {
// Only runs on cache miss (first call or after TTL)
const counts = await taskRepository.countByStatus(userId);
return counts.reduce((acc, { _id, count }) => ({ ...acc, [_id]: count }), {});
});
return data;
}
// Invalidate cache when tasks change
async createTask(userId, taskData) {
const task = await taskRepository.create({ ...taskData, user: userId });
await cache.del(cache.key('stats', userId)); // invalidate stats cache
return task;
}
async deleteTask(taskId, userId) {
const task = await taskRepository.delete(taskId, userId);
if (!task) throw new NotFoundError('Task not found');
await cache.del(cache.key('stats', userId));
return task;
}
// ── 4. HTTP Cache-Control headers ─────────────────────────────────────────
// Middleware for setting cache headers
function cacheControl(options = {}) {
return (req, res, next) => {
if (req.method !== 'GET' && req.method !== 'HEAD') {
res.set('Cache-Control', 'no-store');
return next();
}
const { maxAge = 0, private: isPrivate = true, noStore = false } = options;
if (noStore) {
res.set('Cache-Control', 'no-store');
} else if (isPrivate) {
res.set('Cache-Control', `private, max-age=${maxAge}`);
} else {
res.set('Cache-Control', `public, max-age=${maxAge}, s-maxage=${maxAge * 2}`);
}
next();
};
}
// Apply to routes:
// Authenticated user data — private, no long-term caching
router.get('/tasks', cacheControl({ private: true, maxAge: 0 }), controller.getAll);
// Public stats — cache for 5 minutes
router.get('/stats', cacheControl({ private: false, maxAge: 300 }), controller.getPublicStats);
// User uploads — immutable (hash in filename)
app.use('/uploads', cacheControl({ private: false, maxAge: 31536000 }), express.static('uploads'));
// ── 5. ETag support for conditional requests ──────────────────────────────
const crypto = require('crypto');
// Note: data must already be available when this factory is called,
// so it suits handler chains that fetch the payload first
function addETag(data) {
return (req, res, next) => {
const etag = '"' + crypto.createHash('md5').update(JSON.stringify(data)).digest('hex') + '"';
const ifNoneMatch = req.headers['if-none-match'];
if (ifNoneMatch === etag) {
return res.status(304).end(); // Not Modified — client uses cache
}
res.set('ETag', etag);
next();
};
}
// ── 6. MongoDB connection pool configuration ──────────────────────────────
// config/database.js
const mongoose = require('mongoose');
async function connectDB() {
const options = {
maxPoolSize: 20, // at most 20 pooled connections (caps concurrent queries)
minPoolSize: 5, // keep 5 connections warm
maxIdleTimeMS: 60000, // close idle connections after 1 min
serverSelectionTimeoutMS: 5000, // fail fast if MongoDB unavailable
socketTimeoutMS: 45000,
connectTimeoutMS: 10000,
heartbeatFrequencyMS: 10000, // check connection health every 10s
retryWrites: true,
writeConcern: { w: 'majority' },
};
mongoose.connection.on('connected', () => logger.info('MongoDB connected'));
mongoose.connection.on('error', (err) => logger.error('MongoDB error', { err }));
mongoose.connection.on('disconnected', () => logger.warn('MongoDB disconnected'));
await mongoose.connect(process.env.MONGODB_URI, options);
logger.info('MongoDB pool established', { maxPoolSize: options.maxPoolSize });
}
module.exports = connectDB;
// ── 7. Response time monitoring middleware ────────────────────────────────
function responseTimeMiddleware(req, res, next) {
const start = process.hrtime.bigint();
res.on('finish', () => {
const duration = Number(process.hrtime.bigint() - start) / 1_000_000; // ns to ms
// Headers are already sent once 'finish' fires, so only log here; to expose
// an X-Response-Time header, set it before flush (e.g. via the on-headers package)
if (duration > 1000) {
logger.warn('Slow request', {
method: req.method,
path: req.path,
duration: `${duration.toFixed(0)}ms`,
userId: req.user?.id,
});
}
});
next();
}
How It Works
Step 1 — Compression Reduces Bytes Over the Wire
gzip compression transforms a 50KB JSON response into approximately 5KB by replacing repeated patterns with compact references. The client decompresses it in under 1ms. For JSON APIs, compression ratios of 10:1 are common because JSON has many repeated field names and structural characters. The CPU cost of compression (a few milliseconds) is almost always worth the network savings, especially for mobile clients or slow connections.
Step 2 — Cache-Aside Pattern: Check, Fetch, Store
The most common caching strategy: check if the data is in Redis. If it is, return it immediately (1-2ms). If not, fetch from MongoDB (20-50ms), store the result in Redis with a TTL, then return it. The TTL ensures stale data eventually expires. The invalidation on mutation (deleting the cache key when data changes) ensures freshness for write operations. This pattern is simple to implement and effective for read-heavy endpoints.
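The same pattern works with an in-process store too; the node-cache package from the install line above offers get/set with TTLs. A dependency-free sketch (TtlCache here is illustrative, not a library API):

```javascript
// Illustrative TTL cache; node-cache provides the same get/set-with-TTL surface
class TtlCache {
  constructor() { this.store = new Map(); }
  get(key) {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt < Date.now()) return null;
    return entry.value;
  }
  set(key, value, ttlSeconds) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

const statsCache = new TtlCache();
let dbCalls = 0;
// Stand-in for the MongoDB aggregation; counts how often it actually runs
const fetchStatsFromDb = async () => { dbCalls += 1; return { pending: 3, completed: 7 }; };

// Cache-aside: check the cache, fetch on miss, populate, return
async function getStats() {
  const hit = statsCache.get('stats');
  if (hit !== null) return hit;
  const data = await fetchStatsFromDb();
  statsCache.set('stats', data, 60);
  return data;
}

async function demo() {
  await getStats(); // miss: runs the fetch, dbCalls becomes 1
  await getStats(); // hit: served from the Map, dbCalls stays 1
  return dbCalls;
}
const demoRun = demo();
demoRun.then((calls) => console.log(`db calls: ${calls}`)); // db calls: 1
```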
Step 3 — ETags Enable Efficient Conditional Requests
When a client receives a response with an ETag header, it stores the tag with the cached response. On the next request for the same resource, it sends If-None-Match: "etag-value". The server computes the ETag of the current data and compares — if they match, returns 304 Not Modified with no body (a few bytes). The client uses its cached copy. This is most valuable for large, rarely-changing responses like user profiles or configuration data.
Step 4 — Connection Pooling Eliminates Per-Request Connection Overhead
Creating a new TCP connection to MongoDB for each request takes 50-200ms. A connection pool maintains a set of pre-established connections. When a request needs to query MongoDB, it borrows a connection from the pool (microseconds), executes the query, and returns the connection. The pool ensures connections are reused rather than recreated, eliminating connection establishment overhead from every request’s critical path.
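A reasonable starting point for sizing follows Little's law: concurrent connections needed ≈ queries per second × average query time in seconds. A sketch with illustrative numbers and a hypothetical estimatePoolSize helper:

```javascript
// Estimate pool size from expected load, with headroom for bursts
function estimatePoolSize(queriesPerSecond, avgQueryMs, headroom = 1.5) {
  const concurrent = queriesPerSecond * (avgQueryMs / 1000); // Little's law
  return Math.max(5, Math.ceil(concurrent * headroom));      // never below 5
}

// 500 queries/s at 20ms each: ~10 concurrent, 15 with headroom
console.log(estimatePoolSize(500, 20)); // 15
```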
Step 5 — Response Time Monitoring Identifies Bottlenecks
Logging response times with X-Response-Time headers and alerting on slow requests creates a feedback loop. When a new code path or query runs slowly under production load, the slow request log reveals it within minutes. Combined with MongoDB’s explain plans and Node.js profiling tools, response time monitoring is the first step in every performance investigation.
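Averages hide outliers, so slow-request alerting is better keyed to percentiles of the collected durations. A minimal nearest-rank percentile sketch (illustrative, not tied to any monitoring library):

```javascript
// Nearest-rank percentile: sort ascending, take index ceil(p/100 * n) - 1
function percentile(durationsMs, p) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const samples = [12, 15, 14, 13, 900, 16, 14, 12, 13, 15];
console.log(percentile(samples, 50)); // 14 (typical request)
console.log(percentile(samples, 95)); // 900 (the outlier the average hides)
```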
Real-World Example: Cached Stats Endpoint
// Before optimisation: 85ms average response time
// After: 3ms from cache (1.8ms from Redis), 45ms on cache miss
// GET /api/v1/tasks/stats
exports.getStats = asyncHandler(async (req, res) => {
const userId = req.user.id;
const cacheKey = `stats:${userId}`;
// Check Redis first
const cached = await cache.get(cacheKey);
if (cached) {
res.set('X-Cache', 'HIT');
return res.json({ success: true, data: cached });
}
// Compute stats from MongoDB (expensive aggregation)
const [statusCounts, overdue, dueToday] = await Promise.all([
Task.aggregate([
// aggregate() skips Mongoose casting, so convert the string id explicitly
{ $match: { user: new mongoose.Types.ObjectId(userId), deletedAt: { $exists: false } } },
{ $group: { _id: '$status', count: { $sum: 1 } } },
]),
Task.countDocuments({ user: userId, dueDate: { $lt: new Date() }, status: { $ne: 'completed' } }),
Task.countDocuments({ user: userId, dueDate: {
$gte: new Date(new Date().setHours(0,0,0,0)),
$lt: new Date(new Date().setHours(24,0,0,0)),
}}),
]);
const stats = { overdue, dueToday };
statusCounts.forEach(({ _id, count }) => { stats[_id] = count; });
// Cache for 60 seconds
await cache.set(cacheKey, stats, 60);
res.set('X-Cache', 'MISS');
res.json({ success: true, data: stats });
});
Common Mistakes
Mistake 1 — Caching authenticated, user-specific data publicly
❌ Wrong — User A’s tasks cached and returned to User B:
const cacheKey = 'tasks:all'; // not scoped to user!
const cached = await cache.get(cacheKey);
// User B requests tasks and receives User A's cached tasks
✅ Correct — always include userId in the cache key:
const cacheKey = `tasks:${req.user.id}:${JSON.stringify(req.query)}`;
Mistake 2 — Never invalidating the cache after mutations
❌ Wrong — cache returns stale data after a task is deleted:
async deleteTask(taskId, userId) {
await taskRepository.delete(taskId, userId);
// No cache invalidation — stats cache still shows the deleted task
}
✅ Correct — invalidate related cache keys after mutations:
async deleteTask(taskId, userId) {
await taskRepository.delete(taskId, userId);
await cache.del(`stats:${userId}`);
await cache.delPattern(`tasks:${userId}:*`); // DEL takes exact keys — use the pattern helper for wildcards
}
Mistake 3 — Setting maxPoolSize too low for concurrent load
❌ Wrong — pool of 5 bottlenecks under 50 concurrent requests:
mongoose.connect(uri, { maxPoolSize: 5 }); // pool capped at 5 (the pre-v6 Mongoose default; the modern driver defaults to 100)
// Under load: 45 requests queue waiting for a connection
// Response times spike to 500ms+ even though queries take 10ms
✅ Correct — set pool size based on expected concurrent queries:
mongoose.connect(uri, { maxPoolSize: 20 });
// 20 concurrent queries — handles burst traffic without queueing
Quick Reference
| Technique | Implementation | Typical Gain |
|---|---|---|
| Compression | app.use(compression()) | 50-90% smaller responses |
| Redis cache | await cache.remember(key, ttl, fetchFn) | 1-2ms vs 20-50ms |
| HTTP cache headers | res.set('Cache-Control', 'private, max-age=60') | Zero server work for cached responses |
| ETag | res.set('ETag', hash); if (match) res.status(304).end() | Zero body transfer for unchanged data |
| Connection pool | mongoose.connect(uri, { maxPoolSize: 20 }) | Eliminates 50-200ms per request |
| Lean queries | Model.find().lean() | Skips document hydration; faster, lighter reads |
| Projection | .select('title status createdAt') | Smaller documents, less network |
| Parallel queries | await Promise.all([query1, query2]) | max(t1,t2) vs t1+t2 |
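The last row's gain is easy to demonstrate with two simulated 50ms queries: awaited sequentially they take ~100ms, under Promise.all ~50ms. A self-contained sketch (setTimeout stands in for real Mongoose queries):

```javascript
// setTimeout stands in for two independent ~50ms MongoDB queries
const fakeQuery = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function main() {
  // Sequential awaits: total time is t1 + t2
  let start = Date.now();
  await fakeQuery(50, 'statusCounts');
  await fakeQuery(50, 'overdueCount');
  const sequentialMs = Date.now() - start;

  // Promise.all: both run concurrently, total time is max(t1, t2)
  start = Date.now();
  await Promise.all([fakeQuery(50, 'statusCounts'), fakeQuery(50, 'overdueCount')]);
  const parallelMs = Date.now() - start;

  console.log({ sequentialMs, parallelMs }); // parallel is roughly half
  return { sequentialMs, parallelMs };
}
main();
```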