Caching Best Practices — TTL, Stampedes and Consistency

Production caching requires more than just adding cache calls — it requires thoughtful TTL design, protection against cache stampedes, graceful degradation when the cache is down, and continuous measurement of cache effectiveness. These best practices are what separate a fragile cache that causes intermittent 500 errors from a robust caching layer that reliably delivers 10× performance improvement under load.

Cache Stampede Prevention

// ── Cache stampede: when a hot key expires, many requests hit DB simultaneously ──
// Solution 1: SemaphoreSlim — only one request rebuilds, others wait

public class StampedeProtectedCacheService(
    IMemoryCache cache,
    IPostRepository repo)
{
    private readonly SemaphoreSlim _lock = new(1, 1);

    public async Task<PostDto?> GetByIdAsync(int id, CancellationToken ct)
    {
        var key = $"post:{id}";

        if (cache.TryGetValue(key, out PostDto? cached))
            return cached;

        await _lock.WaitAsync(ct);
        try
        {
            // Double-check after acquiring lock (another thread may have populated)
            if (cache.TryGetValue(key, out cached))
                return cached;

            var post = await repo.GetByIdAsync(id, ct);
            var dto  = post?.ToDto();

            if (dto is not null)
                cache.Set(key, dto, TimeSpan.FromMinutes(10));

            return dto;
        }
        finally
        {
            _lock.Release();
        }
    }
}
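
// A limitation of the single SemaphoreSlim above: it serializes rebuilds for
// every key, so a slow rebuild of "post:1" also blocks "post:2". A common
// refinement (a sketch, not part of the original service — the
// ConcurrentDictionary-based locking here is an illustrative assumption)
// is one semaphore per key:

// Per-key locks; note these are never removed, so in a real service you'd
// bound the dictionary or evict idle locks.
private static readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();

private static SemaphoreSlim GetLock(string key) =>
    _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));

// Usage inside GetByIdAsync, replacing the shared _lock:
//   var keyLock = GetLock(key);
//   await keyLock.WaitAsync(ct);
//   try { /* double-check cache, then rebuild */ }
//   finally { keyLock.Release(); }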

// ── Solution 2: Probabilistic early expiration ─────────────────────────────
// Instead of waiting for expiry, probabilistically recompute slightly before
// expiry so refreshes spread out over time. This prevents the sharp cliff
// where all simultaneous users hit a miss at once. Note the refresh happens
// inline on whichever request draws the short straw, not in the background.
// (Assumes the same `cache` field as Solution 1.)
public async Task<T?> GetWithEarlyRefreshAsync<T>(
    string key, Func<Task<T?>> factory,
    TimeSpan ttl, CancellationToken ct)
{
    if (cache.TryGetValue(key, out (T? Value, DateTime ExpiresAt) entry))
    {
        var remaining = (entry.ExpiresAt - DateTime.UtcNow).TotalSeconds;
        var beta      = 1.0;  // tune: higher = refresh earlier
        // Simplified probabilistic test: refresh when the remaining TTL
        // (seconds) drops below -beta * ln(random). The full "XFetch"
        // algorithm also scales this by the measured recompute cost.
        var shouldRefresh = remaining < -beta * Math.Log(Random.Shared.NextDouble());

        if (!shouldRefresh) return entry.Value;
    }

    var freshValue = await factory();
    cache.Set(key, (freshValue, DateTime.UtcNow.Add(ttl)), ttl);
    return freshValue;
}

Note: The double-check after acquiring the semaphore is critical: always re-check the cache before rebuilding. The thread that held the lock rebuilds the entry, so every thread queued behind it finds the cache already populated on its second check and returns immediately. Without that re-check, each queued thread would rebuild the entry in turn, defeating the stampede prevention — the second check is what guarantees only the first thread does the database work.
Tip: Measure cache effectiveness with a hit ratio metric: cacheHits / (cacheHits + cacheMisses). A well-tuned cache should achieve an 80%+ hit ratio for frequently accessed data. If the ratio drops significantly, it usually means one of three things: the TTL is too short (data expires before being reused), cache keys are too granular (so many unique keys that entries are rarely reused), or data access patterns have changed. Expose the hit ratio as a metrics endpoint or push it to Application Insights and alert when it drops below a threshold.
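
The hit ratio from the tip above can be tracked with two atomic counters. This is a minimal sketch — the CacheMetrics class name is a hypothetical helper, not part of the services shown:

public class CacheMetrics
{
    private long _hits, _misses;

    public void RecordHit()  => Interlocked.Increment(ref _hits);
    public void RecordMiss() => Interlocked.Increment(ref _misses);

    // cacheHits / (cacheHits + cacheMisses); 0 until anything is recorded
    public double HitRatio
    {
        get
        {
            var hits  = Interlocked.Read(ref _hits);
            var total = hits + Interlocked.Read(ref _misses);
            return total == 0 ? 0 : (double)hits / total;
        }
    }
}

Call RecordHit in the TryGetValue success branch and RecordMiss when you fall through to the database, then expose HitRatio wherever you publish metrics.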
Warning: Implement a circuit breaker for your cache tier. If Redis has a network partition and every cache call takes 5 seconds before timing out, your API’s latency becomes 5 seconds for every request that tries to read from cache. A circuit breaker opens after N failures and bypasses Redis entirely (falling back to the database) until the circuit closes after a recovery window. Polly provides a ready-made circuit breaker: Policy.Handle<RedisException>().CircuitBreakerAsync(5, TimeSpan.FromSeconds(30)).
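
The warning above might look like the following with Polly. This is a sketch under assumptions: StackExchange.Redis supplies RedisException/RedisTimeoutException, and the _redisCache and _repo fields are hypothetical stand-ins for your cache and repository:

// Open the circuit after 5 consecutive Redis failures; stay open for 30s.
private static readonly AsyncCircuitBreakerPolicy _breaker =
    Policy.Handle<RedisException>()
          .Or<RedisTimeoutException>()
          .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));

public async Task<PostDto?> GetByIdAsync(int id, CancellationToken ct)
{
    try
    {
        return await _breaker.ExecuteAsync(
            () => _redisCache.GetAsync<PostDto>($"post:{id}", ct));
    }
    catch (Exception ex) when (ex is RedisException or BrokenCircuitException)
    {
        // Redis failed or the circuit is open — skip the cache entirely
        // and fall back to the database.
        var post = await _repo.GetByIdAsync(id, ct);
        return post?.ToDto();
    }
}

While the circuit is open, ExecuteAsync throws BrokenCircuitException immediately instead of waiting on the Redis timeout, which is what keeps API latency flat during the outage.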

Layered Caching Architecture (L1 + L2)

// ── L1 (in-process IMemoryCache) + L2 (Redis IDistributedCache) ────────────
public class LayeredCacheService(IMemoryCache l1, IRedisCacheService l2) : ICacheService
{
    public async Task<T?> GetOrCreateAsync<T>(
        string key, Func<Task<T?>> factory,
        TimeSpan? l1Ttl = null, TimeSpan? l2Ttl = null,
        CancellationToken ct = default)
    {
        // Check L1 (in-process, ~nanoseconds)
        if (l1.TryGetValue(key, out T? l1Value)) return l1Value;

        // Check L2 (Redis, ~milliseconds)
        var l2Value = await l2.GetAsync<T>(key, ct);
        if (l2Value is not null)
        {
            // Populate L1 from L2 (short TTL — L1 is ephemeral)
            l1.Set(key, l2Value, l1Ttl ?? TimeSpan.FromSeconds(30));
            return l2Value;
        }

        // L1 and L2 miss — load from database (~milliseconds to seconds)
        var value = await factory();
        if (value is not null)
        {
            await l2.SetAsync(key, value, l2Ttl ?? TimeSpan.FromMinutes(5), ct: ct);
            l1.Set(key, value, l1Ttl ?? TimeSpan.FromSeconds(30));
        }
        return value;
    }
}
// Typical hit rates: L1 ~40%, L2 ~50%, DB ~10%
// Requests served without hitting DB: 90%

Common Mistakes

Mistake 1 — No circuit breaker on Redis (slow Redis makes entire API slow)

❌ Wrong — Redis timeout of 5s with no breaker; when Redis is unreachable, every request that touches the cache blocks for 5s before failing; API-wide latency spikes.

✅ Correct — use Polly circuit breaker; open circuit after N Redis failures; fall back to database.

Mistake 2 — Same TTL for all data types (hot data expires too early, cold data cached too long)

❌ Wrong — all cache entries have 5-minute TTL; frequently updated user feeds and static category lists have same TTL.

✅ Correct — tune TTL per data type: config (1 hour), published posts (10 min), search results (1 min), user feeds (30 sec).
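
The per-type TTLs from the correct example above can be centralized so they aren't scattered as magic numbers. A sketch — CacheTtl is a hypothetical helper, with values taken from the list above:

public static class CacheTtl
{
    public static readonly TimeSpan Config        = TimeSpan.FromHours(1);
    public static readonly TimeSpan PublishedPost = TimeSpan.FromMinutes(10);
    public static readonly TimeSpan SearchResults = TimeSpan.FromMinutes(1);
    public static readonly TimeSpan UserFeed      = TimeSpan.FromSeconds(30);
}

// Usage: cache.Set($"post:{id}", dto, CacheTtl.PublishedPost);

Keeping TTLs in one place also makes it easy to tune them when the hit-ratio metric shows a particular data type expiring too early.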

🧠 Test Yourself

A layered cache has L1 (30s TTL) and L2 Redis (10min TTL). 100 concurrent users request the same post. The L1 cache is cold (just expired). What happens?