Production caching requires more than just adding cache calls — it requires thoughtful TTL design, protection against cache stampedes, graceful degradation when the cache is down, and continuous measurement of cache effectiveness. These best practices are what separate a fragile cache that causes intermittent 500 errors from a robust caching layer that reliably delivers 10× performance improvement under load.
Cache Stampede Prevention
// ── Cache stampede: when a hot key expires, many requests hit DB simultaneously ──
// Solution 1: SemaphoreSlim — only one request rebuilds, others wait
public class StampedeProtectedCacheService(
    IMemoryCache cache,
    IPostRepository repo)
{
    private readonly SemaphoreSlim _lock = new(1, 1);

    public async Task<PostDto?> GetByIdAsync(int id, CancellationToken ct)
    {
        var key = $"post:{id}";
        if (cache.TryGetValue(key, out PostDto? cached))
            return cached;

        await _lock.WaitAsync(ct);
        try
        {
            // Double-check after acquiring lock (another thread may have populated)
            if (cache.TryGetValue(key, out cached))
                return cached;

            var post = await repo.GetByIdAsync(id, ct);
            var dto = post?.ToDto();
            if (dto is not null)
                cache.Set(key, dto, TimeSpan.FromMinutes(10));
            return dto;
        }
        finally
        {
            _lock.Release();
        }
    }
}
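One caveat with the single SemaphoreSlim above: it serializes rebuilds across all keys, so a miss on post:1 blocks a concurrent rebuild of post:2. For services with many hot keys, a per-key lock avoids that contention. A minimal sketch, assuming a ConcurrentDictionary of semaphores (the KeyedLock helper is hypothetical, not part of the service above):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-key lock: one SemaphoreSlim per cache key, created on demand.
// Note: semaphores are never evicted here; an unbounded key space would need
// eviction or a fixed pool of striped locks.
public sealed class KeyedLock
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks = new();

    public Task WaitAsync(string key, CancellationToken ct = default) =>
        _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1)).WaitAsync(ct);

    public void Release(string key)
    {
        if (_locks.TryGetValue(key, out var sem))
            sem.Release();
    }
}
```

The service would then call `WaitAsync(key)` / `Release(key)` instead of the shared `_lock`, so only requests for the same key queue behind one rebuild.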
// ── Solution 2: Probabilistic early expiration (background refresh) ────────
// Instead of waiting for expiry, probabilistically refresh slightly before expiry
// Prevents the sharp cliff where all simultaneous users hit a miss at once
public async Task<T?> GetWithEarlyRefreshAsync<T>(
    string key, Func<Task<T?>> factory,
    TimeSpan ttl, CancellationToken ct)
{
    if (cache.TryGetValue(key, out (T? Value, DateTime ExpiresAt) entry))
    {
        var remaining = (entry.ExpiresAt - DateTime.UtcNow).TotalSeconds;
        var beta = 1.0; // tune: higher = refresh earlier
        // Randomly refresh when remaining time is less than -beta * ln(random)
        var shouldRefresh = remaining < -beta * Math.Log(Random.Shared.NextDouble());
        if (!shouldRefresh) return entry.Value;
    }

    var freshValue = await factory();
    cache.Set(key, (freshValue, DateTime.UtcNow.Add(ttl)), ttl);
    return freshValue;
}
Measuring Cache Effectiveness
Track the hit ratio: cacheHits / (cacheHits + cacheMisses). A well-tuned cache should achieve an 80%+ hit ratio for frequently accessed data. If the ratio drops significantly, it indicates one of three things: the TTL is too short (data expires before being reused), cache keys are not granular enough (too many unique keys), or access patterns have changed. Expose the hit ratio as a metric endpoint or push it to Application Insights for alerting when it drops below a threshold.
Graceful Degradation
When Redis is slow or down, the cache layer should fail fast and fall back to the database rather than stall every request. Wrap Redis calls in a circuit breaker such as Polly's Policy.Handle<RedisException>().CircuitBreakerAsync(5, TimeSpan.FromSeconds(30)), which opens the circuit after five consecutive Redis failures and skips Redis for 30 seconds.
Layered Caching Architecture (L1 + L2)
// ── L1 (in-process IMemoryCache) + L2 (Redis IDistributedCache) ────────────
public class LayeredCacheService(IMemoryCache l1, IRedisCacheService l2) : ICacheService
{
    public async Task<T?> GetOrCreateAsync<T>(
        string key, Func<Task<T?>> factory,
        TimeSpan? l1Ttl = null, TimeSpan? l2Ttl = null,
        CancellationToken ct = default)
    {
        // Check L1 (in-process, ~nanoseconds)
        if (l1.TryGetValue(key, out T? l1Value)) return l1Value;

        // Check L2 (Redis, ~milliseconds)
        var l2Value = await l2.GetAsync<T>(key, ct);
        if (l2Value is not null)
        {
            // Populate L1 from L2 (short TTL — L1 is ephemeral)
            l1.Set(key, l2Value, l1Ttl ?? TimeSpan.FromSeconds(30));
            return l2Value;
        }

        // L1 and L2 miss — load from database (~milliseconds to seconds)
        var value = await factory();
        if (value is not null)
        {
            // Await the L2 write so failures surface here rather than as
            // an unobserved fire-and-forget task
            await l2.SetAsync(key, value, l2Ttl ?? TimeSpan.FromMinutes(5), ct: ct);
            l1.Set(key, value, l1Ttl ?? TimeSpan.FromSeconds(30));
        }
        return value;
    }
}
// Typical hit rates: L1 ~40%, L2 ~50%, DB ~10%
// Requests served without hitting DB: 90%
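The hit rates above imply a large latency win. A back-of-envelope check, with assumed (not measured) latencies of 0.0001 ms for L1, 1 ms for Redis, and 50 ms for the database:

```csharp
using System;

// Expected latency per request under the hit rates above (illustrative numbers):
//   40% served by L1, 50% by L2 (Redis), 10% by the database.
double expected = 0.40 * 0.0001 + 0.50 * 1.0 + 0.10 * 50.0;
Console.WriteLine($"{expected:F2} ms"); // vs ~50 ms uncached: roughly a 9x improvement
```

Under these assumptions the average request costs about 5.5 ms, dominated almost entirely by the 10% of requests that fall through to the database.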
Common Mistakes
Mistake 1 — No circuit breaker on Redis (slow Redis makes entire API slow)
❌ Wrong — Redis timeout of 5s; every request waits 5s on each cache miss; latency spikes.
✅ Correct — use Polly circuit breaker; open circuit after N Redis failures; fall back to database.
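A sketch of that fallback using Polly's circuit breaker. The IRedisCacheService and IPostRepository names assume the interfaces used earlier; RedisException is StackExchange.Redis's base exception type; thresholds are illustrative:

```csharp
using Polly;
using Polly.CircuitBreaker;
using StackExchange.Redis;

public class ResilientCacheService(IRedisCacheService l2, IPostRepository repo)
{
    // Open the circuit after 5 consecutive Redis failures; stay open for 30 s.
    private static readonly IAsyncPolicy _breaker = Policy
        .Handle<RedisException>()
        .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));

    public async Task<PostDto?> GetByIdAsync(int id, CancellationToken ct)
    {
        var key = $"post:{id}";
        try
        {
            var cached = await _breaker.ExecuteAsync(() => l2.GetAsync<PostDto>(key, ct));
            if (cached is not null) return cached;
        }
        catch (BrokenCircuitException)
        {
            // Circuit is open — skip Redis entirely until it half-opens.
        }
        catch (RedisException)
        {
            // One Redis failure — counted by the breaker; fall through to the DB.
        }
        return (await repo.GetByIdAsync(id, ct))?.ToDto();
    }
}
```

The key property: while the circuit is open, requests pay zero Redis latency instead of a 5 s timeout, degrading to direct database reads.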
Mistake 2 — Same TTL for all data types (hot data expires too early, cold data cached too long)
❌ Wrong — all cache entries have 5-minute TTL; frequently updated user feeds and static category lists have same TTL.
✅ Correct — tune TTL per data type: config (1 hour), published posts (10 min), search results (1 min), user feeds (30 sec).
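One way to keep those per-type TTLs in a single place is a small policy map. A sketch using the data types and durations listed above (names and the 5-minute default are illustrative; tune per workload):

```csharp
using System;
using System.Collections.Generic;

// Central TTL policy: one place to tune how long each data type stays cached.
public static class CacheTtl
{
    private static readonly IReadOnlyDictionary<string, TimeSpan> ByDataType =
        new Dictionary<string, TimeSpan>
        {
            ["config"] = TimeSpan.FromHours(1),     // near-static configuration
            ["post"]   = TimeSpan.FromMinutes(10),  // published posts
            ["search"] = TimeSpan.FromMinutes(1),   // search results
            ["feed"]   = TimeSpan.FromSeconds(30),  // frequently updated user feeds
        };

    // Unknown types fall back to a conservative default.
    public static TimeSpan For(string dataType) =>
        ByDataType.TryGetValue(dataType, out var ttl) ? ttl : TimeSpan.FromMinutes(5);
}
```

Callers then write `cache.Set(key, dto, CacheTtl.For("post"))`, so changing a TTL is a one-line edit instead of a grep across every cache call.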