Rate Limiting — ASP.NET Core Built-In Policies and Per-User Limits

Rate limiting protects Web APIs from abuse — preventing a single client from overwhelming the server with requests, enabling fair use across all clients, and providing a first line of defence against brute-force and denial-of-service attacks. ASP.NET Core 7+ includes built-in rate limiting middleware with four algorithms that cover the spectrum from simple request counting to sophisticated traffic shaping. For the BlogApp API, a global rate limiter prevents abuse while per-user limiters enforce fair usage policies for authenticated clients.

Rate Limiting Configuration

// ── Program.cs — configure rate limiting ──────────────────────────────────
using System.Security.Claims;                 // ClaimTypes.NameIdentifier
using System.Threading.RateLimiting;          // PartitionedRateLimiter, limiter options
using Microsoft.AspNetCore.Mvc;               // ProblemDetails
using Microsoft.AspNetCore.RateLimiting;      // AddFixedWindowLimiter, EnableRateLimiting
builder.Services.AddRateLimiter(opts =>
{
    // ── Global fallback policy (applies to all endpoints) ─────────────────
    opts.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
    {
        // Partition by authenticated user ID or by IP for anonymous requests
        var partitionKey = ctx.User.Identity?.IsAuthenticated == true
            ? ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)!
            : ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown";

        return RateLimitPartition.GetFixedWindowLimiter(partitionKey,
            _ => new FixedWindowRateLimiterOptions
            {
                PermitLimit         = 100,               // 100 requests...
                Window              = TimeSpan.FromMinutes(1), // ...per minute
                QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
                QueueLimit          = 0,                 // no queueing — reject immediately
            });
    });

    // ── Named policy: strict limit for auth endpoints (brute-force protection) ──
    opts.AddFixedWindowLimiter("AuthEndpoints", options =>
    {
        options.PermitLimit         = 5;
        options.Window              = TimeSpan.FromMinutes(15);
        options.QueueLimit          = 0;
    });

    // ── Named policy: token bucket for search (bursty traffic allowed) ────
    opts.AddTokenBucketLimiter("SearchEndpoints", options =>
    {
        options.TokenLimit          = 50;   // max burst of 50 requests
        options.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        options.TokensPerPeriod     = 10;   // refill 10 tokens per 10s = 1/s avg
        options.QueueLimit          = 5;    // queue up to 5 requests
        options.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    });

    // ── Customise the 429 response ────────────────────────────────────────
    opts.OnRejected = async (ctx, token) =>
    {
        ctx.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        ctx.HttpContext.Response.ContentType = "application/problem+json";

        var retryAfter = ctx.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retry)
            ? (int)retry.TotalSeconds : 60;

        ctx.HttpContext.Response.Headers.RetryAfter = retryAfter.ToString();

        await ctx.HttpContext.Response.WriteAsJsonAsync(new ProblemDetails
        {
            Type     = "https://tools.ietf.org/html/rfc6585#section-4",
            Title    = "Too Many Requests",
            Status   = 429,
            Detail   = $"Rate limit exceeded. Retry after {retryAfter} seconds.",
            Instance = ctx.HttpContext.Request.Path,
        }, token);
    };
});

app.UseRateLimiter();   // after UseRouting and UseAuthentication (the global limiter reads User claims)

// ── Apply named policies to specific endpoints ────────────────────────────
[HttpPost("login")]
[EnableRateLimiting("AuthEndpoints")]   // 5 attempts per 15 min
public async Task<IActionResult> Login([...]) { ... }

[HttpGet("search")]
[EnableRateLimiting("SearchEndpoints")]  // token bucket
public async Task<IActionResult> Search([...]) { ... }

// ── Disable rate limiting for internal health checks ──────────────────────
app.MapHealthChecks("/health").DisableRateLimiting();

Note: The four rate limiting algorithms suit different scenarios. Fixed window: simple count per time window — can allow a burst at the window boundary (99 requests at the end of one window plus 99 at the start of the next = 198 in about 1 second). Sliding window: smoother, counts requests in a rolling window — prevents boundary bursts. Token bucket: allows controlled bursts up to the bucket capacity, then averages out to the refill rate — ideal for bursty but bounded traffic. Concurrency limiter: limits simultaneous in-flight requests rather than requests per unit of time — ideal for protecting expensive operations (report generation, image processing).

Tip: Partition rate limits by authenticated user ID for fairness — one misbehaving user does not consume rate limit quota for others. Fall back to IP address for anonymous requests. For multi-tenant APIs, partition by tenant ID to give each tenant their own quota. Use PartitionedRateLimiter.Create<HttpContext, string> for custom partition keys that combine user and endpoint: $"user:{userId}:endpoint:search" gives each user their own per-endpoint limit.
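The combined user-plus-endpoint key from the tip can be sketched as a variant of the global limiter above. This is illustrative only — the key format and limits are our assumptions, not BlogApp's actual policy:

```csharp
using System.Security.Claims;
using System.Threading.RateLimiting;

// Sketch: partition by user AND request path, so each user gets an
// independent quota per endpoint rather than one shared global quota.
opts.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(ctx =>
{
    var userId = ctx.User.Identity?.IsAuthenticated == true
        ? ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)!
        : ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown";

    // e.g. "user:42:endpoint:/api/posts/search"
    var partitionKey = $"user:{userId}:endpoint:{ctx.Request.Path}";

    return RateLimitPartition.GetFixedWindowLimiter(partitionKey,
        _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 100,
            Window      = TimeSpan.FromMinutes(1),
        });
});
```

Note the trade-off: per-endpoint partitions multiply the number of in-memory limiter instances, so keep key cardinality bounded.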

Warning: Rate limiting in a multi-instance deployment is per-server by default. With 3 API instances, each allows 100 requests per minute — effectively 300 total. Use a distributed rate limiter backed by Redis for global limits across all instances. Implement IRateLimiterPolicy<TKey> with Redis atomic increment operations (INCR + EXPIRE) for a distributed counter. The built-in middleware has no distributed backend — custom implementation or a library like AspNetCoreRateLimit (with Redis) is required for truly global limits.
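The INCR + EXPIRE counter from the warning might be sketched like this with StackExchange.Redis. The class name and key format are ours, and this is a sketch of the pattern, not a production implementation:

```csharp
using StackExchange.Redis;

// Sketch: a fixed-window counter shared by all API instances via Redis.
public sealed class RedisFixedWindowCounter
{
    private readonly IConnectionMultiplexer _redis;

    public RedisFixedWindowCounter(IConnectionMultiplexer redis) => _redis = redis;

    public async Task<bool> TryAcquireAsync(
        string partitionKey, int permitLimit, TimeSpan window)
    {
        IDatabase db = _redis.GetDatabase();
        string key   = $"ratelimit:{partitionKey}";

        // INCR is atomic across every instance sharing this Redis.
        long count = await db.StringIncrementAsync(key);

        // First hit in the window creates the key; attach the window TTL.
        if (count == 1)
            await db.KeyExpireAsync(key, window);

        return count <= permitLimit;   // false => caller should return 429
    }
}
```

One caveat: if the process dies between INCR and EXPIRE, the key never expires. A production counter would wrap both operations in a single Lua script (or use SET with NX and EX) so the increment and TTL are truly atomic.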

Rate Limiter Algorithms Comparison

| Algorithm      | Burst Allowed     | Smooth Rate            | Best For             |
|----------------|-------------------|------------------------|----------------------|
| Fixed Window   | At boundary       | No                     | Simple API quotas    |
| Sliding Window | No                | Yes                    | Fair usage limits    |
| Token Bucket   | Yes (bucket size) | Yes (refill rate)      | Bursty clients       |
| Concurrency    | N/A               | N/A (limits in-flight) | Expensive operations |
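The configuration earlier registers only fixed window and token bucket policies. The other two algorithms from the table could be registered along these lines — policy names and numbers are illustrative, not taken from BlogApp:

```csharp
// Sketch: sliding window — same 100/min budget, but counted over six
// rolling 10-second segments, so there is no boundary burst.
opts.AddSlidingWindowLimiter("FairUsage", options =>
{
    options.PermitLimit       = 100;
    options.Window            = TimeSpan.FromMinutes(1);
    options.SegmentsPerWindow = 6;
});

// Sketch: concurrency — caps simultaneous in-flight requests rather
// than requests per unit of time; suits expensive report generation.
opts.AddConcurrencyLimiter("Reports", options =>
{
    options.PermitLimit = 10;    // at most 10 requests in flight
    options.QueueLimit  = 20;    // queue the next 20; reject beyond that
});
```

As with the other named policies, these would be applied per endpoint with [EnableRateLimiting("FairUsage")] or [EnableRateLimiting("Reports")].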

Common Mistakes

Mistake 1 — Partitioning by IP alone (all users behind NAT share one limit)

❌ Wrong — corporate network’s 500 users share one IP; one user’s requests consume the entire quota.

✅ Correct — partition by authenticated user ID when available; fall back to IP only for anonymous requests.
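The correct fallback rule can be factored into a small helper so every limiter uses the same key logic. The method name and key prefixes are ours:

```csharp
using System.Security.Claims;

// Sketch: user ID when authenticated; IP only as an anonymous fallback,
// so 500 users behind one NAT gateway do not share a single quota.
static string GetPartitionKey(HttpContext ctx) =>
    ctx.User.Identity?.IsAuthenticated == true
        ? $"user:{ctx.User.FindFirstValue(ClaimTypes.NameIdentifier)}"
        : $"ip:{ctx.Connection.RemoteIpAddress?.ToString() ?? "unknown"}";
```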

Mistake 2 — Not returning Retry-After header (Angular cannot implement backoff)

❌ Wrong — 429 response with no Retry-After; Angular retries immediately; amplifies the load spike.

✅ Correct — always include Retry-After in the 429 response; Angular uses it to schedule the retry.

🧠 Test Yourself

A fixed window limiter allows 100 requests per minute. A client sends 99 requests at t=0:59 and 99 requests at t=1:01. How many are allowed?