API Rate Limiting

Overview

Rate limiting caps the number of requests a client can make within a time window. Invoke this skill when protecting an API from abuse, ensuring fair usage among clients, or preventing backend services from being overwhelmed.

Core Principles

Algorithm Selection: Choose an appropriate algorithm (token bucket, sliding window, or fixed window)
Granularity: Rate limit by user, IP, API key, or endpoint
Graceful Degradation: Return HTTP 429 (Too Many Requests) with retry information, such as a Retry-After header
Transparency: Include rate limit headers in responses
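
As an illustration of the algorithm and granularity principles above, here is a minimal in-memory token bucket sketch keyed by a client identifier (a user ID, IP, or API key). The class names, the `capacity` and `refill_per_second` parameters, and the injectable `now` argument are assumptions made for this example, not part of any particular library. It is suited to a single instance only; a distributed deployment would need shared storage such as Redis.

```python
import time


class TokenBucket:
    """In-memory token bucket for one client key (illustrative sketch)."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity                    # maximum burst size
        self.refill_per_second = refill_per_second  # sustained request rate
        self.tokens = float(capacity)               # start with a full bucket
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per key, e.g. per user, per IP, or per API key."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, key, now=None):
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, now)
            self.buckets[key] = bucket
        return bucket.allow(now)
```

Passing `now` explicitly makes the limiter deterministic in tests; in production code the default `time.monotonic()` clock would typically be used. Note that this sketch never evicts idle buckets, so a long-running service would also need some expiry policy.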

Preparation Checklist

Identify rate limit dimensions (per user, per IP, per endpoint)
Choose storage backend (Redis for distributed, in-memory for single instance)
Define rate limit parameters (requests per minute/hour)
Plan response format for throttled requests
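
One possible response format for throttled requests is sketched below. `Retry-After` is a standard HTTP header; the `X-RateLimit-*` header names are a widely used convention rather than a standard, and the function name, JSON body shape, and `reset_at` parameter (a Unix timestamp at which the limit resets) are assumptions for this example.

```python
import json
import math


def throttled_response(limit, reset_at, now):
    """Build a (status, headers, body) tuple for a rate-limited request."""
    # Round up so clients never retry before the window actually resets.
    retry_after = max(1, math.ceil(reset_at - now))
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),         # seconds until a retry may succeed
        "X-RateLimit-Limit": str(limit),         # requests allowed per window
        "X-RateLimit-Remaining": "0",            # nothing left in this window
        "X-RateLimit-Reset": str(int(reset_at)), # Unix time when the window resets
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "retry_after_seconds": retry_after,
    })
    return 429, headers, body
```

Returning the same `X-RateLimit-*` headers on successful responses as well lets clients pace themselves before they hit the limit.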
