API Rate Limiting Best Practices: How Top APIs Handle Throttling
Rate limiting is one of the most consequential design decisions in any API. Get it wrong and you either crush your infrastructure or alienate your developers. This guide examines how four of the most widely used APIs — Stripe, GitHub, Twitter/X, and OpenAI — implement rate limiting, and distills the best practices every API team should follow in 2026.
Why Rate Limiting Matters More Than Ever
In 2026, APIs face unprecedented traffic patterns. AI agents make thousands of sequential API calls per workflow. Webhook-driven architectures create bursty traffic. Multi-tenant SaaS platforms proxy API calls on behalf of millions of end users. Without well-designed rate limits, a single runaway client can degrade service for everyone.
Good rate limiting protects infrastructure, ensures fair resource allocation across tenants, prevents abuse, and gives developers clear expectations about what they can build.
How Stripe Does Rate Limiting
Stripe uses a straightforward per-key rate limit with separate limits for different operation types:
| Operation | Rate Limit | Window |
|---|---|---|
| API requests (live mode) | 100 requests/second | Per-second |
| API requests (test mode) | 25 requests/second | Per-second |
| File uploads | 20 requests/second | Per-second |
What Stripe Gets Right
- Clear response headers: Every response includes `RateLimit-Limit`, `RateLimit-Remaining`, and `RateLimit-Reset` headers conforming to the IETF draft standard.
- Graceful 429 responses: When rate limited, Stripe returns a `429 Too Many Requests` response with a `Retry-After` header telling clients exactly how long to wait.
- Separate test/live limits: Test mode has lower limits so development traffic never interferes with production capacity planning.
- Idempotency keys: Stripe encourages clients to include `Idempotency-Key` headers, allowing safe retries after rate-limit-induced failures without duplicate charges.
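The Stripe-style pattern above — an idempotency key on every mutating request plus honoring `Retry-After` on a 429 — can be sketched client-side. This is an illustrative sketch, not Stripe's SDK; the helper names are invented here:

```python
import uuid

def idempotent_headers(api_key):
    """Build headers for a Stripe-style POST; a fresh Idempotency-Key
    makes the request safe to replay if it fails mid-flight."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": str(uuid.uuid4()),
    }

def retry_delay_seconds(response_headers, default=1.0):
    """Honor a Retry-After header on a 429; fall back to a default
    delay when the header is absent or malformed."""
    value = response_headers.get("Retry-After")
    try:
        return float(value)
    except (TypeError, ValueError):
        return default
```

Note that the same idempotency key must be reused across retries of the same logical operation, while a new key is generated for each new operation.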
How GitHub Does Rate Limiting
GitHub uses a tiered rate limiting system that varies by authentication method and API version:
| Auth Method | REST API Limit | GraphQL Limit | Window |
|---|---|---|---|
| Unauthenticated | 60 requests/hour | N/A | Per hour |
| Personal Access Token | 5,000 requests/hour | 5,000 points/hour | Per hour |
| GitHub App (installation) | 5,000-15,000/hour | 5,000-15,000 pts/hr | Per hour, scales with repos |
What GitHub Gets Right
- Point-based GraphQL limits: Instead of counting requests, GitHub assigns point costs based on query complexity. A simple query costs 1 point; a query requesting 100 items from a connection costs more. This prevents a single expensive query from consuming the same budget as a trivial one.
- Secondary rate limits: Beyond the primary hourly limits, GitHub enforces per-minute and per-second concurrency caps to prevent burst abuse. This is documented separately from the primary limits.
- Conditional requests: GitHub strongly encourages `If-None-Match`/`If-Modified-Since` headers. Conditional requests that return `304 Not Modified` do not count against the rate limit, rewarding well-designed clients.
- Scaling with usage: GitHub App installation tokens get higher limits as the number of repositories they manage increases, aligning limits with legitimate usage patterns.
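The conditional-request pattern amounts to a small client-side cache: remember each URL's ETag, send it back as `If-None-Match`, and reuse the cached body on a `304`. A minimal bookkeeping sketch (no HTTP client, names invented here):

```python
class ETagCache:
    """Client-side cache keyed by URL. Responses that come back 304
    reuse the cached body and, on GitHub, cost no rate limit."""
    def __init__(self):
        self._entries = {}  # url -> (etag, body)

    def request_headers(self, url):
        """Headers to attach to the next request for this URL."""
        entry = self._entries.get(url)
        return {"If-None-Match": entry[0]} if entry else {}

    def handle_response(self, url, status, etag=None, body=None):
        """Store fresh 200 responses; serve cached bodies on 304."""
        if status == 304:
            return self._entries[url][1]
        if status == 200 and etag:
            self._entries[url] = (etag, body)
        return body
```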
How Twitter/X Does Rate Limiting
Twitter/X has the most complex and frequently changing rate limiting system among major APIs. As of 2026:
| Plan | Monthly Cost | Read Limit | Write Limit |
|---|---|---|---|
| Free | $0 | ~1 read/15 min | 1,500 posts/month |
| Basic | $200/mo | 10,000 reads/month | 3,000 posts/month |
| Pro | $5,000/mo | 1,000,000 reads/month | 300,000 posts/month |
| Enterprise | Custom | Custom | Custom |
What Twitter/X Gets Wrong
- Opaque limits: Exact rate limits per endpoint are poorly documented and change without notice. Developers frequently discover limits through trial and error.
- Monthly caps instead of per-second/minute: Monthly aggregate limits make it impossible to plan burst capacity. You might burn your entire monthly allocation in one hour and have nothing left for 29 days.
- Aggressive pricing: The gap between Free (~1 read per 15 minutes) and Basic ($200/month) creates a dead zone for hobby developers and small projects.
- No standard headers: Twitter/X does not consistently return IETF-standard rate limit headers, making client-side rate limit tracking unreliable.
How OpenAI Does Rate Limiting
OpenAI uses a dual-dimension rate limiting system based on both requests per minute (RPM) and tokens per minute (TPM):
| Tier | Requirement | GPT-4.1 RPM | GPT-4.1 TPM |
|---|---|---|---|
| Free | Account creation | 3 RPM | 200 TPM |
| Tier 1 | $5 spent | 500 RPM | 30,000 TPM |
| Tier 2 | $50 spent + 7 days | 5,000 RPM | 450,000 TPM |
| Tier 3 | $100 spent + 7 days | 5,000 RPM | 800,000 TPM |
| Tier 4 | $250 spent + 14 days | 10,000 RPM | 2,000,000 TPM |
| Tier 5 | $1,000 spent + 30 days | 10,000 RPM | 10,000,000 TPM |
What OpenAI Gets Right
- Dual-dimension limiting: Counting both requests and tokens prevents abuse that simple request counting misses. A single request with a 100K-token prompt consumes more resources than 100 small requests.
- Progressive tier system: Limits increase automatically as you spend more and build history. No manual upgrade process needed.
- Per-model limits: Different models have different limits, reflecting their different computational costs. Cheaper models get higher limits.
- Clear documentation: OpenAI publishes exact limits per tier per model in a public table that is updated when changes occur.
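OpenAI's dual-dimension scheme can be mirrored client-side with a sliding-window tracker that counts both requests and tokens, refusing a call that would exceed either budget. A testable sketch with an injected clock; the limits are illustrative, not OpenAI's actual tier values:

```python
import collections

class DualBudget:
    """Sliding-window tracker enforcing both a requests-per-window
    (RPM-style) and a tokens-per-window (TPM-style) cap."""
    def __init__(self, rpm, tpm, window=60.0):
        self.rpm, self.tpm, self.window = rpm, tpm, window
        self._events = collections.deque()  # (timestamp, tokens)

    def allow(self, tokens, now):
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()
        if len(self._events) >= self.rpm:
            return False  # request cap hit
        if sum(t for _, t in self._events) + tokens > self.tpm:
            return False  # token cap hit
        self._events.append((now, tokens))
        return True
```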
Best Practices for Implementing Rate Limits
1. Use Token Bucket or Sliding Window Algorithms
Fixed window rate limiting (e.g., "100 requests per minute") suffers from boundary bursting — a client can make 100 requests at 11:59:59 and 100 more at 12:00:01. Use sliding window or token bucket algorithms to prevent this:
# Token bucket (runnable Python)
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max tokens (burst size)
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self, cost=1):
        self.refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate)
        self.last_refill = now
2. Always Return Standard Rate Limit Headers
Follow the IETF RateLimit header fields draft standard. Include these headers in every response:
RateLimit-Limit: 100
RateLimit-Remaining: 67
RateLimit-Reset: 30
Retry-After: 5 # Only on 429 responses
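Producing these headers from limiter state is a one-liner per field. A minimal sketch, assuming the limiter exposes its limit, remaining budget, and seconds until reset (the function name is invented here):

```python
import math

def rate_limit_headers(limit, remaining, reset_after, retry_after=None):
    """Build IETF-draft-style RateLimit headers. retry_after is only
    passed (and Retry-After only emitted) on throttled 429 responses."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(math.ceil(reset_after)),
    }
    if retry_after is not None:
        headers["Retry-After"] = str(math.ceil(retry_after))
    return headers
```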
3. Differentiate Limits by Resource Cost
Not all requests are equal. A list endpoint that returns 1,000 records is more expensive than a GET-by-ID. Assign different rate limit costs based on the computational expense of each endpoint, similar to GitHub's GraphQL point system.
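One lightweight way to do this is a static cost table consulted before deducting from a per-client budget. The endpoints and costs below are illustrative, in the spirit of GitHub's point system:

```python
# Illustrative per-endpoint costs; tune these to measured expense.
ENDPOINT_COSTS = {
    ("GET", "/items/{id}"): 1,   # cheap point lookup
    ("GET", "/items"): 10,       # expensive list scan
    ("POST", "/items"): 5,       # write with validation
}

def charge(budget, method, route):
    """Deduct the route's cost from a client's remaining budget.
    Returns the new budget, or None if the request should be rejected."""
    cost = ENDPOINT_COSTS.get((method, route), 1)
    if cost > budget:
        return None
    return budget - cost
```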
4. Implement Retry-After with Exponential Backoff
When returning a 429, always include a Retry-After header. On the client side, implement exponential backoff with jitter:
async function fetchWithRetry(url, options, maxRetries = 5) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status !== 429) return response;
    // Honor the server's Retry-After when present; otherwise fall back
    // to exponential backoff with jitter, capped at 60 seconds.
    const retryAfter = response.headers.get('Retry-After');
    const jitter = Math.random() * 1000;
    const delay = retryAfter
      ? parseInt(retryAfter, 10) * 1000 + jitter
      : Math.min(1000 * Math.pow(2, i) + jitter, 60000);
    await new Promise(r => setTimeout(r, delay));
  }
  throw new Error('Max retries exceeded');
}
5. Provide Rate Limit Dashboards
Give developers real-time visibility into their rate limit consumption. Stripe and GitHub both offer API usage dashboards showing historical consumption patterns. This helps developers optimize their usage proactively rather than discovering limits reactively via 429 errors.
6. Support Idempotency for Safe Retries
Rate limiting inevitably causes retries. If your API mutates state (creates payments, sends messages), support idempotency keys so retried requests do not create duplicate side effects. This is arguably more important than the rate limiting itself.
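Server-side, idempotency support boils down to remembering the first response produced for each key and replaying it on retries instead of re-running the mutation. A minimal in-memory sketch (a production store would also need TTLs and persistence):

```python
class IdempotencyStore:
    """Remember the first response per idempotency key so a retried
    mutation replays it rather than executing twice."""
    def __init__(self):
        self._responses = {}

    def execute(self, key, handler):
        if key in self._responses:
            return self._responses[key]  # replay, no side effects
        result = handler()               # first time: run the mutation
        self._responses[key] = result
        return result
```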
7. Separate Read and Write Limits
Read operations are typically cheaper to serve than writes. Maintain separate rate limit buckets for reads (GET) and writes (POST/PUT/DELETE) to maximize throughput for the common case while protecting write-heavy resources.
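Splitting by method is a small routing decision in front of two independent budgets. A stripped-down sketch with fixed per-window counts (a real limiter would refill these over time, e.g. with token buckets):

```python
class MethodBuckets:
    """Two independent countdown budgets: one for reads, one for
    writes. GET is treated as a read; everything else as a write."""
    def __init__(self, read_budget, write_budget):
        self._budgets = {"read": read_budget, "write": write_budget}

    def allow(self, http_method):
        kind = "read" if http_method == "GET" else "write"
        if self._budgets[kind] <= 0:
            return False
        self._budgets[kind] -= 1
        return True
```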
Rate Limiting Comparison Summary
| Feature | Stripe | GitHub | Twitter/X | OpenAI |
|---|---|---|---|---|
| Limit window | Per-second | Per-hour + burst | Monthly caps | Per-minute dual |
| Standard headers | Yes | Yes | Inconsistent | Yes |
| Retry-After | Yes | Yes | Sometimes | Yes |
| Cost-based limiting | No | Yes (GraphQL) | No | Yes (tokens) |
| Auto-scaling limits | On request | Yes (Apps) | No | Yes (tiers) |
| Documentation quality | Excellent | Excellent | Poor | Good |
The APIs that get rate limiting right — Stripe and GitHub — treat it as a first-class developer experience concern, not just an infrastructure protection mechanism. As API traffic patterns grow more complex with AI agents and multi-service orchestration, investing in well-designed rate limiting is more important than ever.
Further Reading
- Designing APIs with Swagger and OpenAPI — Covers rate limit documentation, API design patterns, and OpenAPI specification best practices.
- Building Microservices (O'Reilly) — Essential reading on distributed rate limiting, circuit breakers, and back-pressure patterns.