API Rate Limiting Best Practices: How Top APIs Handle Throttling
Rate limiting is one of the most consequential design decisions in any API. Get it wrong and you either crush your infrastructure or alienate your developers. This guide examines how four of the most widely used APIs — Stripe, GitHub, Twitter/X, and OpenAI — implement rate limiting, and distills the best practices every API team should follow in 2026.
Why Rate Limiting Matters More Than Ever
In 2026, APIs face unprecedented traffic patterns. AI agents make thousands of sequential API calls per workflow. Webhook-driven architectures create bursty traffic. Multi-tenant SaaS platforms proxy API calls on behalf of millions of end users. Without well-designed rate limits, a single runaway client can degrade service for everyone.
Good rate limiting protects infrastructure, ensures fair resource allocation across tenants, prevents abuse, and gives developers clear expectations about what they can build.
How Stripe Does Rate Limiting
Stripe uses a straightforward per-key rate limit with separate limits for different operation types:
| Operation | Rate Limit | Window |
|---|---|---|
| API requests (live mode) | 100 requests/second | Per-second |
| API requests (test mode) | 25 requests/second | Per-second |
| File uploads | 20 requests/second | Per-second |
What Stripe Gets Right
- Clear response headers: Every response includes `RateLimit-Limit`, `RateLimit-Remaining`, and `RateLimit-Reset` headers conforming to the IETF draft standard.
- Graceful 429 responses: When rate limited, Stripe returns a `429 Too Many Requests` response with a `Retry-After` header telling clients exactly how long to wait.
- Separate test/live limits: Test mode has lower limits so development traffic never interferes with production capacity planning.
- Idempotency keys: Stripe encourages clients to include `Idempotency-Key` headers, allowing safe retries after rate-limit-induced failures without duplicate charges.
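The Stripe-style pattern above — an idempotency key on every mutating request plus honoring `Retry-After` on a 429 — can be sketched client-side. This is an illustrative sketch, not Stripe's SDK; the helper names are invented here:

```python
import uuid

def idempotent_headers(api_key):
    """Build headers for a Stripe-style POST; a fresh Idempotency-Key
    makes the request safe to replay if it fails mid-flight."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": str(uuid.uuid4()),
    }

def retry_delay_seconds(response_headers, default=1.0):
    """Honor a Retry-After header on a 429; fall back to a default
    delay when the header is absent or malformed."""
    value = response_headers.get("Retry-After")
    try:
        return float(value)
    except (TypeError, ValueError):
        return default
```

Note that the same idempotency key must be reused across retries of the same logical operation, while a new key is generated for each new operation.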
How GitHub Does Rate Limiting
GitHub uses a tiered rate limiting system that varies by authentication method and API version:
| Auth Method | REST API Limit | GraphQL Limit | Window |
|---|---|---|---|
| Unauthenticated | 60 requests/hour | N/A | Per hour |
| Personal Access Token | 5,000 requests/hour | 5,000 points/hour | Per hour |
| GitHub App (installation) | 5,000-15,000/hour | 5,000-15,000 pts/hr | Per hour, scales with repos |
What GitHub Gets Right
- Point-based GraphQL limits: Instead of counting requests, GitHub assigns point costs based on query complexity. A simple query costs 1 point; a query requesting 100 items from a connection costs more. This prevents a single expensive query from consuming the same budget as a trivial one.
- Secondary rate limits: Beyond the primary hourly limits, GitHub enforces per-minute and per-second concurrency caps to prevent burst abuse. This is documented separately from the primary limits.
- Conditional requests: GitHub strongly encourages `If-None-Match`/`If-Modified-Since` headers. Conditional requests that return `304 Not Modified` do not count against the rate limit, rewarding well-designed clients.
- Scaling with usage: GitHub App installation tokens get higher limits as the number of repositories they manage increases, aligning limits with legitimate usage patterns.
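The conditional-request pattern amounts to a small client-side cache: remember each URL's ETag, send it back as `If-None-Match`, and reuse the cached body on a `304`. A minimal bookkeeping sketch (no HTTP client, names invented here):

```python
class ETagCache:
    """Client-side cache keyed by URL. Responses that come back 304
    reuse the cached body and, on GitHub, cost no rate limit."""
    def __init__(self):
        self._entries = {}  # url -> (etag, body)

    def request_headers(self, url):
        """Headers to attach to the next request for this URL."""
        entry = self._entries.get(url)
        return {"If-None-Match": entry[0]} if entry else {}

    def handle_response(self, url, status, etag=None, body=None):
        """Store fresh 200 responses; serve cached bodies on 304."""
        if status == 304:
            return self._entries[url][1]
        if status == 200 and etag:
            self._entries[url] = (etag, body)
        return body
```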
How Twitter/X Does Rate Limiting
Twitter/X has the most complex and frequently changing rate limiting system among major APIs. As of 2026:
| Plan | Monthly Cost | Read Limit | Write Limit |
|---|---|---|---|
| Free | $0 | ~1 read/15 min | 1,500 posts/month |
| Basic | $200/mo | 10,000 reads/month | 3,000 posts/month |
| Pro | $5,000/mo | 1,000,000 reads/month | 300,000 posts/month |
| Enterprise | Custom | Custom | Custom |
What Twitter/X Gets Wrong
- Opaque limits: Exact rate limits per endpoint are poorly documented and change without notice. Developers frequently discover limits through trial and error.
- Monthly caps instead of per-second/minute: Monthly aggregate limits make it impossible to plan burst capacity. You might burn your entire monthly allocation in one hour and have nothing left for 29 days.
- Aggressive pricing: The gap between Free (~1 read per 15 minutes) and Basic ($200/month) creates a dead zone for hobby developers and small projects.
- No standard headers: Twitter/X does not consistently return IETF-standard rate limit headers, making client-side rate limit tracking unreliable.
How OpenAI Does Rate Limiting
OpenAI uses a dual-dimension rate limiting system based on both requests per minute (RPM) and tokens per minute (TPM):
| Tier | Requirement | GPT-4.1 RPM | GPT-4.1 TPM |
|---|---|---|---|
| Free | Account creation | 3 RPM | 200 TPM |
| Tier 1 | $5 spent | 500 RPM | 30,000 TPM |
| Tier 2 | $50 spent + 7 days | 5,000 RPM | 450,000 TPM |
| Tier 3 | $100 spent + 7 days | 5,000 RPM | 800,000 TPM |
| Tier 4 | $250 spent + 14 days | 10,000 RPM | 2,000,000 TPM |
| Tier 5 | $1,000 spent + 30 days | 10,000 RPM | 10,000,000 TPM |
What OpenAI Gets Right
- Dual-dimension limiting: Counting both requests and tokens prevents abuse that simple request counting misses. A single request with a 100K-token prompt consumes more resources than 100 small requests.
- Progressive tier system: Limits increase automatically as you spend more and build history. No manual upgrade process needed.
- Per-model limits: Different models have different limits, reflecting their different computational costs. Cheaper models get higher limits.
- Clear documentation: OpenAI publishes exact limits per tier per model in a public table that is updated when changes occur.
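OpenAI's dual-dimension scheme can be mirrored client-side with a sliding-window tracker that counts both requests and tokens, refusing a call that would exceed either budget. A testable sketch with an injected clock; the limits are illustrative, not OpenAI's actual tier values:

```python
import collections

class DualBudget:
    """Sliding-window tracker enforcing both a requests-per-window
    (RPM-style) and a tokens-per-window (TPM-style) cap."""
    def __init__(self, rpm, tpm, window=60.0):
        self.rpm, self.tpm, self.window = rpm, tpm, window
        self._events = collections.deque()  # (timestamp, tokens)

    def allow(self, tokens, now):
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            self._events.popleft()
        if len(self._events) >= self.rpm:
            return False  # request cap hit
        if sum(t for _, t in self._events) + tokens > self.tpm:
            return False  # token cap hit
        self._events.append((now, tokens))
        return True
```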
Best Practices for Implementing Rate Limits
1. Use Token Bucket or Sliding Window Algorithms
Fixed window rate limiting (e.g., "100 requests per minute") suffers from boundary bursting — a client can make 100 requests at 11:59:59 and 100 more at 12:00:01. Use sliding window or token bucket algorithms to prevent this:
# Token bucket (runnable Python)
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max tokens (burst size)
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self, cost=1):
        self.refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate)
        self.last_refill = now
2. Always Return Standard Rate Limit Headers
Follow the IETF RateLimit header fields draft standard. Include these headers in every response:
RateLimit-Limit: 100
RateLimit-Remaining: 67
RateLimit-Reset: 30
Retry-After: 5 # Only on 429 responses
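Producing these headers from limiter state is a one-liner per field. A minimal sketch, assuming the limiter exposes its limit, remaining budget, and seconds until reset (the function name is invented here):

```python
import math

def rate_limit_headers(limit, remaining, reset_after, retry_after=None):
    """Build IETF-draft-style RateLimit headers. retry_after is only
    passed (and Retry-After only emitted) on throttled 429 responses."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(math.ceil(reset_after)),
    }
    if retry_after is not None:
        headers["Retry-After"] = str(math.ceil(retry_after))
    return headers
```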
3. Differentiate Limits by Resource Cost
Not all requests are equal. A list endpoint that returns 1,000 records is more expensive than a GET-by-ID. Assign different rate limit costs based on the computational expense of each endpoint, similar to GitHub's GraphQL point system.
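One lightweight way to do this is a static cost table consulted before deducting from a per-client budget. The endpoints and costs below are illustrative, in the spirit of GitHub's point system:

```python
# Illustrative per-endpoint costs; tune these to measured expense.
ENDPOINT_COSTS = {
    ("GET", "/items/{id}"): 1,   # cheap point lookup
    ("GET", "/items"): 10,       # expensive list scan
    ("POST", "/items"): 5,       # write with validation
}

def charge(budget, method, route):
    """Deduct the route's cost from a client's remaining budget.
    Returns the new budget, or None if the request should be rejected."""
    cost = ENDPOINT_COSTS.get((method, route), 1)
    if cost > budget:
        return None
    return budget - cost
```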
4. Implement Retry-After with Exponential Backoff
When returning a 429, always include a Retry-After header. On the client side, implement exponential backoff with jitter:
async function fetchWithRetry(url, options, maxRetries = 5) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status !== 429) return response;
    // Honor the server's Retry-After when present; otherwise fall back
    // to exponential backoff with jitter, capped at 60 seconds.
    const retryAfter = response.headers.get('Retry-After');
    const jitter = Math.random() * 1000;
    const delay = retryAfter
      ? parseInt(retryAfter, 10) * 1000 + jitter
      : Math.min(1000 * Math.pow(2, i) + jitter, 60000);
    await new Promise(r => setTimeout(r, delay));
  }
  throw new Error('Max retries exceeded');
}
5. Provide Rate Limit Dashboards
Give developers real-time visibility into their rate limit consumption. Stripe and GitHub both offer API usage dashboards showing historical consumption patterns. This helps developers optimize their usage proactively rather than discovering limits reactively via 429 errors.
6. Support Idempotency for Safe Retries
Rate limiting inevitably causes retries. If your API mutates state (creates payments, sends messages), support idempotency keys so retried requests do not create duplicate side effects. This is arguably more important than the rate limiting itself.
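Server-side, idempotency support boils down to remembering the first response produced for each key and replaying it on retries instead of re-running the mutation. A minimal in-memory sketch (a production store would also need TTLs and persistence):

```python
class IdempotencyStore:
    """Remember the first response per idempotency key so a retried
    mutation replays it rather than executing twice."""
    def __init__(self):
        self._responses = {}

    def execute(self, key, handler):
        if key in self._responses:
            return self._responses[key]  # replay, no side effects
        result = handler()               # first time: run the mutation
        self._responses[key] = result
        return result
```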
7. Separate Read and Write Limits
Read operations are typically cheaper to serve than writes. Maintain separate rate limit buckets for reads (GET) and writes (POST/PUT/DELETE) to maximize throughput for the common case while protecting write-heavy resources.
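Splitting by method is a small routing decision in front of two independent budgets. A stripped-down sketch with fixed per-window counts (a real limiter would refill these over time, e.g. with token buckets):

```python
class MethodBuckets:
    """Two independent countdown budgets: one for reads, one for
    writes. GET is treated as a read; everything else as a write."""
    def __init__(self, read_budget, write_budget):
        self._budgets = {"read": read_budget, "write": write_budget}

    def allow(self, http_method):
        kind = "read" if http_method == "GET" else "write"
        if self._budgets[kind] <= 0:
            return False
        self._budgets[kind] -= 1
        return True
```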
Rate Limiting Comparison Summary
| Feature | Stripe | GitHub | Twitter/X | OpenAI |
|---|---|---|---|---|
| Limit window | Per-second | Per-hour + burst | Monthly caps | Per-minute dual |
| Standard headers | Yes | Yes | Inconsistent | Yes |
| Retry-After | Yes | Yes | Sometimes | Yes |
| Cost-based limiting | No | Yes (GraphQL) | No | Yes (tokens) |
| Auto-scaling limits | On request | Yes (Apps) | No | Yes (tiers) |
| Documentation quality | Excellent | Excellent | Poor | Good |
The APIs that get rate limiting right — Stripe and GitHub — treat it as a first-class developer experience concern, not just an infrastructure protection mechanism. As API traffic patterns grow more complex with AI agents and multi-service orchestration, investing in well-designed rate limiting is more important than ever.
Further Reading
- Designing APIs with Swagger and OpenAPI — Covers rate limit documentation, API design patterns, and OpenAPI specification best practices.
- Building Microservices (O'Reilly) — Essential reading on distributed rate limiting, circuit breakers, and back-pressure patterns.