API Rate Limiting Best Practices: How Top APIs Handle Throttling

Rate limiting is one of the most consequential design decisions in any API. Get it wrong and you either crush your infrastructure or alienate your developers. This guide examines how four of the most widely used APIs — Stripe, GitHub, Twitter/X, and OpenAI — implement rate limiting, and distills the best practices every API team should follow in 2026.

Why Rate Limiting Matters More Than Ever

In 2026, APIs face unprecedented traffic patterns. AI agents make thousands of sequential API calls per workflow. Webhook-driven architectures create bursty traffic. Multi-tenant SaaS platforms proxy API calls on behalf of millions of end users. Without well-designed rate limits, a single runaway client can degrade service for everyone.

Good rate limiting protects infrastructure, ensures fair resource allocation across tenants, prevents abuse, and gives developers clear expectations about what they can build.

How Stripe Does Rate Limiting

Stripe uses a straightforward per-key rate limit with separate limits for different operation types:

| Operation | Rate Limit | Window |
|---|---|---|
| API requests (live mode) | 100 requests/second | Per-second |
| API requests (test mode) | 25 requests/second | Per-second |
| File uploads | 20 requests/second | Per-second |

What Stripe Gets Right

Stripe's philosophy: Make rate limits generous enough that well-behaved clients rarely hit them, but strict enough to prevent runaway scripts from impacting the platform. Most Stripe integrations never see a 429.

How GitHub Does Rate Limiting

GitHub uses a tiered rate limiting system that varies by authentication method and API version:

| Auth Method | REST API Limit | GraphQL Limit | Window |
|---|---|---|---|
| Unauthenticated | 60 requests/hour | N/A | Per hour |
| Personal Access Token | 5,000 requests/hour | 5,000 points/hour | Per hour |
| GitHub App (installation) | 5,000-15,000/hour | 5,000-15,000 points/hour | Per hour, scales with repos |

What GitHub Gets Right

How Twitter/X Does Rate Limiting

Twitter/X has the most complex and frequently changing rate limiting system among major APIs. As of 2026:

| Plan | Monthly Cost | Read Limit | Write Limit |
|---|---|---|---|
| Free | $0 | ~1 read/15 min | 1,500 posts/month |
| Basic | $200/mo | 10,000 reads/month | 3,000 posts/month |
| Pro | $5,000/mo | 1,000,000 reads/month | 300,000 posts/month |
| Enterprise | Custom | Custom | Custom |

What Twitter/X Gets Wrong

Lesson from Twitter/X: Unclear rate limits erode developer trust. If developers cannot predict when they will be throttled, they build less reliable integrations or leave the platform entirely.

How OpenAI Does Rate Limiting

OpenAI uses a dual-dimension rate limiting system based on both requests per minute (RPM) and tokens per minute (TPM):

| Tier | Requirement | GPT-4.1 RPM | GPT-4.1 TPM |
|---|---|---|---|
| Free | Account creation | 3 | 200 |
| Tier 1 | $5 spent | 500 | 30,000 |
| Tier 2 | $50 spent + 7 days | 5,000 | 450,000 |
| Tier 3 | $100 spent + 7 days | 5,000 | 800,000 |
| Tier 4 | $250 spent + 14 days | 10,000 | 2,000,000 |
| Tier 5 | $1,000 spent + 30 days | 10,000 | 10,000,000 |
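
The dual-dimension scheme means a request must clear two budgets at once: one for request count and one for token volume, and it is refused if either is exhausted. A minimal sketch of that admission check (class name and limits are illustrative, roughly Tier 1 shaped; this is not OpenAI's implementation):

```python
class DualDimensionLimiter:
    """Admit a request only if both the RPM and TPM budgets can cover it."""

    def __init__(self, rpm, tpm):
        self.requests_left = rpm  # request budget for the current minute
        self.tokens_left = tpm    # token budget for the current minute

    def allow(self, token_cost):
        # Both budgets must have room; otherwise reject without charging either
        if self.requests_left >= 1 and self.tokens_left >= token_cost:
            self.requests_left -= 1
            self.tokens_left -= token_cost
            return True
        return False
```

Note that a single large request can exhaust the token budget long before the request budget, which is exactly the behavior that makes TPM limits effective against oversized prompts.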

What OpenAI Gets Right

Best Practices for Implementing Rate Limits

1. Use Token Bucket or Sliding Window Algorithms

Fixed window rate limiting (e.g., "100 requests per minute") suffers from boundary bursting — a client can make 100 requests at 11:59:59 and 100 more at 12:00:01. Use sliding window or token bucket algorithms to prevent this:

```python
# Token bucket: tokens refill at a fixed rate; each request spends some
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # max tokens (allowed burst size)
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self, cost=1):
        self.refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate)
        self.last_refill = now
```
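
A sliding-window log is the other common fix for boundary bursting: record each request's timestamp and count only those inside the rolling window. A minimal sketch (the class name is illustrative; a production version would need per-client storage and eviction):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allow at most `limit` requests in any rolling `window`-second span."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # admitted request times, oldest first

    def allow_request(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have aged out of the rolling window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The log gives exact rolling-window semantics at the cost of storing one timestamp per admitted request; the token bucket is cheaper but permits bursts up to its capacity.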

2. Always Return Standard Rate Limit Headers

Follow the IETF RateLimit header fields draft standard. Include these headers in every response:

```
RateLimit-Limit: 100
RateLimit-Remaining: 67
RateLimit-Reset: 30
Retry-After: 5          # Only on 429 responses
```
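
One way to emit these fields from whatever state your limiter tracks (a sketch; the helper function is hypothetical, but the header names follow the draft):

```python
def rate_limit_headers(limit, remaining, reset_seconds, retry_after=None):
    """Build draft-standard RateLimit headers for an HTTP response."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": str(reset_seconds),
    }
    if retry_after is not None:  # attach only on 429 responses
        headers["Retry-After"] = str(retry_after)
    return headers
```

Emitting the headers on every response, not just on 429s, is what lets well-behaved clients throttle themselves before they are ever rejected.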

3. Differentiate Limits by Resource Cost

Not all requests are equal. A list endpoint that returns 1,000 records is more expensive than a GET-by-ID. Assign different rate limit costs based on the computational expense of each endpoint, similar to GitHub's GraphQL point system.
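
Cost-based limiting can be sketched as a per-endpoint cost table drawn down against a shared budget (the endpoint names and costs below are invented for illustration, not taken from any of the APIs above):

```python
# Illustrative per-endpoint costs: heavier operations drain more budget
ENDPOINT_COSTS = {
    "GET /items/{id}": 1,
    "GET /items": 10,     # list endpoints return many records
    "POST /reports": 50,  # expensive aggregation
}

class CostBudget:
    """Charge each request by endpoint cost against one shared budget."""

    def __init__(self, budget):
        self.remaining = budget

    def allow(self, endpoint):
        cost = ENDPOINT_COSTS.get(endpoint, 1)  # default cost for cheap calls
        if self.remaining >= cost:
            self.remaining -= cost
            return True
        return False
```

Under this model a client can make many cheap GET-by-ID calls or a few expensive list calls, but not both at full rate, which matches how the underlying resources are actually consumed.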

4. Implement Retry-After with Exponential Backoff

When returning a 429, always include a Retry-After header. On the client side, implement exponential backoff with jitter:

```javascript
async function fetchWithRetry(url, options, maxRetries = 5) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.status !== 429) return response;

    // Honor Retry-After when the server provides it; else fall back to 1s
    const retryAfter = response.headers.get('Retry-After');
    const baseDelay = retryAfter ? parseInt(retryAfter, 10) * 1000 : 1000;
    const jitter = Math.random() * 1000;
    // Exponential backoff with jitter, capped at 60 seconds
    const delay = Math.min(baseDelay * Math.pow(2, i) + jitter, 60000);

    await new Promise(r => setTimeout(r, delay));
  }
  throw new Error('Max retries exceeded');
}
```

5. Provide Rate Limit Dashboards

Give developers real-time visibility into their rate limit consumption. Stripe and GitHub both offer API usage dashboards showing historical consumption patterns. This helps developers optimize their usage proactively rather than discovering limits reactively via 429 errors.

6. Support Idempotency for Safe Retries

Rate limiting inevitably causes retries. If your API mutates state (creates payments, sends messages), support idempotency keys so retried requests do not create duplicate side effects. This is arguably more important than the rate limiting itself.
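
A minimal in-memory sketch of idempotency-key handling (illustrative names; a real service would persist keys durably with a TTL and also check that the payload matches the original request):

```python
class IdempotentHandler:
    """Replay the stored response when an Idempotency-Key is repeated."""

    def __init__(self, create_fn):
        self.create_fn = create_fn  # performs the real side effect
        self.responses = {}         # idempotency key -> first response

    def handle(self, key, payload):
        if key in self.responses:
            return self.responses[key]  # retry: no duplicate side effect
        response = self.create_fn(payload)
        self.responses[key] = response
        return response
```

With this in place, a client that retries after a 429 (or a network timeout) gets back the original result instead of creating a second payment or sending a second message.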

7. Separate Read and Write Limits

Read operations are typically cheaper to serve than writes. Maintain separate rate limit buckets for reads (GET) and writes (POST/PUT/DELETE) to maximize throughput for the common case while protecting write-heavy resources.
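
Separate buckets can be as simple as routing on the HTTP method (a sketch with fixed per-window counters; the limits are illustrative):

```python
READ_METHODS = {"GET", "HEAD"}

class MethodLimiter:
    """Independent per-window request budgets for reads and writes."""

    def __init__(self, read_limit, write_limit):
        self.remaining = {"read": read_limit, "write": write_limit}

    def allow(self, method):
        # Route the request to the matching budget by HTTP method
        bucket = "read" if method.upper() in READ_METHODS else "write"
        if self.remaining[bucket] > 0:
            self.remaining[bucket] -= 1
            return True
        return False
```

This keeps a burst of writes from starving cheap reads, and vice versa, without any coordination between the two budgets.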

Rate Limiting Comparison Summary

| Feature | Stripe | GitHub | Twitter/X | OpenAI |
|---|---|---|---|---|
| Algorithm | Per-second | Per-hour + burst | Monthly caps | Per-minute dual |
| Standard headers | Yes | Yes | Inconsistent | Yes |
| Retry-After | Yes | Yes | Sometimes | Yes |
| Cost-based limiting | No | Yes (GraphQL) | No | Yes (tokens) |
| Auto-scaling limits | On request | Yes (Apps) | No | Yes (tiers) |
| Documentation quality | Excellent | Excellent | Poor | Good |

The golden rule: Rate limits should be generous enough that well-behaved clients rarely encounter them, documented clearly enough that developers can plan around them, and enforced consistently enough that the platform stays stable.

The APIs that get rate limiting right — Stripe and GitHub — treat it as a first-class developer experience concern, not just an infrastructure protection mechanism. As API traffic patterns grow more complex with AI agents and multi-service orchestration, investing in well-designed rate limiting is more important than ever.

Further Reading

Recommended books: