Rate Limits & Quotas
Rate Limit Headers
Every API response includes rate limit information in the headers:
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Maximum requests per minute (RPM) |
| x-ratelimit-remaining-requests | Remaining requests this minute |
| x-ratelimit-limit-tokens | Maximum tokens per minute (TPM) |
| x-ratelimit-remaining-tokens | Remaining tokens this minute |
| x-ratelimit-reset-requests | Seconds until RPM counter resets |
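
As a rough illustration of reading these headers on the client side, the sketch below parses them into a small structure. It assumes the header values arrive as plain numeric strings, which this page does not state explicitly; the names `RateLimitStatus`, `parse_rate_limit_headers`, and `is_near_limit` are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit_requests: Optional[int]
    remaining_requests: Optional[int]
    limit_tokens: Optional[int]
    remaining_tokens: Optional[int]
    reset_requests_seconds: Optional[float]


def _parse(value: Optional[str], cast):
    """Convert a header value, returning None if it is missing or malformed."""
    if value is None:
        return None
    try:
        return cast(value.strip())
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    # Assumption: every value is a plain number encoded as a string.
    return RateLimitStatus(
        limit_requests=_parse(h.get("x-ratelimit-limit-requests"), int),
        remaining_requests=_parse(h.get("x-ratelimit-remaining-requests"), int),
        limit_tokens=_parse(h.get("x-ratelimit-limit-tokens"), int),
        remaining_tokens=_parse(h.get("x-ratelimit-remaining-tokens"), int),
        reset_requests_seconds=_parse(h.get("x-ratelimit-reset-requests"), float),
    )


def is_near_limit(status: RateLimitStatus, threshold: float = 0.1) -> bool:
    """True when less than `threshold` of the per-minute request budget remains."""
    if not status.limit_requests or status.remaining_requests is None:
        return False
    return status.remaining_requests < threshold * status.limit_requests
```

A caller could pass the response headers from any HTTP client to `parse_rate_limit_headers` and slow down when `is_near_limit` returns `True`. The 10% threshold is an arbitrary starting point, not a recommended value.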
Plan Limits
| Plan | RPM | TPM | Models |
|---|---|---|---|
| Starter | 60 | 100K | 50+ models |
| Pro | 600 | 1M | 200+ models |
| Enterprise | Custom | Custom | 400+ models |
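
One way to stay under a plan's RPM is to space requests evenly on the client. The sketch below derives a minimum interval from the RPM values in the table above; `PLAN_RPM`, `min_interval_seconds`, and `Pacer` are hypothetical names introduced for illustration. It covers only the request limit: the TPM limit would need separate accounting.

```python
import time

# RPM values copied from the plan table; Enterprise is custom, so it is omitted.
PLAN_RPM = {"Starter": 60, "Pro": 600}


def min_interval_seconds(rpm: int) -> float:
    """Smallest even spacing between requests that stays within `rpm`."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


class Pacer:
    """Blocks just long enough to keep calls at or below a target RPM."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self._interval = min_interval_seconds(rpm)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request sent yet

    def wait(self) -> None:
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        # Schedule the earliest time the following request may go out.
        self._next_allowed = now + self._interval
```

Calling `Pacer(PLAN_RPM["Starter"]).wait()` before each request spaces calls about one second apart. Even pacing is a conservative choice; a client that needs short bursts would track a rolling count per minute instead.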
Best Practices
- Monitor headers: Track x-ratelimit-remaining-* to avoid hitting limits
- Use exponential backoff: When you receive a 429, wait before retrying and double the delay after each failed attempt
- Batch requests: Combine multiple queries where possible
- Cache responses: Cache common queries to reduce API calls
- Upgrade proactively: Monitor usage trends and upgrade before hitting limits
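
The exponential backoff practice above can be sketched as follows. This is a minimal illustration rather than an official client: `send` is a hypothetical callable that performs one request and returns a `(status, headers, body)` tuple, and the retry count, base delay, and cap are assumed defaults. The random jitter is a common addition to keep many clients from retrying at the same moment; it is not something this page prescribes.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound on the wait before retry `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on HTTP 429 with exponentially growing, jittered waits."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            break
        # "Full jitter": wait a random time between 0 and the exponential bound.
        sleep(random.uniform(0, backoff_delay(attempt, base, cap)))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

With the defaults, the wait bounds grow as 1, 2, 4, 8, and 16 seconds. A variation would be to wait for the number of seconds reported in `x-ratelimit-reset-requests` when that header is present on the 429 response.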