packages/cloud-frontend/content/rate-limits.mdx
import { Callout, Cards } from "@/docs/components";
elizaOS Cloud applies rate limits to ensure fair usage and platform stability.
Rate limits are applied per API key. All users share the same default limits; credits and billing control cost.
| Endpoint | Rate Limit | Preset |
|---|---|---|
| Chat Completions | 200/min | RELAXED |
| Responses | 200/min | RELAXED |
| Embeddings | 60/min | STANDARD |
| Image Generation | 60/min | STANDARD |
| Video Generation | 5/5min | CRITICAL |
| Knowledge Query | 60/min | STANDARD |
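The table above can be mirrored in client code, for example to size a request queue per endpoint. The following is a minimal sketch: `PRESETS`, `ENDPOINT_PRESETS`, the endpoint keys, and `limitFor` are hypothetical names chosen for illustration, not exports of any elizaOS SDK. Only the numbers come from the table.

```javascript
// Hypothetical client-side mirror of the table above (not an official SDK export).
const PRESETS = {
  RELAXED: { limit: 200, windowMs: 60_000 },
  STANDARD: { limit: 60, windowMs: 60_000 },
  CRITICAL: { limit: 5, windowMs: 5 * 60_000 }, // 5 requests per 5 minutes
};

// Endpoint keys are illustrative labels, not API paths.
const ENDPOINT_PRESETS = {
  chatCompletions: "RELAXED",
  responses: "RELAXED",
  embeddings: "STANDARD",
  imageGeneration: "STANDARD",
  videoGeneration: "CRITICAL",
  knowledgeQuery: "STANDARD",
};

// Returns { limit, windowMs } for an endpoint label, or throws if it is unknown.
function limitFor(endpoint) {
  const preset = ENDPOINT_PRESETS[endpoint];
  if (!preset) throw new Error(`Unknown endpoint: ${endpoint}`);
  return PRESETS[preset];
}
```

Keeping the values in one place makes it easier to update the client if the published limits change.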
Custom rate limits are available based on your needs. Contact sales to discuss them.
Rate-limited (`429`) responses carry rate limit details in the headers below. Some endpoints may also include these headers on successful responses:
```http
X-RateLimit-Limit: 200
X-RateLimit-Remaining: 185
X-RateLimit-Reset: 2026-01-15T12:01:00.000Z
X-RateLimit-Policy: redis
```
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Max requests in window |
| `X-RateLimit-Remaining` | Requests remaining |
| `X-RateLimit-Reset` | ISO timestamp when window resets |
| `X-RateLimit-Policy` | Backend: `redis` or `in-memory` |
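A client can read these headers to decide whether to slow down before it hits the limit. The sketch below assumes only the four header names documented above; `parseRateLimitHeaders` is a hypothetical helper, and it accepts any object with a `get(name)` method, such as the `headers` of a `fetch` response.

```javascript
// Reads the four documented headers from anything exposing get(name).
// Missing or malformed values come back as null instead of throwing.
function parseRateLimitHeaders(headers) {
  const toInt = (value) => {
    const n = Number.parseInt(value ?? "", 10);
    return Number.isNaN(n) ? null : n;
  };
  const resetRaw = headers.get("X-RateLimit-Reset");
  const resetAt = resetRaw ? new Date(resetRaw) : null;
  return {
    limit: toInt(headers.get("X-RateLimit-Limit")),
    remaining: toInt(headers.get("X-RateLimit-Remaining")),
    // The reset header is an ISO timestamp, so Date can parse it directly.
    resetAt: resetAt && !Number.isNaN(resetAt.getTime()) ? resetAt : null,
    policy: headers.get("X-RateLimit-Policy") ?? null,
  };
}
```

For example, `parseRateLimitHeaders(response.headers).remaining` could be checked after each call to pause when it approaches zero.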
When rate limited, you'll receive:
```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests",
    "retryAfter": 42
  }
}
```
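The body can also drive retry timing when headers are not convenient to read. The sketch below is illustrative: `retryDelayMs` is a hypothetical helper, and it assumes `retryAfter` is expressed in seconds, which this page does not state explicitly.

```javascript
// Returns how long to wait (in ms) given a parsed 429 response body,
// or null if the body does not match the error shape shown above.
// ASSUMPTION: `retryAfter` is in seconds; verify against real responses.
function retryDelayMs(body) {
  const error = body?.error;
  if (!error || error.code !== "RATE_LIMITED") return null;
  const seconds = error.retryAfter;
  if (typeof seconds !== "number" || !Number.isFinite(seconds) || seconds < 0) {
    return null;
  }
  return seconds * 1000;
}
```

Returning `null` for anything unexpected lets the caller fall back to its own backoff schedule.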
Implement exponential backoff:
```javascript
async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) return response;

    // Last attempt failed: give up now instead of sleeping first
    if (attempt === maxRetries - 1) break;

    // Prefer the server's Retry-After (seconds); otherwise back off 1s, 2s, 4s, ...
    const retryAfter = Number.parseInt(
      response.headers.get("Retry-After") ?? "",
      10,
    );
    const waitTime = Number.isNaN(retryAfter)
      ? Math.pow(2, attempt) * 1000
      : retryAfter * 1000;

    console.log(`Rate limited. Waiting ${waitTime}ms...`);
    await new Promise((resolve) => setTimeout(resolve, waitTime));
  }
  throw new Error("Max retries exceeded");
}
```
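When many clients are rate limited at the same moment, retrying on an identical schedule can make them collide again. A common refinement, which this page does not prescribe, is to randomize each delay ("full jitter") and cap it. The sketch below is a standalone helper; `backoffDelay`, `baseMs`, `capMs`, and the injectable `random` parameter are illustrative names.

```javascript
// Full-jitter backoff: choose a random delay in [0, ceiling), where the
// ceiling doubles with each attempt and never exceeds capMs.
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

A retry loop could use `backoffDelay(attempt)` as its fallback wait whenever the server provides no `Retry-After` value.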
For high-volume applications, implement a request queue:
```javascript
class RequestQueue {
  constructor(rateLimit = 60, windowMs = 60000) {
    this.queue = [];
    this.rateLimit = rateLimit;
    this.windowMs = windowMs;
    this.requestTimes = [];
    this.timer = null; // Pending wake-up, so only one timer exists at a time
  }

  // `request` is a function that starts the call and returns a promise
  add(request) {
    return new Promise((resolve, reject) => {
      this.queue.push({ request, resolve, reject });
      this.process();
    });
  }

  process() {
    // Drop timestamps that have left the window
    const now = Date.now();
    this.requestTimes = this.requestTimes.filter(
      (t) => now - t < this.windowMs,
    );

    // Start as many queued requests as the window allows
    while (this.queue.length > 0 && this.requestTimes.length < this.rateLimit) {
      const { request, resolve, reject } = this.queue.shift();
      this.requestTimes.push(now);
      Promise.resolve().then(request).then(resolve, reject);
    }

    // Still backlogged: wake up when the oldest request leaves the window
    if (this.queue.length > 0 && this.timer === null) {
      const waitTime = this.windowMs - (now - this.requestTimes[0]);
      this.timer = setTimeout(() => {
        this.timer = null;
        this.process();
      }, waitTime);
    }
  }
}
```
For specific use cases, contact support to request a custom limit increase.
The BURST preset (10 req/sec) is available for real-time features. Standard AI endpoints use per-minute windows only.
Track your API usage:
```bash
curl -X GET "https://elizacloud.ai/api/quotas/usage" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
```json
{
  "period": "current_minute",
  "usage": {
    "chat": { "used": 45, "limit": 200 },
    "embeddings": { "used": 12, "limit": 60 }
  }
}
```
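A usage payload like the one above can be reduced to the headroom left per endpoint. The following is a small sketch that assumes the response has exactly the `usage` shape shown in the example; `summarizeUsage` is a hypothetical helper name.

```javascript
// Given a parsed usage response, returns the remaining requests and the
// fraction of the limit already consumed for each endpoint key.
function summarizeUsage(response) {
  const summary = {};
  for (const [name, { used, limit }] of Object.entries(response.usage ?? {})) {
    summary[name] = {
      remaining: Math.max(0, limit - used),
      // Treat a zero limit as fully consumed to avoid dividing by zero.
      fractionUsed: limit > 0 ? used / limit : 1,
    };
  }
  return summary;
}
```

A client might poll this and pause non-urgent work once `fractionUsed` for an endpoint crosses a threshold of its choosing.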