The Kilo AI Gateway tracks usage and costs for every request with microdollar precision (1 USD = 1,000,000 microdollars). This enables accurate billing even for very low-cost requests.
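To make the unit concrete, here is a minimal sketch of converting between USD and microdollars. The helper names are hypothetical and not part of the gateway API; only the 1,000,000 ratio comes from the text above. Integers and `Decimal` are used so that very small costs are not distorted by float rounding.

```python
from decimal import Decimal

# From the text above: 1 USD = 1,000,000 microdollars.
MICRODOLLARS_PER_USD = 1_000_000


def usd_to_microdollars(usd: str) -> int:
    """Convert a decimal USD string to whole microdollars (hypothetical helper)."""
    return int(Decimal(usd) * MICRODOLLARS_PER_USD)


def microdollars_to_usd(microdollars: int) -> Decimal:
    """Convert an integer microdollar amount back to USD without float rounding."""
    return Decimal(microdollars) / MICRODOLLARS_PER_USD


# A request that costs 42 microdollars is a tiny fraction of a cent.
print(microdollars_to_usd(42))
```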
Every request to the gateway follows this flow:
Costs are determined by the upstream provider's pricing based on token usage:
Models with the `:free` suffix have zero cost; usage is tracked but not billed.
Your account balance is the difference between total credits purchased and total usage. Check your balance in the Kilo dashboard.
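As a worked example of the balance rule, the sketch below computes a balance in microdollars. The function name and the sample amounts are illustrative assumptions; the gateway computes the real balance on its side.

```python
def balance_microdollars(credits_purchased: int, total_usage: int) -> int:
    """Balance = total credits purchased minus total usage, both in microdollars."""
    return credits_purchased - total_usage


# Illustrative numbers only: $10.00 of credits and $3.25 of usage.
credits = 10_000_000
usage = 3_250_000
print(balance_microdollars(credits, usage))  # 6750000 microdollars, i.e. $6.75
```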
When your balance reaches zero, requests to paid models will return HTTP 402 with a link to add credits:

    {
      "error": {
        "message": "Insufficient balance. Please add credits to continue.",
        "code": 402,
        "metadata": {
          "buyCreditsUrl": "https://app.kilo.ai/credits"
        }
      }
    }

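A client might surface the credits link to the user when it sees this error. The following is a minimal sketch that assumes the body has exactly the shape shown above; `buy_credits_url` is a hypothetical helper, not a gateway SDK function.

```python
import json
from typing import Optional


def buy_credits_url(status_code: int, body: str) -> Optional[str]:
    """Return the credits URL from a 402 error body, or None if it is absent."""
    if status_code != 402:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    # Walk error -> metadata defensively in case a field is missing.
    error = payload.get("error") if isinstance(payload, dict) else None
    metadata = error.get("metadata") if isinstance(error, dict) else None
    if not isinstance(metadata, dict):
        return None
    return metadata.get("buyCreditsUrl")
```

Returning `None` rather than raising lets the caller fall back to a generic error message when the link is missing.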
Organizations have their own balance pool that members draw from. Organization billing supports:
Organizations can enforce policies on gateway usage for their members.
Restrict which models organization members can use:

    # Examples of allow list entries
    anthropic/claude-sonnet-4.5    # Specific model
    anthropic/*                    # All Anthropic models
    openai/gpt-5.2                 # Specific OpenAI model

The allow list supports exact matches and wildcard patterns. Requests for models not on the list return HTTP 403.
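The matching rule can be approximated locally, for example to validate an allow list before saving it. This sketch uses Python's `fnmatchcase` as a stand-in for the gateway's matcher, which is an assumption: the gateway's exact wildcard semantics are not specified here, and `fnmatchcase` also gives special meaning to `?` and `[...]`.

```python
from fnmatch import fnmatchcase
from typing import List

# Entries taken from the examples above.
ALLOW_LIST = ["anthropic/claude-sonnet-4.5", "anthropic/*", "openai/gpt-5.2"]


def is_model_allowed(model_id: str, allow_list: List[str]) -> bool:
    """True if the model ID matches any exact entry or wildcard pattern."""
    return any(fnmatchcase(model_id, pattern) for pattern in allow_list)


def status_for_model(model_id: str, allow_list: List[str]) -> int:
    """Mirror the documented behavior: models not on the list get HTTP 403."""
    return 200 if is_model_allowed(model_id, allow_list) else 403
```

Note that `anthropic/*` already covers `anthropic/claude-sonnet-4.5`, so in this model the specific entry is redundant once the wildcard is present.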
Restrict which inference providers can be used for routing. This is passed to the upstream router and affects which backends serve the request.
Organizations can set a data collection policy (allow or deny) that is applied to all requests from their members. Some free models require data collection to be allowed.
Set a maximum daily spend per organization member. When a member reaches their daily limit, subsequent requests return a balance error. The daily limit resets at midnight UTC.
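To show how the reset boundary behaves across time zones, here is a small sketch that computes the next midnight UTC and checks a spend against a limit. Both helpers are hypothetical; the gateway enforces the real limit server-side, and the `>=` comparison reflects the "reaches their daily limit" wording above.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Return the next midnight UTC strictly after `now` (must be timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    today_midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight + timedelta(days=1)


def has_reached_daily_limit(spent_today: int, daily_limit: int) -> bool:
    """Compare spend and limit in microdollars; reaching the limit blocks requests."""
    return spent_today >= daily_limit
```

A member in UTC-5 who hits the limit at 20:00 local time is already past midnight UTC, so their next reset is roughly 23 hours away rather than 4.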
All free model requests (both anonymous and authenticated) are rate-limited by IP address:
| Scope | Limit |
|---|---|
| Free models per IP | 200 requests per hour |
When rate-limited, you receive HTTP 429:

    {
      "error": {
        "message": "Rate limit exceeded for free models. Please try again later.",
        "code": 429
      }
    }

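One common client-side response to a 429 is to retry with capped exponential backoff. The sketch below only computes the delay schedule; the base and cap values are arbitrary assumptions, and nothing above states whether the gateway sends a `Retry-After` header, so none is assumed.

```python
from typing import List


def should_retry(status_code: int) -> bool:
    """Only the rate-limit status is treated as retryable in this sketch."""
    return status_code == 429


def backoff_delays(max_retries: int, base_seconds: float = 1.0,
                   cap_seconds: float = 60.0) -> List[float]:
    """Delays that double on each attempt and never exceed the cap."""
    return [min(cap_seconds, base_seconds * 2 ** attempt)
            for attempt in range(max_retries)]


print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

Because the limit is counted per hour, short retries may keep failing until the window moves on; a production client would likely add jitter and give up after a few attempts.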
Paid model requests are not rate-limited by the gateway itself, but may be rate-limited by upstream providers. Organization per-user daily spending limits provide an additional layer of cost control.
Usage data is tracked per request and includes:
| Field | Description |
|---|---|
| `model` | Model ID used |
| `provider` | Inference provider that served the request |
| `input_tokens` | Number of input/prompt tokens |
| `output_tokens` | Number of output/completion tokens |
| `cache_write_tokens` | Tokens written to cache |
| `cache_hit_tokens` | Tokens served from cache |
| `cost_microdollars` | Cost in microdollars (1 USD = 1,000,000) |
| `time_to_first_token` | Latency to first token (streaming only) |
| `is_byok` | Whether a BYOK key was used |
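For client code that stores or aggregates these records, the fields in the table could be modeled as below. The field names come from the table; the Python types, the optional latency, and the helper function are assumptions, since the table does not state types or units.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class UsageRecord:
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    cache_write_tokens: int
    cache_hit_tokens: int
    cost_microdollars: int
    # Streaming only, so None otherwise; the unit is not stated in the table.
    time_to_first_token: Optional[float]
    is_byok: bool


def total_cost_microdollars(records: Iterable[UsageRecord]) -> int:
    """Sum costs as integers so tiny per-request amounts are not lost to rounding."""
    return sum(record.cost_microdollars for record in records)
```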
Token counts are provided by the upstream model and are based on the model's native tokenizer. The gateway does not re-tokenize content.
Usage data is available in the `usage` field of the response body; for streaming responses, it arrives in the final chunk before `[DONE]`.