Rate Limits and Costs

Understanding and managing API usage is crucial for a smooth and cost-effective experience with Kilo Code. This section explains how to track your token usage and costs, and how to configure rate limits.

Token Usage

Kilo Code interacts with AI models using tokens. Tokens are essentially pieces of words. The number of tokens used in a request and response affects both the processing time and the cost.

Input Tokens: These are the tokens in your prompt, including the system prompt, your instructions, and any context provided (e.g., file contents).
Output Tokens: These are the tokens generated by the AI model in its response.

You can see the number of input and output tokens used for each interaction in the chat history.
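To get a feel for how prompt size maps to tokens before sending a request, a rough estimate can help. The sketch below is illustrative only: the function name `roughTokenEstimate` and the ratio of about four characters per token are assumptions, not part of Kilo Code. Real tokenizers vary by model and language, so treat the counts shown in the chat history as the authoritative figures.

```typescript
// Assumption: roughly 4 characters per token for English text.
// Real tokenizers differ by model; this is only a ballpark heuristic.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Hypothetical helper, not a Kilo Code API.
function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

const examplePrompt = "Refactor the login handler to use async/await.";
console.log(`Estimated tokens: ${roughTokenEstimate(examplePrompt)}`);
```

An estimate like this is mainly useful for comparing the relative size of two pieces of context, not for predicting a bill.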

Cost Calculation

Most AI providers charge based on the number of tokens used. Pricing varies depending on the provider and the specific model.

Kilo Code automatically calculates the estimated cost of each API request based on the configured model's pricing. This cost is displayed in the chat history, next to the token usage.
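As a rough illustration of how a per-request estimate can be derived from token counts, here is a minimal sketch. It assumes pricing is quoted in USD per one million tokens, with separate input and output rates. The type names, the function `estimateCostUsd`, and the prices are hypothetical and do not describe Kilo Code's internal implementation or any real provider's rates.

```typescript
// Hypothetical pricing shape: USD per one million tokens.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Estimate the cost of one request from its token counts.
function estimateCostUsd(usage: TokenUsage, pricing: ModelPricing): number {
  const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPerMillion;
  const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPerMillion;
  return inputCost + outputCost;
}

// Illustrative prices only, not any real provider's rates.
const examplePricing: ModelPricing = { inputPerMillion: 3, outputPerMillion: 15 };
const exampleCost = estimateCostUsd(
  { inputTokens: 12_000, outputTokens: 800 },
  examplePricing,
);
console.log(`Estimated cost: $${exampleCost.toFixed(4)}`); // prints "Estimated cost: $0.0480"
```

With these made-up rates, the 12,000 input tokens account for three quarters of the cost even though output tokens are priced five times higher, which is why trimming context often matters more than shortening responses.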

Note:

The cost calculation is an estimate. The actual cost may vary slightly depending on the provider's billing practices.
Some providers may offer free tiers or credits. Check your provider's documentation for details.
Some providers offer prompt caching, which can substantially reduce the cost of repeated context.

Configuring Rate Limits

To prevent accidental overuse of the API and to help you manage costs, Kilo Code allows you to set a rate limit. The rate limit specifies the minimum time (in seconds) between API requests.

How to configure:

Open the Kilo Code settings ({% codicon name="gear" /%} icon in the top right corner).
Go to the "Advanced Settings" section.
Find the "Rate Limit (seconds)" setting.
Enter the desired delay in seconds. A value of 0 disables rate limiting.

Example:

If you set the rate limit to 10 seconds, Kilo Code will wait at least 10 seconds after one API request completes before sending the next one.
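The behavior described above can be sketched as a small function that computes how long to wait before the next request. This is a minimal sketch of a minimum-delay limiter under stated assumptions, not Kilo Code's actual implementation. The name `remainingDelayMs` and its parameters are hypothetical.

```typescript
// Returns how many milliseconds to wait before the next request may be sent.
// lastCompletedAtMs: when the previous request finished (null if none yet).
// rateLimitSeconds: the configured minimum gap; 0 disables rate limiting.
function remainingDelayMs(
  lastCompletedAtMs: number | null,
  nowMs: number,
  rateLimitSeconds: number,
): number {
  if (rateLimitSeconds <= 0 || lastCompletedAtMs === null) {
    return 0;
  }
  const elapsedMs = nowMs - lastCompletedAtMs;
  return Math.max(0, rateLimitSeconds * 1000 - elapsedMs);
}

// Previous request finished at t = 0; the rate limit is 10 seconds.
console.log(remainingDelayMs(0, 4_000, 10));  // 6000: wait 6 more seconds
console.log(remainingDelayMs(0, 12_000, 10)); // 0: enough time has passed
console.log(remainingDelayMs(0, 4_000, 0));   // 0: rate limiting disabled
```

Measuring from the completion of the previous request, rather than from its start, matches the example above: a long-running request does not shorten the gap before the next one.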

Tips for Optimizing Token Usage

Be Concise: Use clear and concise language in your prompts. Avoid unnecessary words or details.
Provide Only Relevant Context: Use context mentions (@file.ts, @folder/) selectively. Only include the files that are directly relevant to the task.
Break Down Tasks: Divide large tasks into smaller, more focused sub-tasks.
Use Custom Instructions: Provide custom instructions to guide Kilo Code's behavior and reduce the need for lengthy explanations in each prompt.
Choose the Right Model: Some models are more cost-effective than others. Consider using a smaller, faster model for tasks that don't require the full power of a larger model.
Use Modes: Different modes have access to different tools. For example, Architect mode can't modify code, which makes it a safe choice for analyzing a complex codebase without the risk of accidentally triggering expensive operations.
Disable MCP If Not Used: If you're not using MCP (Model Context Protocol) features, consider disabling it in Settings > Agent Behaviour > MCP Servers to significantly reduce the size of the system prompt and save tokens.

By understanding and managing your API usage, you can use Kilo Code effectively and efficiently.