docs/features/auto-compact.mdx
When your conversation approaches the model's context window limit, Cline automatically summarizes it to free up space and keep working.
Cline monitors token usage throughout your conversation. When usage approaches the model's limit, it summarizes the conversation so far and continues the task with the condensed summary in place of the full history.
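The trigger logic can be pictured as a simple threshold check. This is an illustrative sketch only — the constant names, the 80% threshold, and `shouldSummarize` are assumptions for explanation, not Cline's actual implementation:

```typescript
// Hypothetical sketch of the auto-compact trigger; values are illustrative.
const CONTEXT_WINDOW = 200_000; // model's context window, in tokens (example value)
const THRESHOLD = 0.8;          // summarize once ~80% of the window is used (assumed)

function shouldSummarize(usedTokens: number): boolean {
  // Fire the summarization tool call before the window actually overflows.
  return usedTokens >= CONTEXT_WINDOW * THRESHOLD;
}
```

For example, with a 200k-token window and an assumed 80% threshold, summarization would trigger at 160k tokens of usage.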
You'll see a summarization tool call when this happens, with its cost shown like any other API call.
Previously, Cline would truncate older messages when it hit context limits, losing important context. With summarization, the key details of the conversation are preserved in condensed form instead of being discarded.
Summarization leverages your existing prompt cache from the conversation, so it costs about the same as any other tool call.
Since most input tokens are already cached, you pay primarily for generating the summary (output tokens), which keeps the operation cost-effective.
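The math behind this can be sketched as follows. The rates below are made-up placeholders, not real provider pricing, and `estimateCostUsd` is a hypothetical helper — the point is only that cached input reads are billed at a steep discount, so output tokens dominate the summarization cost:

```typescript
// Illustrative cost model; per-million-token rates are placeholders, not real pricing.
interface Usage {
  cachedInputTokens: number; // input served from the prompt cache (discounted)
  freshInputTokens: number;  // uncached input (full price)
  outputTokens: number;      // generated summary tokens
}

function estimateCostUsd(u: Usage): number {
  const CACHED_INPUT_RATE = 0.3 / 1_000_000; // assumed cache-read rate
  const FRESH_INPUT_RATE = 3.0 / 1_000_000;  // assumed uncached input rate
  const OUTPUT_RATE = 15.0 / 1_000_000;      // assumed output rate
  return (
    u.cachedInputTokens * CACHED_INPUT_RATE +
    u.freshInputTokens * FRESH_INPUT_RATE +
    u.outputTokens * OUTPUT_RATE
  );
}
```

Under these assumed rates, summarizing a 150k-token conversation that is fully cached, with a 1k-token summary, costs about $0.06 — versus roughly $0.45 just to resend the same input uncached.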
Auto Compact uses advanced LLM-based summarization on supported models.
You can use checkpoints to restore your task state from before a summarization occurred. You never truly lose context since you can always roll back.
Editing a message before a summarization tool call works similarly, restoring the conversation to that point.