docs/features/auto-compact.mdx
When your conversation approaches the model's context window limit, Cline automatically summarizes it to free up space and keep working.
Cline monitors token usage throughout your conversation. When usage approaches the model's limit, it summarizes the conversation so far and continues the task with the condensed summary in place of the full history.
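The trigger logic can be pictured as a simple threshold check. This is an illustrative sketch only — the constant names, the 80% threshold, and `shouldSummarize` are assumptions for explanation, not Cline's actual implementation:

```typescript
// Hypothetical sketch of the auto-compact trigger; values are illustrative.
const CONTEXT_WINDOW = 200_000; // model's context window, in tokens (example value)
const THRESHOLD = 0.8;          // summarize once ~80% of the window is used (assumed)

function shouldSummarize(usedTokens: number): boolean {
  // Fire the summarization tool call before the window actually overflows.
  return usedTokens >= CONTEXT_WINDOW * THRESHOLD;
}
```

For example, with a 200k-token window and an assumed 80% threshold, summarization would trigger at 160k tokens of usage.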
You'll see a summarization tool call when this happens, with its cost shown like any other API call.
Previously, Cline would truncate older messages when it hit context limits, losing important context. With summarization, the key details of the conversation are preserved in condensed form instead of being discarded.
Summarization leverages your existing prompt cache from the conversation, so it costs about the same as any other tool call.
Since most input tokens are already cached, you pay primarily for generating the summary (output tokens), which keeps the operation cost-effective.
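The math behind this can be sketched as follows. The rates below are made-up placeholders, not real provider pricing, and `estimateCostUsd` is a hypothetical helper — the point is only that cached input reads are billed at a steep discount, so output tokens dominate the summarization cost:

```typescript
// Illustrative cost model; per-million-token rates are placeholders, not real pricing.
interface Usage {
  cachedInputTokens: number; // input served from the prompt cache (discounted)
  freshInputTokens: number;  // uncached input (full price)
  outputTokens: number;      // generated summary tokens
}

function estimateCostUsd(u: Usage): number {
  const CACHED_INPUT_RATE = 0.3 / 1_000_000; // assumed cache-read rate
  const FRESH_INPUT_RATE = 3.0 / 1_000_000;  // assumed uncached input rate
  const OUTPUT_RATE = 15.0 / 1_000_000;      // assumed output rate
  return (
    u.cachedInputTokens * CACHED_INPUT_RATE +
    u.freshInputTokens * FRESH_INPUT_RATE +
    u.outputTokens * OUTPUT_RATE
  );
}
```

Under these assumed rates, summarizing a 150k-token conversation that is fully cached, with a 1k-token summary, costs about $0.06 — versus roughly $0.45 just to resend the same input uncached.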
Auto Compact uses advanced LLM-based summarization on supported models.
You can use checkpoints to restore your task state from before a summarization occurred. You never truly lose context since you can always roll back.
Editing a message before a summarization tool call works similarly, restoring the conversation to that point.