docs/en/memory/context.mdx
Conversation context is the Agent's short-term memory, containing all messages in the current session (user input, Agent replies, tool calls and results). Proper context management is critical for the Agent's reasoning quality and cost control.
Each conversation turn consists of:
User message → Agent thinking → Tool call → Tool result → ... → Agent final reply
A single turn may include multiple tool calls (controlled by `agent_max_steps`). All tool calls and results are retained in context until compressed or trimmed.
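As an illustration, a single turn's message sequence might be represented like the sketch below. The role names and field layout are assumptions for illustration, not this project's actual message schema:

```python
# Illustrative message list for one conversation turn.
# The "tool_call" field and role names are assumed, not the real schema.
turn = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": "I'll check the weather.",
     "tool_call": {"name": "get_weather", "args": {"city": "Paris"}}},
    {"role": "tool", "content": '{"temp_c": 18, "condition": "cloudy"}'},
    {"role": "assistant", "content": "It's 18°C and cloudy in Paris."},
]

# Assistant messages that invoke a tool count toward agent_max_steps;
# this turn used a single decision step.
steps = sum(1 for m in turn if m.get("tool_call"))
print(steps)  # 1
```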
| Parameter | Description | Default |
|---|---|---|
| `agent_max_context_tokens` | Maximum context token budget | 50000 |
| `agent_max_context_turns` | Maximum conversation turns kept in context | 20 |
| `agent_max_steps` | Maximum decision steps (tool calls) per turn | 15 |
Configurable via `config.json` or the `/config` chat command.
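In `config.json`, these limits might be set as follows (the key names come from the table above; the surrounding file structure is illustrative):

```json
{
  "agent_max_context_tokens": 80000,
  "agent_max_context_turns": 30,
  "agent_max_steps": 15
}
```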
When the context exceeds its limits, the system automatically compresses it to free space. Compression proceeds in stages:
1. **Tool result truncation** — Before each decision loop, the system checks tool call results in historical turns. Results exceeding 20,000 characters are truncated, keeping only the beginning and end plus a truncation notice. Results from the current turn are not affected.
2. **Turn trimming** — When the number of conversation turns exceeds `agent_max_context_turns`, the oldest turns are removed.
3. **Token compression** — If, after turn trimming, the token count still exceeds `agent_max_context_tokens`, older content is compressed further.
4. **Overflow fallback** — If the model API returns a context overflow error, the system performs an additional emergency compression pass.
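The first two stages can be sketched as follows, assuming a simple list-of-turns representation. The names (`MAX_RESULT_CHARS`, `KEEP_EACH_END`) and the exact head/tail split are illustrative assumptions, not this project's actual code:

```python
MAX_RESULT_CHARS = 20_000  # truncation threshold from the docs above
KEEP_EACH_END = 2_000      # illustrative size kept at each end

def truncate_result(text: str) -> str:
    """Keep only the beginning and end of an oversized tool result,
    joined by a truncation notice."""
    if len(text) <= MAX_RESULT_CHARS:
        return text
    notice = f"\n...[truncated {len(text) - 2 * KEEP_EACH_END} chars]...\n"
    return text[:KEEP_EACH_END] + notice + text[-KEEP_EACH_END:]

def trim_turns(turns: list, max_turns: int) -> list:
    """Drop the oldest turns until within the turn limit."""
    return turns[-max_turns:] if len(turns) > max_turns else turns

# Usage: 25 fake turns trimmed down to a limit of 20.
turns = [f"turn-{i}" for i in range(25)]
trimmed = trim_turns(turns, 20)
print(len(trimmed), trimmed[0])  # 20 turn-5
```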
Conversation messages are persisted to a local database and automatically restored after a service restart. On restore, only the most recent `max(3, max_context_turns / 6)` turns are loaded — about 3 turns with the default limit of 20.
Use these commands in chat to manage context:
| Command | Description |
|---|---|
| `/context` | View current context statistics (message count, role distribution, total characters) |
| `/context clear` | Clear the current session's context |
| `/config agent_max_context_tokens 80000` | Adjust the context token budget |
| `/config agent_max_context_turns 30` | Adjust the context turn limit |
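The restore-count formula described above, `max(3, max_context_turns / 6)`, can be sketched as a small function. Floor division is an assumption about how the fractional result is handled:

```python
def restore_turn_count(max_context_turns: int) -> int:
    """Number of recent turns restored after a service restart.

    Implements max(3, max_context_turns / 6); floor division is an
    assumption, not confirmed by the docs.
    """
    return max(3, max_context_turns // 6)

print(restore_turn_count(20))  # 3  (default config)
print(restore_turn_count(30))  # 5
```

So raising `agent_max_context_turns` via `/config` also raises how much history survives a restart, with a floor of 3 turns.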