backend/onyx/chat/COMPRESSION.md
Compresses long chat histories by summarizing older messages while keeping recent ones verbatim.
Summaries are stored as ChatMessage records with two key fields:
- parent_message_id → the last message when compression triggered (places the summary in the tree)
- last_summarized_message_id → pointer to an older message up the chain (the cutoff). Messages after this are kept verbatim.

Why store the summary as a separate message? If we embedded the summary in the last_summarized_message_id message itself, that message would contain context from messages that came after it, and that context doesn't exist in other branches. Creating the summary as a new message attached to the branch tip means it applies only to the specific branch where compression occurred: only that branch points back to it. This design is needed for two reasons: we keep the last few messages verbatim, and the chat tree supports branching.
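
The following is a minimal sketch of how the two pointers keep a summary scoped to one branch. The `Msg` class and `find_summary` helper are hypothetical stand-ins, not the actual ChatMessage model or find_summary_for_branch, and the "newest id wins" tie-break is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Hypothetical stand-in for a ChatMessage row (only the fields discussed)."""
    id: int
    parent_message_id: Optional[int]
    text: str
    # Set only on summary messages: the cutoff message the summary covers up to.
    last_summarized_message_id: Optional[int] = None


def find_summary(branch_ids: list[int], all_messages: list[Msg]) -> Optional[Msg]:
    """Return the summary whose parent lies on this branch, if any."""
    on_branch = set(branch_ids)
    candidates = [
        m for m in all_messages
        if m.last_summarized_message_id is not None
        and m.parent_message_id in on_branch
    ]
    # Assumption: a higher id means created later, so the newest summary wins.
    return max(candidates, key=lambda m: m.id, default=None)


# Tree: 1 -> 2 -> 3 -> 4 (branch A) and 2 -> 5 -> 6 (branch B).
messages = [
    Msg(1, None, "hi"), Msg(2, 1, "hello"), Msg(3, 2, "q1"), Msg(4, 3, "a1"),
    Msg(5, 2, "q2"), Msg(6, 5, "a2"),
    # Compression ran at tip 4 on branch A; messages after 2 stay verbatim.
    Msg(7, 4, "summary of 1..2", last_summarized_message_id=2),
]
summary_a = find_summary([1, 2, 3, 4], messages)  # the summary (id 7)
summary_b = find_summary([1, 2, 5, 6], messages)  # None: parent 4 is not on branch B
```

Because the summary hangs off message 4 rather than being written into message 2, branch B never sees it even though both branches share message 2.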
Subsequent compressions incorporate the existing summary text + new messages, preventing information loss in very long conversations.
The LLM receives older messages, a cutoff marker, then recent messages. It summarizes only content before the marker while using recent context to inform what's important.
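
As a rough illustration of that layout, the sketch below assembles the summarizer input: any existing summary first, then the older messages, the cutoff marker, and the recent messages. The marker text, the function name, and the plain-string message format are all assumptions made for this example, not the actual prompt.

```python
from typing import Optional

# Hypothetical marker text; the real prompt may word this differently.
CUTOFF_MARKER = "--- SUMMARIZE ONLY THE CONTENT ABOVE THIS LINE ---"


def build_summary_input(
    older: list[str],
    recent: list[str],
    existing_summary: Optional[str] = None,
) -> str:
    """Lay out the text handed to the summarizing LLM."""
    parts: list[str] = []
    if existing_summary:
        # Carry the previous summary forward so repeated compressions
        # do not drop what was already condensed.
        parts.append(f"Previous summary:\n{existing_summary}")
    parts.extend(older)          # content to be summarized
    parts.append(CUTOFF_MARKER)  # everything below is context only
    parts.extend(recent)         # kept verbatim elsewhere, shown here for context
    return "\n\n".join(parts)


prompt_text = build_summary_input(
    older=["user: plan a trip", "assistant: where to?"],
    recent=["user: make it Lisbon"],
    existing_summary="User is planning a vacation.",
)
```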
Context window breakdown:
- max_context_tokens: the LLM's total context window
- reserved_tokens: space for the system prompt, tools, files, etc.
- available space for chat history: max_context_tokens - reserved_tokens

Note: If reserved_tokens is large, chat compression may happen fairly frequently, which is costly, slow, and leads to a bad user experience. This is a possible area of future improvement.

Configurable ratios:
- COMPRESSION_TRIGGER_RATIO (default 0.75): compress when chat history exceeds this ratio of the available space
- RECENT_MESSAGES_RATIO (default 0.2): portion of chat history to keep verbatim when compressing

With the defaults, compression triggers when history_tokens > available * 0.75. The result is a summary ChatMessage with parent_message_id + last_summarized_message_id.

| Function | Purpose |
|---|---|
| get_compression_params | Check whether compression is needed based on token counts |
| find_summary_for_branch | Find the applicable summary by checking parent_message_id membership |
| get_messages_to_summarize | Split messages at the token budget boundary |
| compress_chat_history | Orchestrate the flow and save the summary message |
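
The trigger check and the split can be sketched as follows. The helper names and signatures are hypothetical, not those of get_compression_params or get_messages_to_summarize. The sketch also assumes the verbatim budget is RECENT_MESSAGES_RATIO of the current history tokens; the actual code may compute it from the available space instead.

```python
COMPRESSION_TRIGGER_RATIO = 0.75
RECENT_MESSAGES_RATIO = 0.2


def needs_compression(
    history_tokens: int, max_context_tokens: int, reserved_tokens: int
) -> bool:
    """True when history exceeds the trigger ratio of the available space."""
    available = max_context_tokens - reserved_tokens
    return history_tokens > available * COMPRESSION_TRIGGER_RATIO


def split_messages(token_counts: list[int]) -> tuple[list[int], list[int]]:
    """Return (indices to summarize, indices to keep verbatim), oldest first."""
    # Assumption: the verbatim budget is a fraction of the history itself.
    budget = sum(token_counts) * RECENT_MESSAGES_RATIO
    used = 0
    split = len(token_counts)
    # Walk from newest to oldest, keeping messages while they fit the budget.
    for i in range(len(token_counts) - 1, -1, -1):
        if used + token_counts[i] > budget:
            break
        used += token_counts[i]
        split = i
    return list(range(split)), list(range(split, len(token_counts)))


# 10,000-token window with 2,000 reserved leaves 8,000 available;
# the trigger threshold is therefore 6,000 history tokens.
should_compress = needs_compression(7000, 10000, 2000)
to_summarize, to_keep = split_messages([1000, 2000, 2500, 800, 700])
```

In this example only the newest message (700 tokens) fits the 1,400-token verbatim budget, so the four older messages go to the summarizer.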