docs/content/Guides/compression.md
DocsGPT implements a smart context compression system to manage long conversations effectively. This feature prevents conversations from hitting the LLM's context window limit while preserving critical information and continuity.
The compression system operates on a "summarize and truncate" principle: once a conversation's token usage crosses a configurable fraction of the model's context window, older messages are condensed into a summary, and that summary plus the most recent messages are sent to the LLM in place of the full history.
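The principle can be sketched as a single function. This is an illustrative sketch, not DocsGPT's actual implementation; the function name, the `count_tokens`/`summarize` callables, and the `keep_recent` parameter are all assumptions made for the example.

```python
# Hypothetical sketch of "summarize and truncate"; names are illustrative,
# not DocsGPT's real API.

def compress_history(messages, count_tokens, summarize, context_window,
                     threshold=0.8, keep_recent=4):
    """Summarize older messages once token usage crosses the threshold."""
    total = sum(count_tokens(m) for m in messages)
    if total <= context_window * threshold:
        return messages  # under budget: send history through unchanged
    # Split the history: everything but the tail gets summarized.
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # an LLM call in the real system
    # Reconstructed context: one summary message plus the recent tail.
    return [{"role": "system", "content": summary}] + recent
```

The key property is that the call is a no-op below the threshold, so it can sit unconditionally in the request path.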
You can configure compression behavior in your `.env` file or in `application/core/settings.py`:
| Setting | Default | Description |
|---|---|---|
| `ENABLE_CONVERSATION_COMPRESSION` | `True` | Master switch to enable or disable the feature. |
| `COMPRESSION_THRESHOLD_PERCENTAGE` | `0.8` | The fraction of the context window (0.0 to 1.0) at which compression is triggered. |
| `COMPRESSION_MODEL_OVERRIDE` | `None` | (Optional) A different model ID to use specifically for the summarization task (e.g., `gpt-3.5-turbo` to compress for `gpt-4`). |
| `COMPRESSION_MAX_HISTORY_POINTS` | `3` | The number of past compression points to keep in the database; older points are discarded once they are incorporated into newer summaries. |
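Putting the settings from the table into a `.env` file might look like the following. The values shown are the documented defaults plus the example override model; the exact accepted value syntax (e.g., `True` vs. `true`) is an assumption, so check `application/core/settings.py` for how values are parsed.

```bash
# Conversation compression settings (defaults shown; override model is
# the example from the table above, not a required value).
ENABLE_CONVERSATION_COMPRESSION=True
COMPRESSION_THRESHOLD_PERCENTAGE=0.8
COMPRESSION_MODEL_OVERRIDE=gpt-3.5-turbo
COMPRESSION_MAX_HISTORY_POINTS=3
```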
The system is modularized into several components:

- **`CompressionThresholdChecker`**: Calculates token usage and decides when to compress.
- **`CompressionService`**: Orchestrates the compression process, manages DB updates, and reconstructs the context (summary + recent messages) for the LLM.
- **`CompressionPromptBuilder`**: Constructs the specific prompts used to instruct the LLM to summarize the conversation effectively.
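The division of responsibilities among the three components could be sketched as below. The class names come from the list above, but every method signature, the prompt wording, and the `llm` callable are assumptions for illustration; DocsGPT's real interfaces may differ (and the real `CompressionService` also manages DB updates, which this sketch omits).

```python
# Hypothetical component shapes; only the class names come from the docs.

class CompressionThresholdChecker:
    """Decides when token usage has crossed the compression threshold."""

    def __init__(self, context_window: int, threshold: float = 0.8):
        self.limit = int(context_window * threshold)

    def should_compress(self, token_count: int) -> bool:
        return token_count > self.limit


class CompressionPromptBuilder:
    """Builds the prompt that asks the LLM to summarize older messages."""

    def build(self, messages: list) -> str:
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        return ("Summarize the conversation below, preserving key facts "
                "and decisions:\n" + transcript)


class CompressionService:
    """Orchestrates compression and reconstructs the LLM context."""

    def __init__(self, checker, builder, llm, keep_recent=4):
        self.checker, self.builder, self.llm = checker, builder, llm
        self.keep_recent = keep_recent

    def maybe_compress(self, messages, token_count):
        if not self.checker.should_compress(token_count):
            return messages  # under the threshold: pass history through
        prompt = self.builder.build(messages[:-self.keep_recent])
        summary = self.llm(prompt)  # may use COMPRESSION_MODEL_OVERRIDE
        # Reconstructed context: summary + the most recent messages.
        return ([{"role": "system", "content": summary}]
                + messages[-self.keep_recent:])
```

Keeping the threshold check and prompt construction behind their own classes means either can be swapped (e.g., a different tokenizer or summary prompt) without touching the orchestration logic.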