backend/docs/summarization.md
DeerFlow includes automatic conversation summarization to handle long conversations that approach model token limits. When enabled, the system automatically condenses older messages while preserving recent context.
The summarization feature uses LangChain's `SummarizationMiddleware` to monitor conversation history and trigger summarization based on configurable thresholds. When activated, it condenses older messages into a concise summary while keeping the most recent messages intact.
Summarization is configured in config.yaml under the summarization key:
```yaml
summarization:
  enabled: true
  model_name: null  # Use default model or specify a lightweight model

  # Trigger conditions (OR logic - any condition triggers summarization)
  trigger:
    - type: tokens
      value: 4000
    # Additional triggers (optional)
    # - type: messages
    #   value: 50
    # - type: fraction
    #   value: 0.8  # 80% of model's max input tokens

  # Context retention policy
  keep:
    type: messages
    value: 20

  # Token trimming for summarization call
  trim_tokens_to_summarize: 4000

  # Custom summary prompt (optional)
  summary_prompt: null
```
| Option | Default | Description |
| --- | --- | --- |
| `enabled` | `false` | Enables conversation summarization |
| `model_name` | `null` (uses default model) | Model used for summaries; a lightweight model such as `gpt-4o-mini` or equivalent is recommended |
| `trigger` | — | A `ContextSize` or list of `ContextSize` objects |

ContextSize Types:
Token-based trigger: Activates when token count reaches the specified value
```yaml
trigger:
  type: tokens
  value: 4000
```
Message-based trigger: Activates when message count reaches the specified value
```yaml
trigger:
  type: messages
  value: 50
```
Fraction-based trigger: Activates when token usage reaches a percentage of the model's maximum input tokens
```yaml
trigger:
  type: fraction
  value: 0.8  # 80% of max input tokens
```
Multiple Triggers:
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
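The OR semantics over multiple triggers can be sketched as a simple check: summarization fires as soon as any configured condition is met. The helper below is illustrative, not the middleware's real API:

```python
def should_summarize(token_count, message_count, max_input_tokens, triggers):
    """Return True if ANY configured trigger fires (OR logic)."""
    for t in triggers:
        if t["type"] == "tokens" and token_count >= t["value"]:
            return True
        if t["type"] == "messages" and message_count >= t["value"]:
            return True
        if t["type"] == "fraction" and token_count >= t["value"] * max_input_tokens:
            return True
    return False

triggers = [{"type": "tokens", "value": 4000}, {"type": "messages", "value": 50}]
should_summarize(3000, 10, 128_000, triggers)  # False: neither threshold reached
should_summarize(3000, 50, 128_000, triggers)  # True: message trigger fires
```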
`keep` is a `ContextSize` object (default: `{type: messages, value: 20}`) that controls how much recent context is retained after summarization.

Examples:
```yaml
# Keep most recent 20 messages
keep:
  type: messages
  value: 20
```

```yaml
# Keep most recent 3000 tokens
keep:
  type: tokens
  value: 3000
```

```yaml
# Keep most recent 30% of model's max input tokens
keep:
  type: fraction
  value: 0.3
```
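All three retention policies reduce to "find the suffix of the history to retain". A minimal sketch (the function name and argument shapes are illustrative assumptions):

```python
def messages_to_keep(messages, token_counts, keep, max_input_tokens):
    """Return the most recent messages satisfying the keep policy.

    messages:     list of messages, oldest first
    token_counts: per-message token counts, same order
    keep:         {"type": ..., "value": ...}
    """
    if keep["type"] == "messages":
        return messages[-int(keep["value"]):]
    # tokens / fraction: walk backwards until the token budget is spent
    budget = keep["value"] if keep["type"] == "tokens" else keep["value"] * max_input_tokens
    kept, total = [], 0
    for msg, n in zip(reversed(messages), reversed(token_counts)):
        if total + n > budget:
            break
        kept.append(msg)
        total += n
    return list(reversed(kept))

msgs = ["m1", "m2", "m3", "m4"]
messages_to_keep(msgs, [100, 100, 100, 100], {"type": "messages", "value": 2}, 128_000)
# keeps the last two messages
```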
`trim_tokens_to_summarize` (default: `4000`) caps the number of tokens sent to the summarization model. Set it to `null` to skip trimming (not recommended for very long conversations).

`summary_prompt` (default: `null`, which uses LangChain's default prompt) lets you supply a custom summarization prompt.

Default Prompt Behavior: The default LangChain prompt instructs the model to:

- Summarize the conversation history that falls before the retention boundary (the `keep` threshold)
- Leave intact the recent messages within the retention window (the `keep` threshold)

Token counts used for triggers and trimming are computed by the middleware's `token_counter` function.

The middleware intelligently preserves message context:
- Recent messages are retained according to the `keep` configuration
- The summarized portion of the history is replaced with a single message of the form:

```
Here is a summary of the conversation to date:

[Generated summary text]
```
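The reassembly step can be sketched as follows (an illustrative helper; the actual message construction is internal to LangChain's `SummarizationMiddleware`):

```python
def rebuild_history(summary_text, kept_messages):
    """Replace the summarized prefix with a single summary message."""
    summary_message = (
        "Here is a summary of the conversation to date:\n\n" + summary_text
    )
    return [summary_message, *kept_messages]

history = rebuild_history(
    "User asked about X; agent proposed Y.",
    ["recent question", "recent answer"],
)
len(history)  # 3: one summary message plus the retained messages
```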
- Token-based triggers: Recommended for most use cases
- Message-based triggers: Useful for controlling conversation length
- Fraction-based triggers: Ideal when using multiple models
Retention policy (`keep`):

- Message-based retention: Best for most scenarios
- Token-based retention: Use when precise control is needed
- Fraction-based retention: For multi-model setups
Model selection:

- Recommended: Use a lightweight, cost-effective model for summaries, such as `gpt-4o-mini`, `claude-haiku`, or equivalent
- Default: If `model_name` is `null`, the default model is used
Balance triggers: Combine token and message triggers for robust handling
```yaml
trigger:
  - type: tokens
    value: 4000
  - type: messages
    value: 50
```
Conservative retention: Keep more messages initially, adjust based on performance
```yaml
keep:
  type: messages
  value: 25  # Start higher, reduce if needed
```
Trim strategically: Limit tokens sent to summarization model
```yaml
trim_tokens_to_summarize: 4000  # Prevents expensive summarization calls
```
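Trimming amounts to capping the summarizer's input at the most recent `trim_tokens_to_summarize` tokens. A minimal sketch (the helper name is an assumption, not the middleware's internal function):

```python
def trim_for_summary(messages, token_counts, trim_tokens):
    """Drop the oldest messages until the total fits under trim_tokens.

    With trim_tokens=None, the full history is sent to the summarizer.
    """
    if trim_tokens is None:
        return messages
    total = sum(token_counts)
    start = 0
    while start < len(messages) and total > trim_tokens:
        total -= token_counts[start]
        start += 1
    return messages[start:]

trim_for_summary(["a", "b", "c"], [3000, 2000, 1000], 4000)  # drops "a"
```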
Monitor and iterate: Track summary quality and adjust configuration
Problem: Summaries losing important context
Solutions:
- Increase the `keep` value to preserve more messages
- Customize `summary_prompt` to emphasize key information

Problem: Summarization calls taking too long
Solutions:
- Use a lightweight summarization model (e.g. `gpt-4o-mini`)
- Reduce `trim_tokens_to_summarize` to send less context

Problem: Still hitting token limits despite summarization
Solutions:
- Reduce the `keep` value to preserve fewer messages

Implementation details:

- Configuration: `packages/harness/deerflow/config/summarization_config.py`
- Agent integration: `packages/harness/deerflow/agents/lead_agent/agent.py`
- Middleware: `langchain.agents.middleware.SummarizationMiddleware`

Middleware ordering: Summarization runs after ThreadData and Sandbox initialization but before Title and Clarification.

Example configurations:
```yaml
summarization:
  enabled: true
  trigger:
    type: tokens
    value: 4000
  keep:
    type: messages
    value: 20
```
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini  # Lightweight model for cost efficiency
  trigger:
    - type: tokens
      value: 6000
    - type: messages
      value: 75
  keep:
    type: messages
    value: 25
  trim_tokens_to_summarize: 5000
```
```yaml
summarization:
  enabled: true
  model_name: gpt-4o-mini
  trigger:
    type: fraction
    value: 0.7  # 70% of model's max input
  keep:
    type: fraction
    value: 0.3  # Keep 30% of max input
  trim_tokens_to_summarize: 4000
```
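Fraction-based values only become concrete numbers once the model's context window is known. A quick check, assuming a hypothetical 128,000-token max input (substitute your model's actual limit):

```python
# Hypothetical context window for illustration only.
max_input_tokens = 128_000

# fraction triggers/keeps are relative to max_input_tokens
trigger_at = round(0.7 * max_input_tokens)   # summarization fires at this many tokens
keep_budget = round(0.3 * max_input_tokens)  # most recent tokens retained afterwards

print(trigger_at, keep_budget)  # 89600 38400
```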
```yaml
summarization:
  enabled: true
  model_name: gpt-4  # Use full model for high-quality summaries
  trigger:
    type: tokens
    value: 8000
  keep:
    type: messages
    value: 40  # Keep more context
  trim_tokens_to_summarize: null  # No trimming
```