Back to Qwen Code

Session Recap Design

docs/design/session-recap/session-recap-design.md

0.18.017.4 KB
Original Source

Session Recap Design

A brief (1-2 sentence) "where did I leave off" summary surfaced when the user returns to an idle session, either on demand (/recap) or after the terminal has been blurred for 5+ minutes.

Overview

When a user /resumes an old session days later, scrolling back through pages of history to remember what they were doing and what came next is a real friction point. Just reloading messages does not solve this UX problem.

The goal is to proactively surface a brief 1-2 sentence recap when the user returns:

  • High-level task (what they are doing) → next step (what to do next).
  • Visually distinct from real assistant replies, so it is never mistaken for new model output.
  • Best-effort: failures must be silent and never break the main flow.

Triggers

TriggerConditionsImplementation
ManualUser runs /recaprecapCommand.ts calls the same underlying service
AutoTerminal blurred (DECSET 1004 focus protocol) for ≥ 5 min + focus returns + stream is IdleuseAwaySummary.ts — 5min blur timer + useFocus event listener
Daemon HTTPRemote client calls POST /session/:id/recapserver.ts route → bridge.generateSessionRecap (ext-method roundtrip) → acpAgent.ts calls generateSessionRecap(session.getConfig(), signal)

All three paths funnel into the same generateSessionRecap() function in core/services/sessionRecap.ts to guarantee identical behavior. The auto-trigger is gated by general.showSessionRecap (default: off — explicit opt-in, so ambient LLM calls are never silently added to a user's bill); the manual command and daemon HTTP route ignore that setting (the caller is making an explicit request).

Daemon access path

The daemon route is non-strict-gated (mirrors /session/:id/prompt's posture — recap costs tokens but mutates no state). Capability tag session_recap advertises the route on /capabilities.features. SDK helpers: DaemonClient.recapSession(sessionId, opts) and DaemonSessionClient.recap(opts). See docs/developers/qwen-serve-protocol.md § POST /session/:id/recap for the wire contract and error envelope.

Cancellation is absent in v1. The route does not listen for HTTP client disconnect, no AbortSignal is threaded into bridge.generateSessionRecap, and the ACP child handler passes a never-aborting AbortController().signal to the core helper (no cross-process abort plumbing yet). The only ceilings are the bridge's 60s SESSION_RECAP_TIMEOUT_MS backstop and the transport-closed race against ACP channel death. Wiring an HTTP-side AbortController in isolation would be cosmetic — the child-side LLM call would still run to completion, so e2e cancel is not achievable without the cross- process abort piece. This is acceptable for v1 because recap is short (single-attempt side-query, maxOutputTokens: 300, ~1–5s typical). A future request-id-based cancel ext-method can plumb full end-to-end cancellation if/when the bandwidth cost justifies it.

Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                          AppContainer.tsx                              │
│   isFocused = useFocus()                                               │
│   isIdle = streamingState === Idle                                     │
│       │                                                                │
│       ├─→ useAwaySummary({enabled, config, isFocused, isIdle,          │
│       │       │             addItem})                                  │
│       │       └─→ 5 min blur timer + idle/dedupe gates                 │
│       │              │                                                 │
│       │              ↓                                                 │
│       └─→ recapCommand (slash) ─→ generateSessionRecap(config, signal) │
│                                          │                             │
│                                          ↓                             │
│                              ┌─────────────────────────┐               │
│                              │ packages/core/services/ │               │
│                              │   sessionRecap.ts       │               │
│                              └─────────────────────────┘               │
│                                          │                             │
│                                          ↓                             │
│                              GeminiClient.generateContent              │
│                              (fastModel + tools:[])                    │
│                                                                        │
│   addItem({type: 'away_recap', text}) ─→ HistoryItemDisplay            │
│       └─ AwayRecapMessage rendered inline like any other history       │
│         item (※ + bold "recap: " + italic content, all dim);           │
│         scrolls naturally with the conversation. Mirrors Claude        │
│         Code's away_summary system message.                            │
└────────────────────────────────────────────────────────────────────────┘

Files

FileResponsibility
packages/core/src/services/sessionRecap.tsOne-shot LLM call + history filter + tag extraction
packages/cli/src/ui/hooks/useAwaySummary.tsAuto-trigger React hook
packages/cli/src/ui/commands/recapCommand.ts/recap manual entry point
packages/cli/src/ui/components/messages/StatusMessages.tsxAwayRecapMessage renderer ( + bold recap: + italic content, all dim)
packages/cli/src/ui/types.tsHistoryItemAwayRecap type
packages/cli/src/ui/components/HistoryItemDisplay.tsxDispatches away_recap history items to the renderer
packages/cli/src/config/settingsSchema.tsgeneral.showSessionRecap + general.sessionRecapAwayThresholdMinutes settings

Prompt Design

System Prompt

generationConfig.systemInstruction replaces the main agent's system prompt for this single call, so the model behaves only as a recap generator and not as a coding assistant.

Note that GeminiClient.generateContent() internally runs the prompt through getCustomSystemPrompt(), which appends the user's memory (QWEN.md / managed auto-memory) as a suffix. The final system prompt is therefore recap prompt + user memory — useful project context for the recap, not a leak.

Bullets below correspond 1:1 with RECAP_SYSTEM_PROMPT:

  • Under 40 words, 1-2 plain sentences (no markdown / lists / headings). For Chinese, treat the budget as roughly 80 characters total.
  • First sentence: the high-level task. Then: the concrete next step.
  • Explicitly forbid: listing what was done, reciting tool calls, status reports.
  • Match the dominant language of the conversation (English or Chinese).
  • Wrap output in <recap>...</recap>; nothing outside the tags.

Structured Output + Extraction

The model is instructed to wrap its answer in <recap>...</recap>:

<recap>Refactoring loopDetectionService.ts to address long-session OOM. Next step is to implement option B.</recap>

Why: some models (GLM family, reasoning models) write a "thinking" paragraph before the final answer. Returning the raw text would leak that reasoning into the UI.

extractRecap() has three fallback tiers:

  1. Both tags present: take what is between <recap>...</recap> (preferred).
  2. Only the open tag (e.g. maxOutputTokens truncated the close tag): take everything after the open tag.
  3. Tag missing entirely: return empty string → service returns null → UI renders nothing.

The third tier is "skip rather than show the wrong thing" — surfacing the model's reasoning preamble is worse than showing no recap at all.

Call Parameters

ParameterValueReason
modelgetFastModel() ?? getModel()Recap doesn't need a frontier model
tools[]One-shot query, no tool use
maxOutputTokens300Headroom for 1-2 short sentences + tags
temperature0.3Mostly deterministic, with a bit of natural variation
systemInstructionThe recap-only prompt aboveReplaces the main agent's role definition

History Filtering

geminiClient.getChat().getHistory() returns a Content[] that includes:

  • user / model text messages
  • model functionCall parts
  • user functionResponse parts (which can hold full file contents)
  • model thought parts (part.thought / part.thoughtSignature, the model's hidden reasoning)

filterToDialog() keeps only user / model parts that have non-empty text and are not thoughts. Two reasons:

  • Tool calls / responses: a single functionResponse can be 10K+ tokens. 30 such messages would drown the recap LLM in irrelevant detail, both wasting tokens and biasing the recap toward implementation noise like "called X tool to read Y file".
  • Thought parts: carry the model's internal reasoning. Including them risks treating hidden chain-of-thought as dialogue and surfacing it in the recap text.

After dropping empty messages, takeRecentDialog slices to the last 30 messages and refuses to start the slice on a dangling model/tool response.

Concurrency and Edge Cases

Auto-trigger hook state machine

useAwaySummary keeps three refs:

RefMeaning
blurredAtRefBlur start time (not cleared until focus returns)
recapPendingRefWhether an LLM call is in flight
inFlightRefThe current in-flight AbortController

useEffect deps: [enabled, config, isFocused, isIdle, addItem, thresholdMs].

EventAction
!enabled || !configAbort in-flight call + clear inFlightRef + clear blurredAtRef
!isFocused and blurredAtRef === nullSet blurredAtRef = Date.now()
isFocused and blurredAtRef === nullReturn early (no blur cycle to handle — first render or right after a brief-blur reset)
isFocused and blur duration < 5 minClear blurredAtRef, wait for next blur cycle
isFocused and blur ≥ 5 min and recapPendingRefReturn (dedupe)
isFocused and blur ≥ 5 min and !isIdlePreserve blurredAtRef and wait for the turn to finish (isIdle is in the deps, so the effect re-fires when streaming completes)
isFocused and blur ≥ 5 min and shouldFireRecap returns falseClear blurredAtRef and return — conversation hasn't moved enough since the last recap (≥ 2 user turns required, mirrors Claude Code)
isFocused and all conditions metClear blurredAtRef, set recapPendingRef = true, create AbortController, send the LLM request

The .then callback re-checks isIdleRef.current: if the user has started a new turn while the LLM was running, the late-arriving recap is dropped to avoid inserting it mid-turn.

The .finally clears recapPendingRef, and clears inFlightRef only if inFlightRef.current === controller (so it doesn't overwrite a newer controller).

A second useEffect aborts the in-flight controller on unmount.

/recap gating

CommandContext.ui.isIdleRef exposes the current stream state (mirroring the existing btwAbortControllerRef pattern). In interactive mode, recapCommand refuses when !isIdleRef.current or pendingItem !== null. pendingItem alone is insufficient because a normal model reply runs with streamingState === Responding and a null pendingItem.

Configuration and Model Selection

User-facing knobs

SettingDefaultNotes
general.showSessionRecapfalseAuto-trigger only. Manual /recap ignores this.
general.sessionRecapAwayThresholdMinutes5Minutes blurred before auto-recap fires on focus-in. Matches Claude Code's default.
fastModelunsetRecommended (e.g. qwen3-coder-flash) for fast and cheap recaps.

Model fallback

config.getFastModel() ?? config.getModel():

  • User has a fastModel set and it is valid for the current auth type → use fastModel.
  • Otherwise → fall back to the main session model (works, just costlier and slower).

Observability

createDebugLogger('SESSION_RECAP') emits:

  • caught exceptions from the recap path (debugLogger.warn).

All failures are fully transparent to the user — recap is an auxiliary feature and never throws into the UI. Developers can grep for the [SESSION_RECAP] tag in the debug log file: written by default to ~/.qwen/debug/<sessionId>.txt (latest.txt symlinks to the current session); disable via QWEN_DEBUG_LOG_FILE=0.

Out of Scope

ItemWhy not
Progress UI for /recap (spinner / pendingItem)3-5 second wait is tolerable; adds complexity.
Automated testsService is small (~150 lines), end-to-end tested manually first; unit tests can land in a separate PR.
Localized promptsThe system prompt is for the model; English is the most reliable substrate. The model selects the output language from the conversation.
QWEN_CODE_ENABLE_AWAY_SUMMARY env varClaude Code uses it to keep the feature on when telemetry is disabled; Qwen Code's current telemetry model doesn't need this.
Auto-recap on /resume completionA natural follow-up but needs a hook point in useResumeCommand; out of scope for this PR.