docs/design/session-recap/session-recap-design.md
A brief (1-2 sentence) "where did I leave off" summary surfaced when the user returns to an idle session, either on demand (/recap) or after the terminal has been blurred for 5+ minutes.
When a user /resumes an old session days later, scrolling back through
pages of history to remember what they were doing and what came next
is a real friction point. Just reloading messages does not solve this
UX problem.
The goal is to proactively surface a brief 1-2 sentence recap when the user returns:
| Trigger | Conditions | Implementation |
|---|---|---|
| Manual | User runs /recap | recapCommand.ts calls the same underlying service |
| Auto | Terminal blurred (DECSET 1004 focus protocol) for ≥ 5 min + focus returns + stream is Idle | useAwaySummary.ts — 5min blur timer + useFocus event listener |
Both paths funnel into a single function — generateSessionRecap() — to
guarantee identical behavior. The auto-trigger is gated by
general.showSessionRecap (default: off — explicit opt-in, so ambient
LLM calls are never silently added to a user's bill); the manual
command ignores that setting.
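For orientation, a minimal sketch of that funnel. Only generateSessionRecap is the real entry point named in this doc; the wrapper and the settings getter below are hypothetical:

```typescript
// Sketch of the shared funnel; getShowSessionRecap is a hypothetical
// accessor for the general.showSessionRecap setting.
interface RecapConfig {
  getShowSessionRecap(): boolean;
}

declare function generateSessionRecap(
  config: RecapConfig,
  signal: AbortSignal,
): Promise<string | null>;

async function triggerRecap(
  config: RecapConfig,
  signal: AbortSignal,
  source: 'manual' | 'auto',
): Promise<string | null> {
  // The auto path respects the opt-in; manual /recap ignores it.
  if (source === 'auto' && !config.getShowSessionRecap()) return null;
  return generateSessionRecap(config, signal);
}
```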
┌────────────────────────────────────────────────────────────────────────┐
│ AppContainer.tsx │
│ isFocused = useFocus() │
│ isIdle = streamingState === Idle │
│ │ │
│ ├─→ useAwaySummary({enabled, config, isFocused, isIdle, │
│ │ │ addItem}) │
│ │ └─→ 5 min blur timer + idle/dedupe gates │
│ │ │ │
│ │ ↓ │
│ └─→ recapCommand (slash) ─→ generateSessionRecap(config, signal) │
│ │ │
│ ↓ │
│ ┌─────────────────────────┐ │
│ │ packages/core/services/ │ │
│ │ sessionRecap.ts │ │
│ └─────────────────────────┘ │
│ │ │
│ ↓ │
│ GeminiClient.generateContent │
│ (fastModel + tools:[]) │
│ │
│ addItem({type: 'away_recap', text}) ─→ HistoryItemDisplay │
│ └─ AwayRecapMessage rendered inline like any other history │
│ item (※ + bold "recap: " + italic content, all dim); │
│ scrolls naturally with the conversation. Mirrors Claude │
│ Code's away_summary system message. │
└────────────────────────────────────────────────────────────────────────┘
| File | Responsibility |
|---|---|
| packages/core/src/services/sessionRecap.ts | One-shot LLM call + history filter + tag extraction |
| packages/cli/src/ui/hooks/useAwaySummary.ts | Auto-trigger React hook |
| packages/cli/src/ui/commands/recapCommand.ts | /recap manual entry point |
| packages/cli/src/ui/components/messages/StatusMessages.tsx | AwayRecapMessage renderer (※ + bold "recap: " + italic content, all dim) |
| packages/cli/src/ui/types.ts | HistoryItemAwayRecap type |
| packages/cli/src/ui/components/HistoryItemDisplay.tsx | Dispatches away_recap history items to the renderer |
| packages/cli/src/config/settingsSchema.ts | general.showSessionRecap + general.sessionRecapAwayThresholdMinutes settings |
generationConfig.systemInstruction replaces the main agent's system
prompt for this single call, so the model behaves only as a recap
generator and not as a coding assistant.
Note that GeminiClient.generateContent() internally runs the prompt
through getCustomSystemPrompt(), which appends the user's memory
(QWEN.md / managed auto-memory) as a suffix. The final system prompt is
therefore recap prompt + user memory — useful project context for the
recap, not a leak.
The rules below correspond 1:1 with RECAP_SYSTEM_PROMPT: the model is instructed to wrap its answer in <recap>...</recap> and to emit nothing outside the tags. For example:

<recap>Refactoring loopDetectionService.ts to address long-session OOM. Next step is to implement option B.</recap>

Why: some models (GLM family, reasoning models) write a "thinking" paragraph before the final answer. Returning the raw text would leak that reasoning into the UI.

extractRecap() has three fallback tiers:

1. A complete <recap>...</recap> pair (preferred).
2. Open tag only (e.g. maxOutputTokens truncated the close tag): take everything after the open tag.
3. No tag at all → return null → the UI renders nothing.

The third tier is "skip rather than show the wrong thing": surfacing the model's reasoning preamble is worse than showing no recap at all.
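A minimal sketch of the extraction, assuming extractRecap() receives the raw response text (the regex is illustrative; the real implementation may differ):

```typescript
// Tiered extraction: prefer a complete <recap>...</recap> pair, tolerate a
// truncated close tag, and return null when no open tag exists at all.
export function extractRecap(raw: string): string | null {
  // Tier 1: complete <recap>...</recap> pair.
  const full = raw.match(/<recap>([\s\S]*?)<\/recap>/i);
  if (full) return full[1].trim();

  // Tier 2: open tag only (e.g. maxOutputTokens cut off the close tag):
  // take everything after the open tag.
  const openIdx = raw.toLowerCase().indexOf('<recap>');
  if (openIdx !== -1) return raw.slice(openIdx + '<recap>'.length).trim();

  // Tier 3: no tag at all. Return null so the UI renders nothing rather
  // than risk surfacing the model's reasoning preamble.
  return null;
}
```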
| Parameter | Value | Reason |
|---|---|---|
| model | getFastModel() ?? getModel() | Recap doesn't need a frontier model |
| tools | [] | One-shot query, no tool use |
| maxOutputTokens | 300 | Headroom for 1-2 short sentences + tags |
| temperature | 0.3 | Mostly deterministic, with a bit of natural variation |
| systemInstruction | The recap-only prompt above | Replaces the main agent's role definition |
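Putting the table together, a hedged sketch of the one-shot call. The exact GeminiClient.generateContent signature is an assumption (simplified here to return a raw string); the parameter values mirror the table:

```typescript
// Sketch only: the real generateContent returns a response object, not a
// plain string, and its request shape may differ.
declare const RECAP_SYSTEM_PROMPT: string;
declare function extractRecap(raw: string): string | null;

async function callRecapModel(
  config: { getFastModel(): string | undefined; getModel(): string },
  client: { generateContent(req: object, signal: AbortSignal): Promise<string> },
  recentDialog: object[], // filtered dialog, see history filtering below
  signal: AbortSignal,
): Promise<string | null> {
  const raw = await client.generateContent(
    {
      model: config.getFastModel() ?? config.getModel(), // fast model when available
      contents: recentDialog,
      systemInstruction: RECAP_SYSTEM_PROMPT, // replaces the agent's role prompt
      tools: [],            // no tool use for a one-shot query
      maxOutputTokens: 300, // 1-2 short sentences + tags
      temperature: 0.3,     // mostly deterministic
    },
    signal,
  );
  return extractRecap(raw);
}
```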
geminiClient.getChat().getHistory() returns a Content[] that includes:

- user / model text messages
- model functionCall parts
- user functionResponse parts (which can hold full file contents)
- model thought parts (part.thought / part.thoughtSignature, the model's hidden reasoning)

filterToDialog() keeps only user / model parts that have non-empty text and are not thoughts, for two reasons: a single functionResponse can be 10K+ tokens, and 30 such messages would drown the recap LLM in irrelevant detail, both wasting tokens and biasing the recap toward implementation noise like "called X tool to read Y file".

After dropping empty messages, takeRecentDialog slices to the last 30 messages and refuses to start the slice on a dangling model/tool response.
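A sketch of both helpers under these assumptions (the Content/Part shapes are simplified from the GenAI SDK types; the helper names follow the doc):

```typescript
// Simplified shapes; the real types come from the GenAI SDK.
interface Part {
  text?: string;
  thought?: boolean;
  functionCall?: Record<string, unknown>;
  functionResponse?: Record<string, unknown>;
}
interface Content {
  role: string; // 'user' | 'model' in practice
  parts?: Part[];
}

// Keep only plain user/model text; drop tool traffic and hidden reasoning.
function filterToDialog(history: Content[]): Content[] {
  return history
    .filter((c) => c.role === 'user' || c.role === 'model')
    .map((c) => ({
      role: c.role,
      parts: (c.parts ?? []).filter(
        (p) => typeof p.text === 'string' && p.text.trim() !== '' && !p.thought,
      ),
    }))
    .filter((c) => c.parts.length > 0); // drop messages left empty
}

// Last N messages, never starting on a dangling model/tool response.
function takeRecentDialog(dialog: Content[], limit = 30): Content[] {
  const slice = dialog.slice(-limit);
  const firstUser = slice.findIndex((c) => c.role === 'user');
  return firstUser === -1 ? [] : slice.slice(firstUser);
}
```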
useAwaySummary keeps three refs:
| Ref | Meaning |
|---|---|
blurredAtRef | Blur start time (not cleared until focus returns) |
recapPendingRef | Whether an LLM call is in flight |
inFlightRef | The current in-flight AbortController |
useEffect deps: [enabled, config, isFocused, isIdle, addItem, thresholdMs].
| Event | Action |
|---|---|
!enabled || !config | Abort in-flight call + clear inFlightRef + clear blurredAtRef |
!isFocused and blurredAtRef === null | Set blurredAtRef = Date.now() |
isFocused and blurredAtRef === null | Return early (no blur cycle to handle — first render or right after a brief-blur reset) |
isFocused and blur duration < 5 min | Clear blurredAtRef, wait for next blur cycle |
isFocused and blur ≥ 5 min and recapPendingRef | Return (dedupe) |
isFocused and blur ≥ 5 min and !isIdle | Preserve blurredAtRef and wait for the turn to finish (isIdle is in the deps, so the effect re-fires when streaming completes) |
isFocused and blur ≥ 5 min and shouldFireRecap returns false | Clear blurredAtRef and return — conversation hasn't moved enough since the last recap (≥ 2 user turns required, mirrors Claude Code) |
isFocused and all conditions met | Clear blurredAtRef, set recapPendingRef = true, create AbortController, send the LLM request |
The .then callback re-checks isIdleRef.current: if the user has
started a new turn while the LLM was running, the late-arriving recap
is dropped to avoid inserting it mid-turn.
The .finally clears recapPendingRef, and clears inFlightRef only
if inFlightRef.current === controller (so it doesn't overwrite a
newer controller).
A second useEffect aborts the in-flight controller on unmount.
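The table and notes above condense to roughly this effect body. This is a sketch: shouldFireRecap and the prop shape are simplified, and logging is omitted:

```typescript
import { useEffect, useRef } from 'react';

declare function shouldFireRecap(config: unknown): boolean;
declare function generateSessionRecap(
  config: unknown,
  signal: AbortSignal,
): Promise<string | null>;

interface AwaySummaryProps {
  enabled: boolean;
  config: unknown;
  isFocused: boolean;
  isIdle: boolean;
  addItem: (item: { type: 'away_recap'; text: string }) => void;
  thresholdMs: number;
}

function useAwaySummarySketch(props: AwaySummaryProps) {
  const { enabled, config, isFocused, isIdle, addItem, thresholdMs } = props;
  const blurredAtRef = useRef<number | null>(null);
  const recapPendingRef = useRef(false);
  const inFlightRef = useRef<AbortController | null>(null);
  const isIdleRef = useRef(isIdle);
  isIdleRef.current = isIdle;

  useEffect(() => {
    if (!enabled || config == null) {
      inFlightRef.current?.abort(); // tear down when disabled
      inFlightRef.current = null;
      blurredAtRef.current = null;
      return;
    }
    if (!isFocused) {
      if (blurredAtRef.current === null) blurredAtRef.current = Date.now();
      return;
    }
    if (blurredAtRef.current === null) return; // no blur cycle to handle
    if (Date.now() - blurredAtRef.current < thresholdMs) {
      blurredAtRef.current = null; // brief blur: wait for the next cycle
      return;
    }
    if (recapPendingRef.current) return; // dedupe: a call is in flight
    if (!isIdle) return; // preserve blurredAtRef; effect re-fires on isIdle
    if (!shouldFireRecap(config)) {
      blurredAtRef.current = null; // conversation hasn't moved enough
      return;
    }
    blurredAtRef.current = null;
    recapPendingRef.current = true;
    const controller = new AbortController();
    inFlightRef.current = controller;
    generateSessionRecap(config, controller.signal)
      .then((text) => {
        // Drop a late recap if a new turn started while the LLM was running.
        if (text && isIdleRef.current) addItem({ type: 'away_recap', text });
      })
      .catch(() => {}) // failures are logged elsewhere, never surfaced
      .finally(() => {
        recapPendingRef.current = false;
        if (inFlightRef.current === controller) inFlightRef.current = null;
      });
  }, [enabled, config, isFocused, isIdle, addItem, thresholdMs]);

  // Abort any in-flight call on unmount.
  useEffect(() => () => inFlightRef.current?.abort(), []);
}
```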
/recap gating: CommandContext.ui.isIdleRef exposes the current stream state (mirroring the existing btwAbortControllerRef pattern). In interactive mode, recapCommand refuses when !isIdleRef.current or when pendingItem !== null. pendingItem alone is insufficient because a normal model reply runs with streamingState === Responding and a null pendingItem.
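A sketch of the gate, with the CommandContext shape simplified:

```typescript
// Simplified view of the command context used by recapCommand.
interface RecapCommandContext {
  ui: {
    isIdleRef: { current: boolean }; // mirrors streamingState === Idle
    pendingItem: object | null;
  };
}

function canRunRecap(ctx: RecapCommandContext): boolean {
  // pendingItem alone is insufficient: a normal model reply streams with
  // streamingState === Responding while pendingItem stays null.
  return ctx.ui.isIdleRef.current && ctx.ui.pendingItem === null;
}
```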
| Setting | Default | Notes |
|---|---|---|
| general.showSessionRecap | false | Auto-trigger only. Manual /recap ignores this. |
| general.sessionRecapAwayThresholdMinutes | 5 | Minutes blurred before auto-recap fires on focus-in. Matches Claude Code's default. |
| fastModel | unset | Recommended (e.g. qwen3-coder-flash) for fast and cheap recaps. |
Model selection is config.getFastModel() ?? config.getModel(): if fastModel is set and is valid for the current auth type, the recap uses fastModel; otherwise it falls back to the main model.

createDebugLogger('SESSION_RECAP') emits debug entries for the recap flow (failures are logged via debugLogger.warn). All failures are fully transparent to the user: recap is an auxiliary feature and never throws into the UI. Developers can grep for the [SESSION_RECAP] tag in the debug log file, written by default to ~/.qwen/debug/<sessionId>.txt (latest.txt symlinks to the current session); disable via QWEN_DEBUG_LOG_FILE=0.
| Item | Why not |
|---|---|
| Progress UI for /recap (spinner / pendingItem) | 3-5 second wait is tolerable; adds complexity. |
| Automated tests | Service is small (~150 lines), end-to-end tested manually first; unit tests can land in a separate PR. |
| Localized prompts | The system prompt is for the model; English is the most reliable substrate. The model selects the output language from the conversation. |
| QWEN_CODE_ENABLE_AWAY_SUMMARY env var | Claude Code uses it to keep the feature on when telemetry is disabled; Qwen Code's current telemetry model doesn't need this. |
| Auto-recap on /resume completion | A natural follow-up but needs a hook point in useResumeCommand; out of scope for this PR. |