Session Recap Design

A brief (1-2 sentence) "where did I leave off" summary surfaced when the user returns to an idle session, either on demand (/recap) or after the terminal has been blurred for 5+ minutes.

Overview

When a user /resumes an old session days later, scrolling back through pages of history to remember what they were doing and what came next is a real friction point. Just reloading messages does not solve this UX problem.

The goal is to proactively surface a brief 1-2 sentence recap when the user returns:

High-level task (what they are doing) → next step (what to do next).
Visually distinct from real assistant replies, so it is never mistaken for new model output.
Best-effort: failures must be silent and never break the main flow.

Triggers

Trigger	Conditions	Implementation
Manual	User runs `/recap`	`recapCommand.ts` calls the same underlying service
Auto	Terminal blurred (DECSET 1004 focus protocol) for ≥ 5 min + focus returns + stream is `Idle`	`useAwaySummary.ts` — 5min blur timer + `useFocus` event listener
Daemon HTTP	Remote client calls `POST /session/:id/recap`	`server.ts` route → `bridge.generateSessionRecap` (ext-method roundtrip) → `acpAgent.ts` calls `generateSessionRecap(session.getConfig(), signal)`

All three paths funnel into the same generateSessionRecap() function in core/services/sessionRecap.ts to guarantee identical behavior. The auto-trigger is gated by general.showSessionRecap (default: off — explicit opt-in, so ambient LLM calls are never silently added to a user's bill); the manual command and daemon HTTP route ignore that setting (the caller is making an explicit request).

Daemon access path

The daemon route is non-strict-gated (mirrors /session/:id/prompt's posture — recap costs tokens but mutates no state). Capability tag session_recap advertises the route on /capabilities.features. SDK helpers: DaemonClient.recapSession(sessionId, opts) and DaemonSessionClient.recap(opts). See docs/developers/qwen-serve-protocol.md § POST /session/:id/recap for the wire contract and error envelope.

Cancellation is absent in v1. The route does not listen for HTTP client disconnect, no AbortSignal is threaded into bridge.generateSessionRecap, and the ACP child handler passes a never-aborting AbortController().signal to the core helper (no cross-process abort plumbing yet). The only ceilings are the bridge's 60s SESSION_RECAP_TIMEOUT_MS backstop and the transport-closed race against ACP channel death. Wiring an HTTP-side AbortController in isolation would be cosmetic: the child-side LLM call would still run to completion, so end-to-end cancellation is not achievable without the cross-process abort piece. This is acceptable for v1 because recap is short (single-attempt side-query, maxOutputTokens: 300, ~1–5s typical). A future request-id-based cancel ext-method can plumb full end-to-end cancellation if/when the bandwidth cost justifies it.
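To make the "backstop, not cancellation" distinction concrete, here is a minimal sketch of a timeout race. It assumes the bridge bounds the wait with something like Promise.race; withTimeout is a hypothetical helper name, and only SESSION_RECAP_TIMEOUT_MS and its 60s value come from this document.

```typescript
// Value taken from the design text; the helper below is illustrative only.
const SESSION_RECAP_TIMEOUT_MS = 60_000;

// Hypothetical helper: rejects once `ms` elapses, but does NOT stop `work`.
// The underlying LLM call keeps running, which is why this is only a ceiling
// on how long the caller waits, not a cancellation mechanism.
function withTimeout<T>(
  work: Promise<T>,
  ms: number = SESSION_RECAP_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`recap timed out after ${ms}ms`)),
      ms,
    );
  });
  // Clear the timer on either outcome so it never keeps the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Because the race only abandons the wait, the child-side call still consumes tokens after a timeout; real cancellation would need the cross-process abort described above.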

Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                          AppContainer.tsx                              │
│   isFocused = useFocus()                                               │
│   isIdle = streamingState === Idle                                     │
│       │                                                                │
│       ├─→ useAwaySummary({enabled, config, isFocused, isIdle,          │
│       │       │             addItem})                                  │
│       │       └─→ 5 min blur timer + idle/dedupe gates                 │
│       │              │                                                 │
│       │              ↓                                                 │
│       └─→ recapCommand (slash) ─→ generateSessionRecap(config, signal) │
│                                          │                             │
│                                          ↓                             │
│                              ┌─────────────────────────┐               │
│                              │ packages/core/services/ │               │
│                              │   sessionRecap.ts       │               │
│                              └─────────────────────────┘               │
│                                          │                             │
│                                          ↓                             │
│                              GeminiClient.generateContent              │
│                              (fastModel + tools:[])                    │
│                                                                        │
│   addItem({type: 'away_recap', text}) ─→ HistoryItemDisplay            │
│       └─ AwayRecapMessage rendered inline like any other history       │
│         item (※ + bold "recap: " + italic content, all dim);           │
│         scrolls naturally with the conversation. Mirrors Claude        │
│         Code's away_summary system message.                            │
└────────────────────────────────────────────────────────────────────────┘

Files

File	Responsibility
`packages/core/src/services/sessionRecap.ts`	One-shot LLM call + history filter + tag extraction
`packages/cli/src/ui/hooks/useAwaySummary.ts`	Auto-trigger React hook
`packages/cli/src/ui/commands/recapCommand.ts`	`/recap` manual entry point
`packages/cli/src/ui/components/messages/StatusMessages.tsx`	`AwayRecapMessage` renderer (`※` + bold `recap:` + italic content, all dim)
`packages/cli/src/ui/types.ts`	`HistoryItemAwayRecap` type
`packages/cli/src/ui/components/HistoryItemDisplay.tsx`	Dispatches `away_recap` history items to the renderer
`packages/cli/src/config/settingsSchema.ts`	`general.showSessionRecap` + `general.sessionRecapAwayThresholdMinutes` settings

Prompt Design

System Prompt

generationConfig.systemInstruction replaces the main agent's system prompt for this single call, so the model behaves only as a recap generator and not as a coding assistant.

Note that GeminiClient.generateContent() internally runs the prompt through getCustomSystemPrompt(), which appends the user's memory (QWEN.md / managed auto-memory) as a suffix. The final system prompt is therefore recap prompt + user memory — useful project context for the recap, not a leak.

Bullets below correspond 1:1 with RECAP_SYSTEM_PROMPT:

Under 40 words, 1-2 plain sentences (no markdown / lists / headings). For Chinese, treat the budget as roughly 80 characters total.
First sentence: the high-level task. Then: the concrete next step.
Explicitly forbid: listing what was done, reciting tool calls, status reports.
Match the dominant language of the conversation (English or Chinese).
Wrap output in <recap>...</recap>; nothing outside the tags.

Structured Output + Extraction

The model is instructed to wrap its answer in <recap>...</recap>:

<recap>Refactoring loopDetectionService.ts to address long-session OOM. Next step is to implement option B.</recap>

Why: some models (GLM family, reasoning models) write a "thinking" paragraph before the final answer. Returning the raw text would leak that reasoning into the UI.

extractRecap() has three fallback tiers:

Both tags present: take what is between <recap>...</recap> (preferred).
Only the open tag (e.g. maxOutputTokens truncated the close tag): take everything after the open tag.
Tag missing entirely: return empty string → service returns null → UI renders nothing.

The third tier is "skip rather than show the wrong thing" — surfacing the model's reasoning preamble is worse than showing no recap at all.
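The three tiers can be sketched as follows. This is an illustrative reconstruction from the description above, not the shipped extractRecap(); the trimming behavior and the use of indexOf rather than a regex are assumptions.

```typescript
const OPEN_TAG = '<recap>';
const CLOSE_TAG = '</recap>';

// Sketch of the three fallback tiers; the real function may differ in detail.
function extractRecap(raw: string): string {
  const open = raw.indexOf(OPEN_TAG);
  // Tier 3: no open tag at all, so nothing is safe to show.
  if (open === -1) return '';

  const start = open + OPEN_TAG.length;
  const close = raw.indexOf(CLOSE_TAG, start);

  // Tier 1: both tags present, take the text between them.
  if (close !== -1) return raw.slice(start, close).trim();

  // Tier 2: close tag missing (e.g. truncated output), take the remainder.
  return raw.slice(start).trim();
}
```

Any reasoning preamble written before the open tag is discarded in tiers 1 and 2, and an untagged response yields an empty string, which the service is described as mapping to null.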

Call Parameters

Parameter	Value	Reason
`model`	`getFastModel() ?? getModel()`	Recap doesn't need a frontier model
`tools`	`[]`	One-shot query, no tool use
`maxOutputTokens`	`300`	Headroom for 1-2 short sentences + tags
`temperature`	`0.3`	Mostly deterministic, with a bit of natural variation
`systemInstruction`	The recap-only prompt above	Replaces the main agent's role definition
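As a rough illustration, the parameters in the table could be assembled like this. RecapConfigLike, RecapRequest, and buildRecapRequest are hypothetical names introduced for the sketch; only the values and the getFastModel() ?? getModel() fallback come from this document, and the real service may build the request inline or in a different shape.

```typescript
// Minimal stand-in for the config object; only the two getters named in the
// table are modeled.
interface RecapConfigLike {
  getFastModel(): string | undefined;
  getModel(): string;
}

// Hypothetical request shape mirroring the table rows.
interface RecapRequest {
  model: string;
  tools: never[];
  maxOutputTokens: number;
  temperature: number;
  systemInstruction: string;
}

function buildRecapRequest(
  config: RecapConfigLike,
  systemPrompt: string,
): RecapRequest {
  return {
    // Prefer the fast model; fall back to the main session model.
    model: config.getFastModel() ?? config.getModel(),
    tools: [], // one-shot query, no tool use
    maxOutputTokens: 300, // headroom for 1-2 short sentences plus tags
    temperature: 0.3, // mostly deterministic
    systemInstruction: systemPrompt, // replaces the main agent's role
  };
}
```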

History Filtering

geminiClient.getChat().getHistory() returns a Content[] that includes:

user / model text messages
model functionCall parts
user functionResponse parts (which can hold full file contents)
model thought parts (part.thought / part.thoughtSignature, the model's hidden reasoning)

filterToDialog() keeps only user / model parts that have non-empty text and are not thoughts. Two reasons:

Tool calls / responses: a single functionResponse can be 10K+ tokens. 30 such messages would drown the recap LLM in irrelevant detail, both wasting tokens and biasing the recap toward implementation noise like "called X tool to read Y file".
Thought parts: carry the model's internal reasoning. Including them risks treating hidden chain-of-thought as dialogue and surfacing it in the recap text.

After dropping empty messages, takeRecentDialog slices to the last 30 messages and refuses to start the slice on a dangling model/tool response.
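A minimal sketch of the two filtering steps, under stated assumptions: PartLike and ContentLike are simplified stand-ins for the SDK's types, a part is treated as a thought if either marker named above is set, and "refuses to start on a dangling response" is modeled as trimming the window forward to the first user message. The real helpers may differ on each of these points.

```typescript
// Simplified stand-ins for the SDK's Part / Content types.
interface PartLike {
  text?: string;
  thought?: boolean;
  thoughtSignature?: string;
  functionCall?: unknown;
  functionResponse?: unknown;
}
interface ContentLike {
  role: 'user' | 'model';
  parts: PartLike[];
}

// Keep only non-empty, non-thought text parts; drop messages left empty.
function filterToDialog(history: ContentLike[]): ContentLike[] {
  return history
    .map((msg) => ({
      role: msg.role,
      parts: msg.parts.filter(
        (p) =>
          typeof p.text === 'string' &&
          p.text.trim() !== '' &&
          !p.thought &&
          !p.thoughtSignature,
      ),
    }))
    .filter((msg) => msg.parts.length > 0);
}

// Take the last `max` messages, then skip forward so the window opens on a
// user message rather than a dangling model reply (assumed behavior).
function takeRecentDialog(dialog: ContentLike[], max = 30): ContentLike[] {
  const recent = dialog.slice(-max);
  const firstUser = recent.findIndex((m) => m.role === 'user');
  return firstUser === -1 ? [] : recent.slice(firstUser);
}
```

Filtering before slicing matters: the 30-message budget is then spent entirely on dialogue rather than on tool traffic.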

Concurrency and Edge Cases

Auto-trigger hook state machine

useAwaySummary keeps three refs:

Ref	Meaning
`blurredAtRef`	Blur start time (not cleared until focus returns)
`recapPendingRef`	Whether an LLM call is in flight
`inFlightRef`	The current in-flight `AbortController`

useEffect deps: [enabled, config, isFocused, isIdle, addItem, thresholdMs].

Event	Action
`!enabled || !config`	Abort in-flight call + clear `inFlightRef` + clear `blurredAtRef`
`!isFocused` and `blurredAtRef === null`	Set `blurredAtRef = Date.now()`
`isFocused` and `blurredAtRef === null`	Return early (no blur cycle to handle — first render or right after a brief-blur reset)
`isFocused` and blur duration < 5 min	Clear `blurredAtRef`, wait for next blur cycle
`isFocused` and blur ≥ 5 min and `recapPendingRef`	Return (dedupe)
`isFocused` and blur ≥ 5 min and `!isIdle`	Preserve `blurredAtRef` and wait for the turn to finish (`isIdle` is in the deps, so the effect re-fires when streaming completes)
`isFocused` and blur ≥ 5 min and `shouldFireRecap` returns false	Clear `blurredAtRef` and return — conversation hasn't moved enough since the last recap (≥ 2 user turns required, mirrors Claude Code)
`isFocused` and all conditions met	Clear `blurredAtRef`, set `recapPendingRef = true`, create `AbortController`, send the LLM request

The .then callback re-checks isIdleRef.current: if the user has started a new turn while the LLM was running, the late-arriving recap is dropped to avoid inserting it mid-turn.

The .finally clears recapPendingRef, and clears inFlightRef only if inFlightRef.current === controller (so it doesn't overwrite a newer controller).

A second useEffect aborts the in-flight controller on unmount.
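The event table above can be read as a pure decision function evaluated on each effect run. The sketch below is not the hook's actual code: HookState, Action, and decide are hypothetical names, the refs are flattened into plain fields, and shouldFireRecap is passed in as a precomputed boolean.

```typescript
type Action =
  | 'reset' // abort in-flight call, clear inFlightRef and blurredAtRef
  | 'mark-blur' // record the blur start time
  | 'noop' // return early, nothing to do
  | 'clear-blur' // clear blurredAtRef and wait for the next blur cycle
  | 'wait-for-idle' // keep blurredAtRef; effect re-fires when idle
  | 'fire'; // clear blurredAtRef, set pending, send the request

interface HookState {
  enabled: boolean;
  hasConfig: boolean;
  isFocused: boolean;
  isIdle: boolean;
  blurredAt: number | null;
  recapPending: boolean;
  shouldFireRecap: boolean;
  now: number;
  thresholdMs: number;
}

// Rows are checked in the same order as the table.
function decide(s: HookState): Action {
  if (!s.enabled || !s.hasConfig) return 'reset';
  if (!s.isFocused) return s.blurredAt === null ? 'mark-blur' : 'noop';
  if (s.blurredAt === null) return 'noop';
  if (s.now - s.blurredAt < s.thresholdMs) return 'clear-blur';
  if (s.recapPending) return 'noop';
  if (!s.isIdle) return 'wait-for-idle';
  if (!s.shouldFireRecap) return 'clear-blur';
  return 'fire';
}
```

Note the ordering of the last three checks: the dedupe and idle gates leave blurredAt untouched, so a long blur is not forgotten while a turn is still streaming.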

`/recap` gating

CommandContext.ui.isIdleRef exposes the current stream state (mirroring the existing btwAbortControllerRef pattern). In interactive mode, recapCommand refuses when !isIdleRef.current or pendingItem !== null. pendingItem alone is insufficient because a normal model reply runs with streamingState === Responding and a null pendingItem.
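The gate reduces to a two-condition check. The sketch below uses a hypothetical canRunRecap helper and models pendingItem as an opaque object; the actual command reads these values from CommandContext.

```typescript
// Hypothetical helper: both conditions must hold before /recap proceeds.
function canRunRecap(isIdle: boolean, pendingItem: object | null): boolean {
  return isIdle && pendingItem === null;
}

// A normal model reply: streaming is active but pendingItem is still null,
// so checking pendingItem alone would wrongly allow the command.
const duringNormalReply = canRunRecap(false, null);
```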

Configuration and Model Selection

User-facing knobs

Setting	Default	Notes
`general.showSessionRecap`	`false`	Auto-trigger only. Manual `/recap` ignores this.
`general.sessionRecapAwayThresholdMinutes`	`5`	Minutes blurred before auto-recap fires on focus-in. Matches Claude Code's default.
`fastModel`	unset	Recommended (e.g. `qwen3-coder-flash`) for fast and cheap recaps.

Model fallback

config.getFastModel() ?? config.getModel():

User has a fastModel set and it is valid for the current auth type → use fastModel.
Otherwise → fall back to the main session model (works, just costlier and slower).

Observability

createDebugLogger('SESSION_RECAP') emits:

caught exceptions from the recap path (debugLogger.warn).

All failures are invisible to the user: recap is an auxiliary feature and never throws into the UI. Developers can grep for the [SESSION_RECAP] tag in the debug log file, which is written by default to ~/.qwen/debug/<sessionId>.txt (latest.txt symlinks to the current session); set QWEN_DEBUG_LOG_FILE=0 to disable it.
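The best-effort contract could look roughly like the wrapper below. safeRecap is a hypothetical name, and the warn callback stands in for debugLogger.warn; the real service may place its try/catch differently.

```typescript
// Hypothetical wrapper: any exception is logged and converted to null so the
// caller renders nothing instead of surfacing an error.
async function safeRecap(
  generate: () => Promise<string | null>,
  warn: (message: string, err: unknown) => void,
): Promise<string | null> {
  try {
    return await generate();
  } catch (err) {
    warn('session recap failed', err);
    return null;
  }
}
```

Callers then only need to handle two outcomes, a recap string or null, which keeps the auto-trigger hook and the /recap command free of their own error handling.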

Out of Scope

Item	Why not
Progress UI for `/recap` (spinner / pendingItem)	A 3-5 second wait is tolerable, and a progress UI would add complexity.
Automated tests	Service is small (~150 lines), end-to-end tested manually first; unit tests can land in a separate PR.
Localized prompts	The system prompt is for the model; English is the most reliable substrate. The model selects the output language from the conversation.
`QWEN_CODE_ENABLE_AWAY_SUMMARY` env var	Claude Code uses it to keep the feature on when telemetry is disabled; Qwen Code's current telemetry model doesn't need this.
Auto-recap on `/resume` completion	A natural follow-up but needs a hook point in `useResumeCommand`; out of scope for this PR.
