docs/users/features/followup-suggestions.md
Qwen Code can predict what you want to type next and show it as placeholder text in the input area. This feature uses an LLM call to analyze the conversation context and generate a natural next-step suggestion.
This feature works end-to-end in the CLI. In the WebUI, the hook and UI plumbing are available, but host applications must trigger suggestion generation and wire the followup state for suggestions to appear.
After Qwen Code finishes responding, a suggestion appears as dimmed placeholder text in the input area after a short delay (~300ms). For example, after fixing a bug, you might see:
> run the tests
The suggestion is generated by sending the conversation history to the model, which predicts what you would naturally type next. If the response contains an explicit tip (e.g., `Tip: type post comments to publish findings`), the suggested action is extracted automatically.
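The tip extraction described above could be sketched roughly as follows. This is only an illustration: the actual extraction rules are not documented here, and `TIP_PATTERN` and `extractTipAction` are hypothetical names that assume tips follow the shape "Tip: type <action> to <purpose>".

```typescript
// Hypothetical pattern: assumes a tip line of the form
// "Tip: type <action> to <purpose>". The real rules may differ.
const TIP_PATTERN = /^Tip:\s*type\s+(.+?)\s+to\s+/im;

// Returns the suggested action from an explicit tip, or null if none is found.
function extractTipAction(response: string): string | null {
  const match = TIP_PATTERN.exec(response);
  return match?.[1] ?? null;
}
```

With the example tip from the text, this sketch would yield `post comments` as the suggestion; a response without a tip yields `null`, in which case the model's own prediction would be used.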
| Key | Action |
|---|---|
| Tab | Accept the suggestion and fill it into the input |
| Enter | Accept the suggestion and fill it into the input |
| Right Arrow | Accept the suggestion and fill it into the input |
| Any typing | Dismiss the suggestion and type normally |
Enter fills the input rather than submitting, so accepting a suggested slash command (e.g. /clear) never auto-executes — you submit it yourself with a second Enter.
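The key behavior in the table can be modeled as a small state transition. The sketch below is not Qwen Code's input handler; `InputState`, `Key`, and `handleKey` are hypothetical names, and it assumes the placeholder is only shown while the input is empty and that Enter submits when no suggestion is showing.

```typescript
// Hypothetical model of the input area; not the actual implementation.
type Key = "tab" | "enter" | "right" | { char: string };

interface InputState {
  input: string;
  suggestion: string | null;
  submitted: boolean;
}

function handleKey(state: InputState, key: Key): InputState {
  if (typeof key === "object") {
    // Any typing dismisses the suggestion and edits the input normally.
    return { input: state.input + key.char, suggestion: null, submitted: false };
  }
  if (state.suggestion !== null && state.input === "") {
    // Tab, Enter, and Right Arrow all accept: fill the input, never submit.
    return { input: state.suggestion, suggestion: null, submitted: false };
  }
  // No suggestion showing: only Enter submits (assumed normal behavior).
  return { ...state, submitted: key === "enter" };
}
```

In this model, accepting `/clear` with Enter leaves it in the input unsubmitted, and only a second Enter submits it, which matches the two-step behavior described above.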
Suggestions are generated when all of the following conditions are met:
- … `plan`
- The feature is enabled (set `ui.enableFollowupSuggestions` to `false` to turn it off)

Suggestions will not appear in non-interactive mode (e.g., headless/SDK mode).
Suggestions are automatically dismissed when:
By default, suggestions use the same model as your main conversation. For lower-latency suggestions, configure a dedicated fast model:
/model --fast qwen3-coder-flash
Or use /model --fast (without a model name) to open a selection dialog.
{
  "fastModel": "qwen3-coder-flash"
}
The fast model is used for prompt suggestions and speculative execution. When not configured, the main conversation model is used as fallback.
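The fallback rule above amounts to a one-line selection. A minimal sketch, assuming an unset or empty `fastModel` means "not configured"; `resolveBackgroundModel` and `ModelSettings` are hypothetical names, not part of Qwen Code's API:

```typescript
// Hypothetical shape covering only the setting discussed here.
interface ModelSettings {
  fastModel?: string;
}

// Picks the model for suggestions and speculation:
// the fast model if configured, otherwise the main conversation model.
function resolveBackgroundModel(settings: ModelSettings, mainModel: string): string {
  const fast = settings.fastModel?.trim();
  return fast ? fast : mainModel;
}
```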
Cost note: A fast model lowers latency, but it does not always lower cost. Suggestion generation reuses your conversation's prefix cache (via `ui.enableCacheSharing`, on by default), but a prefix cache is per-model. Pointing `fastModel` at a different model forks to a separate cache, so the whole conversation history is re-billed as uncached input on the fast model. On long conversations, the default (main model + shared cache) can be cheaper than a fast model, since most of the history is billed at the discounted cached rate. Set `fastModel` when latency matters more than per-turn cost.
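To make the trade-off concrete, here is a small worked comparison. All prices and token counts are invented for illustration and are not actual rates for any model; the sketch also simplifies by treating the whole history as cached on the main model.

```typescript
// Input cost in USD for a given token count and per-million-token price.
function inputCostUsd(tokens: number, pricePerMillionUsd: number): number {
  return (tokens * pricePerMillionUsd) / 1e6;
}

// Illustrative numbers only (assumptions, not real pricing).
const historyTokens = 100_000;
const mainCachedPrice = 0.1; // main model, discounted cached input rate
const fastUncachedPrice = 0.3; // fast model, full uncached input rate

// Main model with shared cache: history billed at the cached rate.
const mainCachedCost = inputCostUsd(historyTokens, mainCachedPrice);
// Fast model with its own cold cache: history billed as uncached input.
const fastUncachedCost = inputCostUsd(historyTokens, fastUncachedPrice);

const mainIsCheaper = mainCachedCost < fastUncachedCost;
```

Under these assumed rates, the fast model's lower list price is outweighed by losing the cache discount, so `mainIsCheaper` is `true`. With a different price ratio or a short history, the comparison can go the other way.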
Thinking/reasoning mode is automatically disabled for all background tasks (suggestion generation and speculation), regardless of your main model's thinking configuration. This avoids wasting tokens on internal reasoning that isn't needed for these tasks.
These settings can be configured in settings.json:
| Setting | Type | Default | Description |
|---|---|---|---|
| `ui.enableFollowupSuggestions` | boolean | `true` | Enable or disable followup suggestions |
| `ui.enableCacheSharing` | boolean | `true` | Use cache-aware forked queries to reduce cost (experimental) |
| `ui.enableSpeculation` | boolean | `false` | Speculatively execute suggestions before submission (experimental) |
| `fastModel` | string | `""` | Model for prompt suggestions and speculative execution |
{
  "fastModel": "qwen3-coder-flash",
  "ui": {
    "enableFollowupSuggestions": true,
    "enableCacheSharing": true
  }
}
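For host applications that read these settings themselves, the defaults from the table can be applied on top of a partial `settings.json`. This is a sketch under assumptions: `FollowupSettings`, `DEFAULTS`, and `withDefaults` are hypothetical names, and Qwen Code's own settings loader may merge values differently.

```typescript
// Hypothetical typed view of the settings listed in the table.
interface FollowupSettings {
  fastModel: string;
  ui: {
    enableFollowupSuggestions: boolean;
    enableCacheSharing: boolean;
    enableSpeculation: boolean;
  };
}

// Defaults copied from the settings table.
const DEFAULTS: FollowupSettings = {
  fastModel: "",
  ui: {
    enableFollowupSuggestions: true,
    enableCacheSharing: true,
    enableSpeculation: false,
  },
};

// Fills in any setting the user did not specify.
function withDefaults(user: {
  fastModel?: string;
  ui?: Partial<FollowupSettings["ui"]>;
}): FollowupSettings {
  return {
    fastModel: user.fastModel ?? DEFAULTS.fastModel,
    ui: { ...DEFAULTS.ui, ...user.ui },
  };
}
```

Applied to the example configuration above, `enableSpeculation` is not set by the user and so stays at its default of `false`.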
Suggestion model usage appears in /stats output, showing tokens consumed by the fast model for suggestion generation.
The fast model is also shown in /about output under "Fast Model".
Suggestions go through quality filters to ensure they are useful:
- Slash commands (e.g., `/commit`) are always allowed as single-word suggestions