Qwen Code can predict what you want to type next and show it as ghost text in the input area. This feature uses an LLM call to analyze the conversation context and generate a natural next step suggestion.
This feature works end-to-end in the CLI. In the WebUI, the hook and UI plumbing are available, but host applications must trigger suggestion generation and wire the followup state for suggestions to appear.
After Qwen Code finishes responding, a suggestion appears as dimmed text in the input area after a short delay (~300ms). For example, after fixing a bug, you might see:
> run the tests
The suggestion is generated by sending the conversation history to the model, which predicts what you would naturally type next. If the response contains an explicit tip (e.g., "Tip: type post comments to publish findings"), the suggested action is extracted automatically.
| Key | Action |
|---|---|
| Tab | Accept the suggestion and fill it into the input |
| Enter | Accept the suggestion and submit it immediately |
| Right Arrow | Accept the suggestion and fill it into the input |
| Any typing | Dismiss the suggestion and type normally |
Suggestions are generated when all of the following conditions are met:
Suggestions will not appear in non-interactive mode (e.g., headless/SDK mode).
Suggestions are automatically dismissed when:
By default, suggestions use the same model as your main conversation. For faster and cheaper suggestions, configure a dedicated fast model:
```
/model --fast qwen3-coder-flash
```
Or use /model --fast (without a model name) to open a selection dialog.
Alternatively, set the `fastModel` key in `settings.json`:

```json
{
  "fastModel": "qwen3-coder-flash"
}
```
The fast model is used for prompt suggestions and speculative execution. When it is not configured, the main conversation model is used as a fallback.
Thinking/reasoning mode is automatically disabled for all background tasks (suggestion generation and speculation), regardless of your main model's thinking configuration. This avoids wasting tokens on internal reasoning that isn't needed for these tasks.
These settings can be configured in settings.json:
| Setting | Type | Default | Description |
|---|---|---|---|
| `ui.enableFollowupSuggestions` | boolean | `true` | Enable or disable followup suggestions |
| `ui.enableCacheSharing` | boolean | `true` | Use cache-aware forked queries to reduce cost (experimental) |
| `ui.enableSpeculation` | boolean | `false` | Speculatively execute suggestions before submission (experimental) |
| `fastModel` | string | `""` | Model for prompt suggestions and speculative execution |
```json
{
  "fastModel": "qwen3-coder-flash",
  "ui": {
    "enableFollowupSuggestions": true,
    "enableCacheSharing": true
  }
}
```
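Speculative execution is off by default. A minimal `settings.json` fragment opting into it, paired with a fast model so speculative runs stay cheap, might look like this (the model name is illustrative):

```json
{
  "fastModel": "qwen3-coder-flash",
  "ui": {
    "enableSpeculation": true
  }
}
```

Because the feature is experimental, you can revert by removing the key or setting `ui.enableSpeculation` back to `false`.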
Suggestion model usage appears in /stats output, showing tokens consumed by the fast model for suggestion generation.
The fast model is also shown in /about output under "Fast Model".
Suggestions go through quality filters to ensure they are useful:
- Slash commands (e.g., `/commit`) are always allowed as single-word suggestions