Followup Suggestions

Qwen Code can predict what you want to type next and show it as placeholder text in the input area. This feature uses an LLM call to analyze the conversation context and suggest a natural next step.

This feature works end-to-end in the CLI. In the WebUI, the hook and UI plumbing are available, but host applications must trigger suggestion generation and wire the followup state for suggestions to appear.

How It Works

After Qwen Code finishes responding, a suggestion appears as dimmed placeholder text in the input area after a short delay (~300ms). For example, after fixing a bug, you might see:

> run the tests

The suggestion is generated by sending the conversation history to the model, which predicts what you would naturally type next. If the response contains an explicit tip (e.g., "Tip: type post comments to publish findings"), the suggested action is extracted automatically.

Accepting Suggestions

Key	Action
`Tab`	Accept the suggestion and fill it into the input
`Enter`	Accept the suggestion and fill it into the input
`Right Arrow`	Accept the suggestion and fill it into the input
Any typing	Dismiss the suggestion and type normally

Enter fills the input rather than submitting, so accepting a suggested slash command (e.g. /clear) never auto-executes — you submit it yourself with a second Enter.
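The accept-versus-submit behavior described above can be sketched as a small state transition. This is an illustrative model only: the `InputState`, `KeyResult`, and `handleKey` names are hypothetical and are not taken from the Qwen Code source, and the sketch assumes a suggestion is only visible while the input is empty.

```typescript
// Hypothetical model of the input area; names are illustrative assumptions.
type Key = "tab" | "enter" | "right" | "char";

interface InputState {
  text: string;              // current contents of the input
  suggestion: string | null; // dimmed placeholder text, if any
}

interface KeyResult {
  state: InputState;
  submitted: boolean; // true only when the input is actually sent
}

function handleKey(state: InputState, key: Key, char = ""): KeyResult {
  // Tab, Enter, and Right Arrow accept a visible suggestion: fill, never submit.
  if (state.suggestion !== null && state.text === "" && key !== "char") {
    return { state: { text: state.suggestion, suggestion: null }, submitted: false };
  }
  // Any typing dismisses the suggestion and edits the input normally.
  if (key === "char") {
    return { state: { text: state.text + char, suggestion: null }, submitted: false };
  }
  // No suggestion visible: Enter submits non-empty input; other keys are no-ops here.
  return { state, submitted: key === "enter" && state.text !== "" };
}

// Accepting "/clear" with Enter fills the input; a second Enter submits it.
const shown: InputState = { text: "", suggestion: "/clear" };
const accepted = handleKey(shown, "enter");
const submittedResult = handleKey(accepted.state, "enter");
const typedResult = handleKey(shown, "char", "g");
```

Keeping acceptance and submission as separate steps is what prevents a suggested slash command from running without an explicit confirmation.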

When Suggestions Appear

Suggestions are generated when all of the following conditions are met:

The model has completed its response (not during streaming)
At least 2 model turns have occurred in the conversation
There are no errors in the most recent response
No confirmation dialogs are pending (e.g., shell confirmation, permissions)
The approval mode is not set to plan
The feature is enabled (on by default — set ui.enableFollowupSuggestions to false to turn it off)

Suggestions will not appear in non-interactive mode (e.g., headless/SDK mode).
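The conditions above amount to a single gate that must pass before a suggestion is requested. The following is a minimal sketch of that gate; the `SuggestionGateState` shape and its field names are assumptions made for illustration and do not reflect the actual Qwen Code internals.

```typescript
// Hypothetical snapshot of session state; field names are assumptions.
interface SuggestionGateState {
  isInteractive: boolean;             // false in headless/SDK mode
  enableFollowupSuggestions: boolean; // mirrors ui.enableFollowupSuggestions
  isStreaming: boolean;               // the model is still responding
  modelTurnCount: number;             // model turns so far in the conversation
  lastResponseHadError: boolean;
  hasPendingConfirmation: boolean;    // e.g. shell confirmation, permissions
  approvalMode: string;               // "plan" suppresses suggestions
}

function shouldGenerateSuggestion(s: SuggestionGateState): boolean {
  return (
    s.isInteractive &&
    s.enableFollowupSuggestions &&
    !s.isStreaming &&
    s.modelTurnCount >= 2 &&
    !s.lastResponseHadError &&
    !s.hasPendingConfirmation &&
    s.approvalMode !== "plan"
  );
}

// A state in which every condition is met ("default" is a placeholder mode name).
const readyState: SuggestionGateState = {
  isInteractive: true,
  enableFollowupSuggestions: true,
  isStreaming: false,
  modelTurnCount: 2,
  lastResponseHadError: false,
  hasPendingConfirmation: false,
  approvalMode: "default",
};
```

Failing any single check is enough to suppress the suggestion, which is why the conditions are combined with a logical AND.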

Suggestions are automatically dismissed when:

You start typing
A new model turn begins
The suggestion is accepted

Fast Model

By default, suggestions use the same model as your main conversation. For lower-latency suggestions, configure a dedicated fast model:

Via command

/model --fast qwen3-coder-flash

Or use /model --fast (without a model name) to open a selection dialog.

Via settings.json

{
  "fastModel": "qwen3-coder-flash"
}

The fast model is used for followup suggestions and speculative execution. When it is not configured, the main conversation model is used as a fallback.

Cost note: A fast model lowers latency, but it does not always lower cost. Suggestion generation reuses your conversation's prefix cache (via ui.enableCacheSharing, on by default) — but a prefix cache is per-model. Pointing fastModel at a different model forks to a separate cache, so the whole conversation history is re-billed as uncached input on the fast model. On long conversations, the default (main model + shared cache) can be cheaper than a fast model, since most of the history is billed at the discounted cached rate. Set fastModel when latency matters more than per-turn cost.
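The trade-off in the cost note can be made concrete with a small calculation. The sketch below uses made-up prices purely for illustration (they are not real rates for any model), counts input tokens only, and assumes the shared cache gives a full hit on the conversation history.

```typescript
// All prices are hypothetical, in currency units per million input tokens.
interface Pricing {
  inputPerMTok: number;       // uncached input
  cachedInputPerMTok: number; // discounted rate for prefix-cache hits
}

// Input cost of one suggestion request: history plus a small uncached suffix.
function suggestionInputCost(
  historyTokens: number,
  newTokens: number,
  pricing: Pricing,
  cacheHit: boolean,
): number {
  const historyRate = cacheHit ? pricing.cachedInputPerMTok : pricing.inputPerMTok;
  return (historyTokens * historyRate + newTokens * pricing.inputPerMTok) / 1e6;
}

const mainModelPricing: Pricing = { inputPerMTok: 1.0, cachedInputPerMTok: 0.1 };
const fastModelPricing: Pricing = { inputPerMTok: 0.3, cachedInputPerMTok: 0.03 };

// Long conversation: the main model reuses the cache, the fast model starts cold.
const longShared = suggestionInputCost(100000, 500, mainModelPricing, true);  // about 0.0105
const longForked = suggestionInputCost(100000, 500, fastModelPricing, false); // about 0.0302

// Short conversation: there is little history to re-bill, so the fast model wins.
const shortShared = suggestionInputCost(1000, 500, mainModelPricing, true);
const shortForked = suggestionInputCost(1000, 500, fastModelPricing, false);
```

Under these assumed prices, the main model with a shared cache is roughly three times cheaper on the long conversation, while the fast model is cheaper on the short one. The crossover point depends entirely on the real prices and cache discount of the models involved.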

Thinking/reasoning mode is automatically disabled for all background tasks (suggestion generation and speculation), regardless of your main model's thinking configuration. This avoids wasting tokens on internal reasoning that isn't needed for these tasks.

Configuration

These settings can be configured in settings.json:

Setting	Type	Default	Description
`ui.enableFollowupSuggestions`	boolean	`true`	Enable or disable followup suggestions
`ui.enableCacheSharing`	boolean	`true`	Use cache-aware forked queries to reduce cost (experimental)
`ui.enableSpeculation`	boolean	`false`	Speculatively execute suggestions before submission (experimental)
`fastModel`	string	`""`	Model for followup suggestions and speculative execution

Example

{
  "fastModel": "qwen3-coder-flash",
  "ui": {
    "enableFollowupSuggestions": true,
    "enableCacheSharing": true
  }
}

Monitoring

Suggestion model usage appears in /stats output, showing tokens consumed by the fast model for suggestion generation.

The fast model is also shown in /about output under "Fast Model".

Suggestion Quality

Suggestions go through quality filters to ensure they are useful:

Must be 2-12 words (CJK: 2-30 characters) and under 100 characters total, except for the single-word cases listed below
Cannot be evaluative ("looks good", "thanks")
Cannot use AI voice ("Let me...", "I'll...")
Cannot be multiple sentences or contain formatting (markdown, newlines)
Cannot be meta-commentary ("nothing to suggest", "silence")
Cannot be error messages or prefixed labels ("Suggestion: ...")
Single-word suggestions are only allowed for common commands (yes, commit, push, etc.)
Slash commands (e.g., /commit) are always allowed as single-word suggestions
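The filters above could be approximated with a handful of pattern checks. The following is a heuristic sketch, not the actual filter: the regular expressions, the CJK character ranges, and the contents of `ALLOWED_SINGLE_WORDS` (the list above ends in "etc.") are all assumptions chosen for illustration.

```typescript
// Heuristic sketch; every pattern and word list here is an assumption.
const ALLOWED_SINGLE_WORDS = new Set(["yes", "commit", "push"]);
const CJK_CHAR = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

const REJECT_PATTERNS: RegExp[] = [
  /[\r\n]/,                            // newlines
  /\*\*|`|^#{1,6}\s/,                  // common markdown markers
  /[.!?]\s+\S|[。！？]./,               // more than one sentence
  /^(looks good|thanks|thank you)\b/i, // evaluative
  /^(let me|i'll|i will)\b/i,          // AI voice
  /nothing to suggest|^silence$/i,     // meta-commentary
  /^(suggestion|tip|note|error)\s*:/i, // prefixed labels, error-like output
];

function passesQualityFilters(raw: string): boolean {
  const s = raw.trim();
  if (s.length === 0 || s.length >= 100) return false;
  if (REJECT_PATTERNS.some((re) => re.test(s))) return false;

  // CJK text is measured in characters rather than words.
  if (CJK_CHAR.test(s)) {
    const chars = Array.from(s.replace(/\s+/g, "")).length;
    return chars >= 2 && chars <= 30;
  }

  const words = s.split(/\s+/);
  if (words.length === 1) {
    // Single words pass only as slash commands or common commands.
    return s.startsWith("/") || ALLOWED_SINGLE_WORDS.has(s.toLowerCase());
  }
  return words.length <= 12;
}
```

For example, `passesQualityFilters("run the tests")` and `passesQualityFilters("/commit")` pass, while `"looks good"`, `"Let me check the logs"`, and `"Suggestion: run the tests"` are rejected.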