Tongyi Qwen (DashScope / Bailian) is one of the most full-featured model providers in China. A single dashscope_api_key enables text, image understanding, image generation, speech-to-text, text-to-speech, and embedding.
{
"model": "qwen3.6-plus",
"dashscope_api_key": "YOUR_API_KEY"
}
| Parameter | Description |
|---|---|
| model | Can be qwen3.6-plus, qwen3.7-max, qwen3.5-plus, qwen3-max, qwen-max, qwen-plus, qwen-turbo, qwq-plus, etc. |
| dashscope_api_key | Create one in the Bailian Console; see the official docs |
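Because one key covers every capability, the per-feature fragments documented on this page can plausibly live in one config object. The sketch below merges them with a small recursive helper. It assumes all fragments sit at the top level of the same config file, which this page implies but does not state; `deep_merge` is a hypothetical helper written for illustration, not part of the product.

```python
import json


def deep_merge(base: dict, extra: dict) -> dict:
    """Return a new dict with `extra` merged into `base`, recursing into nested dicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# The fragments are copied from the examples on this page.
fragments = [
    {"model": "qwen3.6-plus", "dashscope_api_key": "YOUR_API_KEY"},
    {"tools": {"vision": {"model": "qwen3.6-plus"}}},
    {"skills": {"image-generation": {"model": "qwen-image-2.0"}}},
    {"voice_to_text": "dashscope", "voice_to_text_model": "qwen3-asr-flash"},
    {
        "text_to_voice": "dashscope",
        "text_to_voice_model": "qwen3-tts-flash",
        "tts_voice_id": "Cherry",
    },
    {"embedding_provider": "dashscope", "embedding_model": "text-embedding-v4"},
]

config = {}
for fragment in fragments:
    config = deep_merge(config, fragment)

# Print the combined config; note that the API key appears only once.
print(json.dumps(config, indent=2))
```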
Once dashscope_api_key is configured, the Agent's Vision tool automatically calls Qwen's vision models to recognize images. Models such as qwen3-max, qwen3.5-plus, and qwen3.6-plus are natively multimodal; if the main model is text-only (e.g. qwen-turbo), the Vision tool automatically falls back to qwen-vl-max.
To manually specify a Vision model:
{
"tools": {
"vision": {
"model": "qwen3.6-plus"
}
}
}
Supported models: qwen3.6-plus, qwen3.5-plus, qwen3-max.
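The selection order described above can be sketched in Python. This is an illustration of the documented behavior, not the Agent's actual code: `resolve_vision_model` is a hypothetical name, the multimodal set simply mirrors the supported-models list on this page, and the assumption that a multimodal main model is used directly for vision is inferred from the text.

```python
# Assumption: mirrors the supported Vision models listed on this page.
MULTIMODAL_MODELS = {"qwen3.6-plus", "qwen3.5-plus", "qwen3-max"}
FALLBACK_VISION_MODEL = "qwen-vl-max"


def resolve_vision_model(config: dict) -> str:
    """Pick a vision model: explicit setting, then main model, then fallback."""
    # 1. A manually specified tools.vision.model takes precedence.
    explicit = config.get("tools", {}).get("vision", {}).get("model")
    if explicit:
        return explicit
    # 2. A multimodal main model can handle images itself (assumed).
    main_model = config.get("model")
    if main_model in MULTIMODAL_MODELS:
        return main_model
    # 3. A text-only main model falls back to qwen-vl-max.
    return FALLBACK_VISION_MODEL


print(resolve_vision_model({"model": "qwen-turbo"}))
```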
{
"skills": {
"image-generation": {
"model": "qwen-image-2.0"
}
}
}
Available models: qwen-image-2.0, qwen-image-2.0-pro.
{
"voice_to_text": "dashscope",
"voice_to_text_model": "qwen3-asr-flash"
}
| Parameter | Description |
|---|---|
| voice_to_text | Set to dashscope to enable Tongyi Qwen ASR |
| voice_to_text_model | Optional, defaults to qwen3-asr-flash |
Credentials are reused automatically from dashscope_api_key. A single audio segment should be smaller than 10 MB and no longer than 300 seconds.
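A caller may want to check these limits before submitting audio. The following is a minimal sketch, assuming 10 MB means 10 * 1024 * 1024 bytes (the page does not say whether the limit is binary or decimal) and that the caller already knows the segment's size and duration; `check_audio_segment` is a hypothetical helper, not a product API.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_SECONDS = 300


def check_audio_segment(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of limit violations; an empty list means the segment fits."""
    problems = []
    # The size must be strictly smaller than the limit.
    if size_bytes >= MAX_BYTES:
        problems.append(f"size {size_bytes} bytes is not smaller than {MAX_BYTES}")
    # The duration may be at most 300 seconds.
    if duration_seconds > MAX_SECONDS:
        problems.append(f"duration {duration_seconds}s exceeds {MAX_SECONDS}s")
    return problems


print(check_audio_segment(size_bytes=4_000_000, duration_seconds=120.0))
```

Segments that fail the check would need to be split or re-encoded before transcription.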
{
"text_to_voice": "dashscope",
"text_to_voice_model": "qwen3-tts-flash",
"tts_voice_id": "Cherry"
}
| Parameter | Description |
|---|---|
| text_to_voice | Set to dashscope to enable Tongyi Qwen TTS |
| text_to_voice_model | Optional, defaults to qwen3-tts-flash; covers Mandarin, dialects, and major foreign languages |
| tts_voice_id | Voice ID; see the common list below |
Common voice examples:
| Voice ID | Description |
|---|---|
| Cherry | Qianyue · Sunny Female Voice |
| Serena | Suyao · Gentle Female Voice |
| Ethan | Chenxu · Sunny Male Voice |
| Chelsie | Qianxue · Anime Girl |
| Dylan | Beijing Dialect · Xiaodong |
| Rocky | Cantonese · Aqiang |
| Sunny | Sichuan Dialect · Qing'er |
The full voice list (Mandarin / regional dialects / bilingual, etc.) can be selected visually in the Web Console under "Model Management → Text-to-Speech".
{
"embedding_provider": "dashscope",
"embedding_model": "text-embedding-v4"
}
The default model is text-embedding-v4. After changing the embedding model, run /memory rebuild-index to rebuild the index.
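The rebuild is needed because vectors produced by different embedding models are generally not comparable with one another. As a rough sketch, a script could compare the old and new config to decide whether a rebuild is due. `needs_reindex` is a hypothetical helper, and applying the text-embedding-v4 default when embedding_model is absent is an assumption that only holds for the dashscope provider.

```python
# Assumption: default applies when embedding_model is omitted (dashscope provider).
DEFAULT_EMBEDDING_MODEL = "text-embedding-v4"


def embedding_identity(config: dict) -> tuple:
    """Return the (provider, model) pair that determines the vector space."""
    return (
        config.get("embedding_provider"),
        config.get("embedding_model", DEFAULT_EMBEDDING_MODEL),
    )


def needs_reindex(old_config: dict, new_config: dict) -> bool:
    """True when the embedding provider or model differs between the two configs."""
    return embedding_identity(old_config) != embedding_identity(new_config)


old = {"embedding_provider": "dashscope", "embedding_model": "text-embedding-v4"}
new = {"embedding_provider": "dashscope", "embedding_model": "some-other-model"}  # placeholder name

if needs_reindex(old, new):
    print("Embedding changed: run /memory rebuild-index")
```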