Tongyi Qwen

Tongyi Qwen (DashScope / Bailian) is one of the most full-featured model vendors in China. Text chat, image understanding, image generation, speech-to-text, text-to-speech, and embedding can all be enabled with a single dashscope_api_key.

<Tip> All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file. </Tip>

Text Chat

```json
{
  "model": "qwen3.6-plus",
  "dashscope_api_key": "YOUR_API_KEY"
}
```

| Parameter | Description |
| --- | --- |
| model | Can be qwen3.6-plus, qwen3.7-max, qwen3.5-plus, qwen3-max, qwen-max, qwen-plus, qwen-turbo, qwq-plus, etc. |
| dashscope_api_key | Create one in the Bailian Console; see the official docs |

Image Understanding

Once dashscope_api_key is configured, the Agent's Vision tool automatically calls Qwen's vision models to recognize images. Models such as qwen3-max, qwen3.5-plus, and qwen3.6-plus are already multimodal; if the main model is text-only (e.g. qwen-turbo), the Vision tool automatically falls back to qwen-vl-max.

To manually specify a Vision model:

```json
{
  "tools": {
    "vision": {
      "model": "qwen3.6-plus"
    }
  }
}
```

Supported models: qwen3.6-plus, qwen3.5-plus, qwen3-max.
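
As an illustration, the sketch below pairs a text-only main model with an explicitly chosen Vision model instead of relying on the automatic fallback to qwen-vl-max. It assumes the model and tools keys can be combined in the same configuration file, which this page implies but does not show.

```json
{
  "model": "qwen-turbo",
  "dashscope_api_key": "YOUR_API_KEY",
  "tools": {
    "vision": {
      "model": "qwen3.6-plus"
    }
  }
}
```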

Image Generation

```json
{
  "skills": {
    "image-generation": {
      "model": "qwen-image-2.0"
    }
  }
}
```

Available models: qwen-image-2.0, qwen-image-2.0-pro.

Speech-to-Text (ASR)

```json
{
  "voice_to_text": "dashscope",
  "voice_to_text_model": "qwen3-asr-flash"
}
```

| Parameter | Description |
| --- | --- |
| voice_to_text | Set to dashscope to enable Tongyi Qwen ASR |
| voice_to_text_model | Optional, defaults to qwen3-asr-flash |

Credentials are reused automatically from dashscope_api_key. Each audio segment should be smaller than 10 MB and no longer than 300 seconds.

Text-to-Speech (TTS)

```json
{
  "text_to_voice": "dashscope",
  "text_to_voice_model": "qwen3-tts-flash",
  "tts_voice_id": "Cherry"
}
```

| Parameter | Description |
| --- | --- |
| text_to_voice_model | Optional, defaults to qwen3-tts-flash; covers Mandarin, dialects, and major foreign languages |
| tts_voice_id | Voice ID; see the common list below |

Common voice examples:

| Voice ID | Description |
| --- | --- |
| Cherry | Qianyue · Sunny Female Voice |
| Serena | Suyao · Gentle Female Voice |
| Ethan | Chenxu · Sunny Male Voice |
| Chelsie | Qianxue · Anime Girl |
| Dylan | Beijing Dialect · Xiaodong |
| Rocky | Cantonese · Aqiang |
| Sunny | Sichuan Dialect · Qing'er |

The full voice list (Mandarin / regional dialects / bilingual, etc.) can be selected visually in the Web Console under "Model Management → Text-to-Speech".
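
For example, a minimal sketch that switches to the Cantonese voice from the list above might look like this. It leaves out text_to_voice_model, on the assumption that the optional default (qwen3-tts-flash) is applied when the key is absent.

```json
{
  "text_to_voice": "dashscope",
  "tts_voice_id": "Rocky"
}
```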

Embedding

```json
{
  "embedding_provider": "dashscope",
  "embedding_model": "text-embedding-v4"
}
```

The default model is text-embedding-v4. After changing the embedding model, run /memory rebuild-index to rebuild the index.
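
Putting the pieces together, a single configuration that enables every capability on this page might look like the following sketch. It assumes that all of the keys shown in the individual sections can sit side by side at the top level of the same configuration file; the values are simply the examples and defaults used above.

```json
{
  "model": "qwen3.6-plus",
  "dashscope_api_key": "YOUR_API_KEY",
  "tools": {
    "vision": {
      "model": "qwen3.6-plus"
    }
  },
  "skills": {
    "image-generation": {
      "model": "qwen-image-2.0"
    }
  },
  "voice_to_text": "dashscope",
  "voice_to_text_model": "qwen3-asr-flash",
  "text_to_voice": "dashscope",
  "text_to_voice_model": "qwen3-tts-flash",
  "tts_voice_id": "Cherry",
  "embedding_provider": "dashscope",
  "embedding_model": "text-embedding-v4"
}
```

Only dashscope_api_key carries a credential here; the speech and embedding entries select the dashscope provider and reuse that key.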