MiniMax

MiniMax supports text chat, image understanding, image generation, and text-to-speech. A single minimax_api_key enables all capabilities.

<Tip> All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file. </Tip>

Text Chat

```json
{
  "model": "MiniMax-M2.7",
  "minimax_api_key": "YOUR_API_KEY"
}
```

| Parameter | Description |
| --- | --- |
| model | Can be MiniMax-M2.7, MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.1, MiniMax-M2.1-lightning, MiniMax-M2, etc. |
| minimax_api_key | Create one in the MiniMax Console |
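
As a minimal sketch of applying these two settings programmatically rather than by hand, the Python helper below merges them into a JSON configuration file and leaves any other keys untouched. The helper name `apply_text_chat_config` and the file path passed to it are illustrative assumptions, not part of the project; only the `model` and `minimax_api_key` keys come from the table above.

```python
import json
from pathlib import Path


def apply_text_chat_config(path: Path, model: str, api_key: str) -> dict:
    """Merge the MiniMax text-chat keys into a JSON config file, keeping other keys."""
    # Start from the existing file if present, otherwise from an empty config.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["model"] = model
    config["minimax_api_key"] = api_key
    # Write the merged result back as pretty-printed JSON.
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config
```

For example, `apply_text_chat_config(Path("config.json"), "MiniMax-M2.7", "YOUR_API_KEY")` would produce the fragment shown above; the `config.json` path is assumed here.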

Image Understanding

MiniMax's M2.x chat models do not support vision natively; vision calls are uniformly routed to MiniMax-Text-01. Once minimax_api_key is configured, the Agent's Vision tool automatically uses this model, with no need to specify it explicitly in the configuration file.

Image Generation

```json
{
  "skills": {
    "image-generation": {
      "model": "image-01"
    }
  }
}
```

Available models: image-01.
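
Because the image model sits under a nested `skills` object, replacing the whole `skills` key would drop any other skill settings already present. The sketch below shows one way to set it without doing so. `set_image_model` is a hypothetical helper, not a function provided by the project.

```python
def set_image_model(config: dict, model: str = "image-01") -> dict:
    """Set skills["image-generation"]["model"] in place, preserving sibling keys."""
    # setdefault creates each nesting level only when it is missing.
    skills = config.setdefault("skills", {})
    skills.setdefault("image-generation", {})["model"] = model
    return config
```

Called on an empty dict, this yields exactly the fragment shown above; called on a config that already has other entries under `skills`, those entries are left as they are.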

Text-to-Speech (TTS)

```json
{
  "text_to_voice": "minimax",
  "text_to_voice_model": "speech-2.8-hd",
  "tts_voice_id": "female-shaonv"
}
```

| Parameter | Description |
| --- | --- |
| text_to_voice_model | speech-2.8-hd (emotional rendering, natural sound), speech-2.8-turbo (ultra-fast), speech-2.6-hd, speech-2.6-turbo |
| tts_voice_id | Voice ID; supports Chinese / Cantonese / English / Japanese / Korean, 70+ voices in total |

Common voice examples:

| Voice ID | Description |
| --- | --- |
| female-shaonv | Chinese · Young Girl (Female) |
| female-yujie | Chinese · Mature Lady (Female) |
| female-tianmei | Chinese · Sweet Female (Female) |
| male-qn-jingying | Chinese · Elite Youth (Male) |
| male-qn-badao | Chinese · Dominant Youth (Male) |
| Cantonese_GentleLady | Cantonese · Gentle Female Voice |
| English_Graceful_Lady | English · Graceful Lady |

For the full voice list (70+ voices across Chinese / Cantonese / English / Japanese / Korean), see the system voice list, or select visually in the Web Console under "Model Management → Text-to-Speech".
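
To tie the sections together, here is a rough sketch of a pre-flight check over a combined configuration. It assumes the fragments from this page can simply be merged into one JSON object, which follows from the single-key design but is not spelled out here. `check_minimax_config` is a hypothetical helper. It validates only the TTS model names listed on this page and does not check voice IDs, since the full voice list is not reproduced here.

```python
# Values copied from the TTS table on this page. The chat-model list ends with
# "etc.", so it is not exhaustive and is deliberately not checked here.
TTS_MODELS = {"speech-2.8-hd", "speech-2.8-turbo", "speech-2.6-hd", "speech-2.6-turbo"}


def check_minimax_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not config.get("minimax_api_key"):
        problems.append("minimax_api_key is missing")
    # Only inspect the TTS keys when MiniMax is the selected TTS provider.
    if config.get("text_to_voice") == "minimax":
        if config.get("text_to_voice_model") not in TTS_MODELS:
            problems.append("text_to_voice_model is not one of the documented TTS models")
        if not config.get("tts_voice_id"):
            problems.append("tts_voice_id is missing")
    return problems


# A combined configuration assembled from the fragments on this page.
combined = {
    "model": "MiniMax-M2.7",
    "minimax_api_key": "YOUR_API_KEY",
    "skills": {"image-generation": {"model": "image-01"}},
    "text_to_voice": "minimax",
    "text_to_voice_model": "speech-2.8-hd",
    "tts_voice_id": "female-shaonv",
}
print(check_minimax_config(combined))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once.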