Zhipu GLM - Chatgpt On Wechat

Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embeddings. A single zhipu_ai_api_key enables all of these capabilities.

<Tip> All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file. </Tip>

Text Chat

json

{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY"
}

Parameter	Description
`model`	Model code, for example `glm-5.1`, `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, or `glm-4-air`. See Zhipu's model codes for the full list
`zhipu_ai_api_key`	Create one in the Zhipu AI Console
`zhipu_ai_api_base`	Optional, defaults to `https://open.bigmodel.cn/api/paas/v4`
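
The fallback behaviour of the optional `zhipu_ai_api_base` can be sketched in a few lines of Python. This is only an illustration: the helper name `resolve_zhipu_settings` is hypothetical and is not taken from the project, while the keys and the default URL come from the table above.

```python
import json

# Default value taken from the parameter table above.
DEFAULT_API_BASE = "https://open.bigmodel.cn/api/paas/v4"


def resolve_zhipu_settings(config: dict) -> dict:
    """Hypothetical helper: return text-chat settings with the default base URL applied."""
    api_key = config.get("zhipu_ai_api_key")
    if not api_key:
        raise ValueError("zhipu_ai_api_key is required")
    # Use the documented default when the key is missing or empty,
    # and drop any trailing slash so paths can be appended safely.
    api_base = (config.get("zhipu_ai_api_base") or DEFAULT_API_BASE).rstrip("/")
    return {
        "model": config.get("model"),
        "api_key": api_key,
        "api_base": api_base,
    }


raw = '{"model": "glm-5.1", "zhipu_ai_api_key": "YOUR_API_KEY"}'
settings = resolve_zhipu_settings(json.loads(raw))
print(settings["api_base"])
```

With the minimal configuration shown earlier, the resolved base URL is the documented default. Setting `zhipu_ai_api_base` explicitly would only be needed when requests should go through a different endpoint.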

Image Understanding

Zhipu's chat models (glm-5.1, glm-5-turbo, etc.) do not support vision, so all vision calls are routed to glm-5v-turbo. Once zhipu_ai_api_key is configured, the Agent's Vision tool uses this model automatically; there is no need to specify it in the configuration file.
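
The routing rule described above can be pictured with a short Python sketch. The function `pick_vision_model` is hypothetical and only mirrors the behaviour stated in this section; it does not show how the Agent's Vision tool is actually implemented.

```python
from typing import Optional

# Vision model named in the paragraph above.
ZHIPU_VISION_MODEL = "glm-5v-turbo"


def pick_vision_model(config: dict) -> Optional[str]:
    """Hypothetical helper: choose the model for an image-understanding call."""
    if config.get("zhipu_ai_api_key"):
        # The configured chat model is ignored for vision calls;
        # every vision request goes to the dedicated vision model.
        return ZHIPU_VISION_MODEL
    # No Zhipu key configured, so Zhipu vision is not available.
    return None


config = {"model": "glm-5.1", "zhipu_ai_api_key": "YOUR_API_KEY"}
print(pick_vision_model(config))
```

Note that the `model` value in the configuration does not affect the result: the chat model and the vision model are selected independently.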

Speech-to-Text (ASR)

json

{
  "voice_to_text": "zhipu",
  "voice_to_text_model": "glm-asr-2512"
}

Parameter	Description
`voice_to_text`	Set to `zhipu` to enable Zhipu ASR
`voice_to_text_model`	Optional, defaults to `glm-asr-2512`

ASR automatically reuses the credentials from zhipu_ai_api_key. Audio files should be smaller than 25 MB; larger files may be rejected by the server.
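
Because oversized audio may be rejected, it can help to check the file size before sending it for transcription. The sketch below is a minimal local check, assuming the 25 MB limit means 25 * 1024 * 1024 bytes; the server's exact threshold may differ, and both function names are hypothetical.

```python
import os

# Assumption: the 25 MB limit is read as 25 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def size_within_limit(size_bytes: int, limit: int = MAX_AUDIO_BYTES) -> bool:
    """Return True when the size is strictly below the limit."""
    return 0 <= size_bytes < limit


def audio_within_limit(path: str, limit: int = MAX_AUDIO_BYTES) -> bool:
    """Hypothetical pre-upload check for a local audio file."""
    return size_within_limit(os.path.getsize(path), limit)


print(size_within_limit(3 * 1024 * 1024))   # a 3 MiB clip
print(size_within_limit(30 * 1024 * 1024))  # a 30 MiB recording
```

A file that fails this check could be split or re-encoded at a lower bitrate before it is submitted.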

Embedding

json

{
  "embedding_provider": "zhipu",
  "embedding_model": "embedding-3"
}

Available models: embedding-3, embedding-2. After changing the embedding model, run /memory rebuild-index to rebuild the index.
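
One way to decide when a rebuild is due is to compare the embedding settings before and after a configuration change. The following Python sketch is illustrative only: `needs_index_rebuild` is a hypothetical helper, and treating a change of `embedding_provider` as a rebuild trigger is an assumption that goes beyond what this page states.

```python
# Keys from the embedding configuration shown above.
EMBEDDING_KEYS = ("embedding_provider", "embedding_model")


def needs_index_rebuild(old_config: dict, new_config: dict) -> bool:
    """Hypothetical helper: True when any embedding setting differs."""
    # Assumption: a provider change also invalidates the stored vectors.
    return any(old_config.get(k) != new_config.get(k) for k in EMBEDDING_KEYS)


old = {"embedding_provider": "zhipu", "embedding_model": "embedding-2"}
new = {"embedding_provider": "zhipu", "embedding_model": "embedding-3"}

if needs_index_rebuild(old, new):
    print("Embedding settings changed: run /memory rebuild-index")
```

Vectors produced by different embedding models are generally not comparable with each other, which is why the index has to be rebuilt after a switch.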