docs/en/models/glm.mdx
Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embedding. A single zhipu_ai_api_key enables all capabilities.
{
"model": "glm-5.1",
"zhipu_ai_api_key": "YOUR_API_KEY"
}
| Parameter | Description |
|---|---|
model | Can be glm-5.1, glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, etc. See model codes |
zhipu_ai_api_key | Create one in the Zhipu AI Console |
zhipu_ai_api_base | Optional, defaults to https://open.bigmodel.cn/api/paas/v4 |
Zhipu's chat models (glm-5.1, glm-5-turbo, etc.) do not support vision; vision calls are uniformly routed to glm-5v-turbo. Once zhipu_ai_api_key is configured, the Agent's Vision tool automatically uses this model, with no need to specify it explicitly in the configuration file.
{
"voice_to_text": "zhipu",
"voice_to_text_model": "glm-asr-2512"
}
| Parameter | Description |
|---|---|
voice_to_text | Set to zhipu to enable Zhipu ASR |
voice_to_text_model | Optional, defaults to glm-asr-2512 |
Credentials are automatically reused from zhipu_ai_api_key. Audio files should be smaller than 25MB; oversized files may be rejected by the server.
{
"embedding_provider": "zhipu",
"embedding_model": "embedding-3"
}
Available models: embedding-3, embedding-2. After changing the embedding, run /memory rebuild-index to rebuild the index.