docs/zai/vision-mcp.md
The upstream Vision MCP package (`@z_ai/mcp-server`) is designed as a local stdio server. In a desktop app with an embedded proxy, requiring users (or the app) to manage a separate Node runtime and process adds operational complexity.
Instead, we implement a built-in Vision MCP server directly in the proxy:
- `/mcp/zai-mcp-server/mcp`

Wired in:
- `src-tauri/src/proxy/server.rs` (router)

Handler:
- `src-tauri/src/proxy/handlers/mcp.rs` (`handle_zai_mcp_server`)

Implemented methods:

POST /mcp:
- `initialize`
- `tools/list`
- `tools/call`

GET /mcp:
DELETE /mcp:
Session storage:
Notes:
Tool registry:
- `tool_specs()` in `src-tauri/src/proxy/zai_vision_tools.rs`

Tool execution:
- `call_tool(...)` in `src-tauri/src/proxy/zai_vision_tools.rs`

Supported tools (mirrors the upstream package at a high level):
- `ui_to_artifact`
- `extract_text_from_screenshot`
- `diagnose_error_screenshot`
- `understand_technical_diagram`
- `analyze_data_visualization`
- `ui_diff_check`
- `analyze_image`
- `analyze_video`

Vision tools call the z.ai vision chat completions endpoint:
- `https://api.z.ai/api/paas/v4/chat/completions`

Implementation:
- `vision_chat_completion(...)` in `src-tauri/src/proxy/zai_vision_tools.rs`

Auth:
- `Authorization: Bearer <proxy.zai.api_key>`

Payload:
- `model`: `glm-4.6v` (currently hardcoded)
- `messages`: system prompt + a multimodal user message containing images/videos + text prompt
- `stream`: `false` (currently returns a single tool result)

To support local file paths passed by MCP clients:
- Images (`.png`, `.jpg`, `.jpeg`) are read and encoded as `data:<mime>;base64,...` (5 MB max)
- Videos (`.mp4`, `.mov`, `.m4v`) are read and encoded as `data:<mime>;base64,...` (8 MB max)

Implementation:
- `image_source_to_content(...)` in `src-tauri/src/proxy/zai_vision_tools.rs`
- `video_source_to_content(...)` in `src-tauri/src/proxy/zai_vision_tools.rs`

Example request sequence:
1. `POST /mcp/zai-mcp-server/mcp` with `{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{}}}`
2. Read the session ID from the `Mcp-Session-Id` response header
3. `POST /mcp/zai-mcp-server/mcp` with `Mcp-Session-Id: <id>` and `{"jsonrpc":"2.0","id":2,"method":"tools/list"}`
4. `POST /mcp/zai-mcp-server/mcp` with `Mcp-Session-Id: <id>` and `{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"analyze_image","arguments":{"image_source":"/path/to/file.png","prompt":"Describe this image"}}}`
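
The request sequence above can be sketched from the client side as a few helpers that build the three JSON-RPC bodies. This is only an illustration, not code from the proxy: the helper names (`json_escape`, `jsonrpc_request`, `initialize_body`, `tools_list_body`, `analyze_image_body`, `session_header`) are hypothetical, and it uses only the standard library, so JSON is assembled by hand rather than with a JSON crate. Sending the bodies over HTTP is left to whatever client is in use.

```rust
/// Escape a string so it can be embedded inside a JSON string literal.
/// Handles quotes, backslashes, and control characters only.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 request. `params_json` must already be valid JSON.
fn jsonrpc_request(id: u64, method: &str, params_json: Option<&str>) -> String {
    let method = json_escape(method);
    match params_json {
        Some(p) => format!(
            r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
            id, method, p
        ),
        None => format!(r#"{{"jsonrpc":"2.0","id":{},"method":"{}"}}"#, id, method),
    }
}

/// Step 1: `initialize`, sent without a session header.
fn initialize_body(id: u64) -> String {
    let params = r#"{"protocolVersion":"2024-11-05","capabilities":{}}"#;
    jsonrpc_request(id, "initialize", Some(params))
}

/// Step 3: `tools/list`, sent with the session header.
fn tools_list_body(id: u64) -> String {
    jsonrpc_request(id, "tools/list", None)
}

/// Step 4: `tools/call` for the `analyze_image` tool.
fn analyze_image_body(id: u64, image_source: &str, prompt: &str) -> String {
    let params = format!(
        r#"{{"name":"analyze_image","arguments":{{"image_source":"{}","prompt":"{}"}}}}"#,
        json_escape(image_source),
        json_escape(prompt)
    );
    jsonrpc_request(id, "tools/call", Some(&params))
}

/// Header pair to attach to every request after `initialize`.
fn session_header(session_id: &str) -> (String, String) {
    ("Mcp-Session-Id".to_string(), session_id.to_string())
}
```

A client would POST `initialize_body(1)` to `/mcp/zai-mcp-server/mcp`, read `Mcp-Session-Id` from the response, and then attach `session_header(id)` to the `tools_list_body(2)` and `analyze_image_body(3, "/path/to/file.png", "Describe this image")` requests.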