docs/zai/notes.md
Goal: integrate z.ai as an upstream provider into Antigravity’s proxy/service, primarily via an Anthropic API-compatible passthrough, and optionally provide “budget/usage” visibility and MCP helpers (search/reader/vision).
This is a working note capturing findings, constraints, and a proposed implementation path. It intentionally does not copy full upstream documentation; it extracts what matters for implementation and corner cases.
Reference docs consulted:

- https://docs.z.ai/devpack/tool/claude.md
- https://docs.z.ai/scenario-example/develop-tools/claude.md
- https://docs.z.ai/devpack/mcp/vision-mcp-server.md
- https://docs.z.ai/devpack/mcp/search-mcp-server.md
- https://docs.z.ai/devpack/mcp/reader-mcp-server.md
- https://docs.z.ai/api-reference/introduction.md
- https://docs.z.ai/api-reference/llm/chat-completion.md
- https://docs.z.ai/devpack/extension/usage-query-plugin.md

Developer-facing implementation details for what is already built live in:
Docs show clients can be configured with:

- `ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic`
- `ANTHROPIC_AUTH_TOKEN=<Z.AI API key>`

This implies z.ai runs an Anthropic-compatible API surface behind that base URL.
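All proxied paths get appended to that base URL, and the join is easy to get wrong: `urllib.parse.urljoin` would drop the `/api/anthropic` prefix for an absolute path like `/v1/messages`, so plain concatenation is the safer choice. A minimal sketch (helper name is illustrative):

```python
# Sketch: map an incoming proxy path onto the z.ai Anthropic-compatible base.
# Note: urllib.parse.urljoin(base, "/v1/messages") would return
# "https://api.z.ai/v1/messages", losing the /api/anthropic prefix, so we
# concatenate and normalize slashes instead.
ZAI_BASE_URL = "https://api.z.ai/api/anthropic"

def upstream_url(base: str, path: str) -> str:
    """Join base URL and request path, normalizing the slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")

# upstream_url(ZAI_BASE_URL, "/v1/messages")
#   -> "https://api.z.ai/api/anthropic/v1/messages"
```

The exact upstream path shape still needs to be verified with test calls, per the note below.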
Practical implication for Antigravity:
- Forward `/v1/*` to `https://api.z.ai/api/anthropic/v1/*` (exact path joining must be verified via test calls).

Docs mention default mappings for "internal model env vars" to GLM:
- `ANTHROPIC_DEFAULT_OPUS_MODEL` → `glm-4.7`
- `ANTHROPIC_DEFAULT_SONNET_MODEL` → `glm-4.7`
- `ANTHROPIC_DEFAULT_HAIKU_MODEL` → `glm-4.5-air`

Implication:
- Treat `glm-*` model IDs as "z.ai" (or require an explicit `zai:` prefix), and forward model strings unchanged.

z.ai also provides OpenAI-style chat completions under:
`POST https://api.z.ai/api/paas/v4/chat/completions`

and a dedicated "coding endpoint": `https://api.z.ai/api/coding/paas/v4` (doc note: use this for coding-plan scenarios). We can defer this to phase 2 if we want to stay strictly Anthropic passthrough.
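For reference, a hedged sketch of building (not sending) an OpenAI-style request to that endpoint. The payload fields follow the generic OpenAI chat schema, and the API key is a placeholder; the model name reuses the `glm-4.7` default from the mappings above:

```python
import json
import urllib.request

# Sketch: construct an OpenAI-style chat completions request for z.ai's
# paas endpoint. Nothing here is sent over the network; this only shows the
# expected URL, auth header, and body shape (an assumption based on the
# generic OpenAI chat schema, not verified field-by-field against z.ai docs).
def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        "https://api.z.ai/api/paas/v4/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("sk-test", "glm-4.7", [{"role": "user", "content": "hi"}])
# req.full_url -> "https://api.z.ai/api/paas/v4/chat/completions"
```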
Important: MCP is not part of the Anthropic /v1/messages request itself. MCP is configured by the client (or we expose local endpoints that behave like MCP servers).
Doc highlights:
- npm package `@z_ai/mcp-server` (requires Node `>= 22`)
- `Z_AI_API_KEY` (required)
- `Z_AI_MODE=ZAI`

Implication:
- See `docs/zai/vision-mcp.md`, while still keeping compatibility with upstream behavior.

Endpoints (Web Search Prime MCP):
- Streamable HTTP: `https://api.z.ai/api/mcp/web_search_prime/mcp`, header `Authorization: Bearer <api_key>`
- SSE: `https://api.z.ai/api/mcp/web_search_prime/sse?Authorization=<api_key>`

Quota note in docs (plan-dependent):
Endpoints (Web Reader MCP):

- Streamable HTTP: `https://api.z.ai/api/mcp/web_reader/mcp`, header `Authorization: Bearer <api_key>`
- SSE: `https://api.z.ai/api/mcp/web_reader/sse?Authorization=<api_key>`

There are "monitor/usage" endpoints used by z.ai's usage query tooling.
The reference script from zai-org/zai-coding-plugins uses:
- `GET /api/monitor/usage/model-usage?startTime=...&endTime=...`
- `GET /api/monitor/usage/tool-usage?startTime=...&endTime=...`
- `GET /api/monitor/usage/quota/limit`
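A small sketch of assembling those URLs. The epoch-milliseconds encoding for `startTime`/`endTime` is an assumption on my part; the excerpt above elides the exact format, so verify against the reference script before relying on it:

```python
from urllib.parse import urlencode

# Sketch: build the monitor/usage URLs used by the z.ai usage-query tooling.
# MONITOR_BASE mirrors the host-selection rule described in the notes
# (chosen when ANTHROPIC_BASE_URL contains api.z.ai).
MONITOR_BASE = "https://api.z.ai"

def model_usage_url(start_ms: int, end_ms: int) -> str:
    # Assumption: startTime/endTime are epoch milliseconds; not confirmed.
    qs = urlencode({"startTime": start_ms, "endTime": end_ms})
    return f"{MONITOR_BASE}/api/monitor/usage/model-usage?{qs}"

def quota_limit_url() -> str:
    return f"{MONITOR_BASE}/api/monitor/usage/quota/limit"
```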
It selects the host based on the `ANTHROPIC_BASE_URL` domain (if it contains `api.z.ai`).

Auth quirk:
- Monitor endpoints expect `Authorization: <token>` (raw token, no `Bearer`).
- The Anthropic passthrough uses `Authorization: Bearer <token>`.
- The MCP endpoints use `Authorization: Bearer <ZAI_API_KEY>`.

Implication:
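The quirk is easy to centralize in one helper. The `kind` classification below is an illustrative simplification, not an established name in the codebase:

```python
# Sketch: per-endpoint auth header shape for z.ai upstreams. The monitor
# endpoints take the raw token with no "Bearer" prefix; everything else
# (Anthropic passthrough, MCP) uses the standard Bearer form.
def auth_header(token: str, kind: str) -> dict:
    if kind == "monitor":
        return {"Authorization": token}          # raw token, no Bearer
    return {"Authorization": f"Bearer {token}"}  # passthrough / MCP
```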
Today the proxy’s “upstream” client is hardwired to Google v1internal and uses:
- `project_id` resolution

For z.ai passthrough we should bypass all Google-specific logic:
Therefore phase 1 should introduce a provider-level router that can pick:
- `provider=google` (existing flow)
- `provider=zai` (passthrough flow)

Config keys:

- `zai.enabled`
- `zai.api_key` (stored securely; never logged)
- `zai.base_url` (default `https://api.z.ai/api/anthropic`)
- `zai.request_timeout_ms`

Routing rule: model starts with `glm-` OR mapping returns `zai:<model>` → use the z.ai provider (e.g. `zai:glm-4.7`).

Forward `POST /v1/messages` and other `/v1/*` requests by path passthrough.

Provide local endpoints so clients do not store the z.ai key:
- `GET/POST /mcp/web_search_prime/mcp` → upstream `https://api.z.ai/api/mcp/web_search_prime/mcp`
- `GET/POST /mcp/web_reader/mcp` → upstream `https://api.z.ai/api/mcp/web_reader/mcp`

Behavior:

- Require local auth (`api_key`) for access.
- Inject `Authorization: Bearer <zai_api_key>` on the upstream request.

Explicitly avoid:
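The phase-1 routing rule and the local MCP passthrough can be sketched together. Provider strings, the upstream table, and helper names mirror the notes above but are illustrative, not final:

```python
# Sketch of the phase-1 provider router plus the local MCP passthrough
# mapping. Both are simplifications of the plan described in these notes.

def pick_provider(model: str, zai_enabled: bool = True) -> tuple:
    """Route to z.ai for zai:-prefixed or glm-* models; else keep Google flow."""
    if zai_enabled and model.startswith("zai:"):
        return ("zai", model[len("zai:"):])  # strip the explicit prefix
    if zai_enabled and model.startswith("glm-"):
        return ("zai", model)                # forward model string unchanged
    return ("google", model)

# Local MCP endpoints -> z.ai upstreams, so clients never hold the z.ai key.
MCP_UPSTREAMS = {
    "/mcp/web_search_prime/mcp": "https://api.z.ai/api/mcp/web_search_prime/mcp",
    "/mcp/web_reader/mcp": "https://api.z.ai/api/mcp/web_reader/mcp",
}

def proxy_target(local_path: str, zai_api_key: str):
    """Resolve a local MCP path to (upstream_url, headers), or None if unknown.

    Any client-sent Authorization header is dropped; the server-held z.ai key
    is injected instead.
    """
    upstream = MCP_UPSTREAMS.get(local_path)
    if upstream is None:
        return None
    return upstream, {"Authorization": f"Bearer {zai_api_key}"}
```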
Status update:
- `/mcp/zai-mcp-server/mcp`
- `docs/zai/vision-mcp.md`
- Sensitive headers/values to redact from logs: `Authorization`, `x-api-key`, cookies, tokens.
- `API_TIMEOUT_MS` is set very high in some configs; we should allow per-provider timeout config.
- Map between `glm-*` and "Claude-like" names if needed (either client-side mapping or server-side mapping).
- OpenAI-style `chat/completions`?