z.ai (GLM) integration notes (Anthropic passthrough + MCP + usage)

Goal: integrate z.ai as an upstream provider into Antigravity’s proxy/service, primarily via an Anthropic API-compatible passthrough, and optionally provide “budget/usage” visibility and MCP helpers (search/reader/vision).

This is a working note capturing findings, constraints, and a proposed implementation path. It intentionally does not copy full upstream documentation; it extracts what matters for implementation and corner cases.

0) Product decisions / requirements (confirmed)

z.ai is configured and controlled from the API Proxy UI as an optional provider (enable/disable).
z.ai is used inside Antigravity (not via Google OAuth), but it must be able to serve API Proxy traffic alongside the existing account pool.
Storage: z.ai config/credentials are stored in the same data directory as the existing accounts and GUI config (same folder where Google account JSON lives). No Keychain/Vault.
Dispatch strategy is user-configurable:
- z.ai handles all proxy requests (exclusive mode), OR
- z.ai participates in the shared rotation/queue with other accounts and only gets requests when it is selected (pooled mode), OR
- (optional) fallback-only (only when the rest of the pool is unavailable).
z.ai MCP servers should be provided via Antigravity’s proxy as optional toggles (enable/disable) and be usable by apps without requiring users to configure z.ai keys in the apps.
Proxy authorization (if enabled) applies to the entire proxy (no per-route bypass).

1) Key docs / entry points

Anthropic-compatible endpoint (Coding plan usage with existing clients): https://docs.z.ai/devpack/tool/claude.md
Scenario example (same content, shorter): https://docs.z.ai/scenario-example/develop-tools/claude.md
Vision MCP (local stdio, Node): https://docs.z.ai/devpack/mcp/vision-mcp-server.md
Web Search MCP (remote HTTP/SSE): https://docs.z.ai/devpack/mcp/search-mcp-server.md
Web Reader MCP (remote HTTP/SSE): https://docs.z.ai/devpack/mcp/reader-mcp-server.md
API reference intro (general vs coding endpoint): https://docs.z.ai/api-reference/introduction.md
Chat completions (OpenAI-like): https://docs.z.ai/api-reference/llm/chat-completion.md
Usage query plugin (reveals monitor endpoints + auth quirks): https://docs.z.ai/devpack/extension/usage-query-plugin.md

Implementation status

Developer-facing implementation details for what is already built live in:

2) What z.ai provides (relevant to our integration)

2.1 Anthropic-compatible upstream (what we’ll passthrough to)

Docs show clients can be configured with:

ANTHROPIC_BASE_URL = https://api.z.ai/api/anthropic
ANTHROPIC_AUTH_TOKEN = <Z.AI API key>

This implies z.ai runs an Anthropic-compatible API surface behind that base URL.

Practical implication for Antigravity:

Add a new upstream provider “z.ai Anthropic” and forward /v1/* to https://api.z.ai/api/anthropic/v1/* (exact path joining must be verified via test calls).

2.2 Model mapping defaults (client-side)

Docs mention default mapping for “internal model env vars” to GLM:

ANTHROPIC_DEFAULT_OPUS_MODEL → glm-4.7
ANTHROPIC_DEFAULT_SONNET_MODEL → glm-4.7
ANTHROPIC_DEFAULT_HAIKU_MODEL → glm-4.5-air

Implication:

If a client requests “Claude” model names, z.ai may already translate them to GLM on their side OR the client may send GLM model names directly (depending on the client’s model mapping config).
For Antigravity, the simplest first step is “treat glm-* as z.ai” (or require explicit zai: prefix), and forward model strings unchanged.

2.3 OpenAI-like API (optional later)

z.ai also provides OpenAI-style chat completions under:

POST https://api.z.ai/api/paas/v4/chat/completions and a dedicated “coding endpoint”:
https://api.z.ai/api/coding/paas/v4 (doc note: use this for coding plan scenarios)

We can defer this for phase 2 if we want to stay strictly Anthropic passthrough.

3) MCP ecosystem (the “3 servers”)

Important: MCP is not part of the Anthropic /v1/messages request itself. MCP is configured by the client (or we expose local endpoints that behave like MCP servers).

3.1 Vision MCP server (local stdio process)

Doc highlights:

NPM package: @z_ai/mcp-server
Requires Node.js >= 22
Uses env vars:
- Z_AI_API_KEY (required)
- Z_AI_MODE=ZAI
Installed as a local stdio MCP server via an MCP-compatible client.

Implication:

This is a local process that clients spawn. Instead of requiring an extra runtime, the proxy now exposes a built-in Vision MCP endpoint (see docs/zai/vision-mcp.md), while still keeping compatibility with upstream behavior.

3.2 Web Search MCP server (remote)

Endpoints:

MCP over HTTP (recommended): https://api.z.ai/api/mcp/web_search_prime/mcp
- Header auth: Authorization: Bearer <api_key>
MCP over SSE (legacy/alternative): https://api.z.ai/api/mcp/web_search_prime/sse?Authorization=<api_key>

Quota note in docs (plan-dependent):

Lite/Pro/Max include a number of web searches/readers and vision resource pool.

3.3 Web Reader MCP server (remote)

Endpoints:

MCP over HTTP (recommended): https://api.z.ai/api/mcp/web_reader/mcp
- Header auth: Authorization: Bearer <api_key>
MCP over SSE (alternative): https://api.z.ai/api/mcp/web_reader/sse?Authorization=<api_key>

3.4 MCP-specific corner cases

Query-string auth for SSE is high risk (easy leakage into logs/history/screenshots). Prefer header-based auth.
If we proxy MCP endpoints locally, we should only expose header-based auth to the upstream and never include secrets in URLs.
Remote MCP endpoints may use “streamable-http” semantics; we should avoid buffering and proxy as a streaming response.

4) Usage / budget integration (tokens + MCP quotas)

There are “monitor/usage” endpoints used by z.ai’s usage query tooling. The reference script from zai-org/zai-coding-plugins uses:

GET /api/monitor/usage/model-usage?startTime=...&endTime=...
GET /api/monitor/usage/tool-usage?startTime=...&endTime=...
GET /api/monitor/usage/quota/limit based on ANTHROPIC_BASE_URL domain (if it contains api.z.ai).

Auth quirk:

The script sets Authorization: <token> (raw token) for these monitor endpoints (no Bearer).
Remote MCP endpoints use Authorization: Bearer <token>.
For z.ai’s general API reference (Bearer auth), it’s also Authorization: Bearer <ZAI_API_KEY>.

Implication:

We must treat these as separate integration surfaces:
- Anthropic-compatible upstream auth format (to be validated)
- Monitor endpoints auth format (raw token per script)
- MCP remote auth format (Bearer)

5) Our current proxy architecture constraints

Today the proxy’s “upstream” client is hardwired to Google v1internal and uses:

token pool (per-account OAuth tokens)
project_id resolution
retry/rotation logic

For z.ai passthrough we should bypass all Google-specific logic:

z.ai uses a single API key (or a key pool later), not OAuth refresh tokens.
project_id is not relevant.

Therefore phase 1 should introduce a provider-level router that can pick:

provider=google (existing flow)
provider=zai (passthrough flow)

6) Proposed implementation approach (phase 1 “minimal but real”)

6.1 New provider: z.ai Anthropic passthrough

Add config fields:
- zai.enabled
- zai.api_key (stored securely; never logged)
- zai.base_url (default https://api.z.ai/api/anthropic)
- optional: zai.request_timeout_ms
Routing:
- If resolved model starts with glm- OR mapping returns zai:<model> → use z.ai provider.
- Keep existing mappings intact; add support for values with a provider prefix (e.g. zai:glm-4.7).
Endpoint handling:
- Forward POST /v1/messages and other /v1/* requests by path passthrough.
- Streaming: proxy bytes end-to-end (do not parse/reshape SSE in phase 1).
- Error mapping: preserve upstream status/body; only wrap errors if needed for compatibility with current clients.

6.2 MCP (remote) local reverse-proxy (phase 1.5)

Provide local endpoints so clients do not store the z.ai key:

GET/POST /mcp/web_search_prime/mcp → upstream https://api.z.ai/api/mcp/web_search_prime/mcp
GET/POST /mcp/web_reader/mcp → upstream https://api.z.ai/api/mcp/web_reader/mcp

Behavior:

Require Antigravity’s local proxy auth (existing api_key) for access.
Inject upstream header: Authorization: Bearer <zai_api_key>.
Stream responses.

Explicitly avoid:

exposing SSE endpoints that require key-in-query.

6.3 Vision MCP (local stdio) “setup helper” only (phase 1)

Status update:

Vision MCP is implemented directly inside the proxy and exposed at /mcp/zai-mcp-server/mcp.
Implementation details: docs/zai/vision-mcp.md.

7) Corner cases checklist (must handle)

Auth header rewriting:
- never forward client-provided secrets upstream
- never log Authorization, x-api-key, cookies, tokens
Timeout differences:
- docs show API_TIMEOUT_MS set very high in some configs; we should allow per-provider timeout config
Streaming cancellation:
- client disconnect should abort upstream request
401/429:
- surface meaningful error and do not retry blindly
Model naming:
- support both glm-* and “Claude-like” names if needed (either client-side mapping or server-side mapping)
Mixed mode:
- if z.ai disabled or misconfigured, either fail fast or fallback to google (config-driven)
Monitor endpoints:
- auth format differences (raw vs Bearer); cache results; avoid rate limit
Security:
- never accept key in querystring for local endpoints
- if we provide “test connection”, ensure it does not leak the key in logs

8) Open questions to settle before coding

Which surface is the priority: Anthropic passthrough only, or also OpenAI-like chat/completions?
Do we want multi-key support for z.ai (rotation) or a single key per installation initially?
Fallback policy when z.ai quota is near/at limit: error vs automatic fallback to other provider.
Should the UI expose MCP helper endpoints/config snippets for clients?