docs/help/testing-live.md
For quick start, QA runners, unit/integration suites, and Docker flows, see Testing. This page covers the live (network-touching) test suites: model matrix, CLI backends, ACP, and media-provider live tests, plus credential handling.
Export the needed provider key in the process environment before ad hoc live checks.
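A minimal sketch of a pre-flight guard for ad hoc checks, assuming you want to fail fast when a key was set in the shell but never exported. The helper name `require_exported` is hypothetical and not part of OpenClaw; `DEEPGRAM_API_KEY` is used only as an example variable name.

```shell
# Hypothetical helper: succeed only when the named variable is exported and non-empty.
require_exported() {
  # printenv only sees exported variables, which is what child processes inherit.
  if [ -z "$(printenv "$1")" ]; then
    echo "missing or empty exported variable: $1" >&2
    return 1
  fi
}

# Example: report the state of one provider key before starting a live check.
if require_exported DEEPGRAM_API_KEY 2>/dev/null; then
  echo "DEEPGRAM_API_KEY is exported"
else
  echo "DEEPGRAM_API_KEY is not exported"
fi
```

A variable that is assigned but not exported fails this check, which matches what a child process such as a test runner would actually see.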
Safe media smoke:
pnpm openclaw infer tts convert --local --json \
--text "OpenClaw live smoke." \
--output /tmp/openclaw-live-smoke.mp3
Safe voice-call readiness smoke:
pnpm openclaw voicecall setup --json
pnpm openclaw voicecall smoke --to "+15555550123"
voicecall smoke is a dry run unless --yes is also present. Use --yes only
when you intentionally want to place a real notify call. For Twilio, Telnyx, and
Plivo, a successful readiness check requires a public webhook URL; local-only
loopback/private fallbacks are rejected by design.
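The dry-run default can be kept explicit in local scripts. The sketch below only builds and prints the command; it does not run it. `CONFIRM_REAL_CALL` is a hypothetical opt-in variable invented for this example, not an OpenClaw setting.

```shell
# Build the smoke command and append --yes only on explicit opt-in.
# CONFIRM_REAL_CALL is a hypothetical variable used by this sketch only.
voicecall_smoke_cmd() {
  to="$1"
  cmd="pnpm openclaw voicecall smoke --to \"$to\""
  if [ "${CONFIRM_REAL_CALL:-0}" = "1" ]; then
    cmd="$cmd --yes"
  fi
  printf '%s\n' "$cmd"
}

# Prints the dry-run form unless CONFIRM_REAL_CALL=1 is set.
voicecall_smoke_cmd "+15555550123"
```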
src/gateway/android-node.capabilities.live.test.ts
pnpm android:test:integration
node.invoke validation for the selected Android node.
OPENCLAW_ANDROID_NODE_ID or OPENCLAW_ANDROID_NODE_NAME.
OPENCLAW_ANDROID_GATEWAY_URL / OPENCLAW_ANDROID_GATEWAY_TOKEN / OPENCLAW_ANDROID_GATEWAY_PASSWORD.
Live tests are split into two layers so we can isolate failures:
src/agents/models.profiles.live.test.ts
getApiKeyForModel to select models you have creds for
pnpm test:live (or OPENCLAW_LIVE_TEST=1 if invoking Vitest directly)
OPENCLAW_LIVE_MODELS=modern (or all, alias for modern) to actually run this suite; otherwise it skips to keep pnpm test:live focused on gateway smoke
OPENCLAW_LIVE_MODELS=modern to run the modern allowlist (Opus/Sonnet 4.6+, GPT-5.2 + Codex, Gemini 3, DeepSeek V4, GLM 4.7, MiniMax M2.7, Grok 4.3)
OPENCLAW_LIVE_MODELS=all is an alias for the modern allowlist
OPENCLAW_LIVE_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,..." (comma allowlist)
OPENCLAW_LIVE_MAX_MODELS=0 for an exhaustive modern sweep or a positive number for a smaller cap.
OPENCLAW_LIVE_TEST_TIMEOUT_MS for the whole direct-model test timeout. Default: 60 minutes.
OPENCLAW_LIVE_MODEL_CONCURRENCY to override.
OPENCLAW_LIVE_PROVIDERS="google,google-antigravity,google-gemini-cli" (comma allowlist)
OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 to enforce profile store only
src/gateway/gateway-models.profiles.live.test.ts
agent:dev:* session (model override per run)
read probe: the test writes a nonce file in the workspace and asks the agent to read it and echo the nonce back.
exec+read probe: the test asks the agent to exec-write a nonce into a temp file, then read it back.
cat <CODE>.
src/gateway/gateway-models.profiles.live.test.ts and src/gateway/live-image-probe.ts.
pnpm test:live (or OPENCLAW_LIVE_TEST=1 if invoking Vitest directly)
OPENCLAW_LIVE_GATEWAY_MODELS=all is an alias for the modern allowlist
OPENCLAW_LIVE_GATEWAY_MODELS="provider/model" (or comma list) to narrow
OPENCLAW_LIVE_GATEWAY_MAX_MODELS=0 for an exhaustive modern sweep or a positive number for a smaller cap.
OPENCLAW_LIVE_GATEWAY_PROVIDERS="google,google-antigravity,google-gemini-cli,openai,anthropic,zai,minimax" (comma allowlist)
read probe + exec+read probe (tool stress)
src/gateway/live-image-probe.ts)
agent attachments: [{ mimeType: "image/png", content: "<base64>" }]
images[] (src/gateway/server-methods/agent.ts + src/gateway/chat-attachments.ts)
cat + the code (OCR tolerance: minor mistakes allowed)
openclaw models list
openclaw models list --json
src/gateway/gateway-cli-backend.live.test.ts
cli-backend.ts definition.
pnpm test:live (or OPENCLAW_LIVE_TEST=1 if invoking Vitest directly)
OPENCLAW_LIVE_CLI_BACKEND=1
claude-cli/claude-sonnet-4-6
OPENCLAW_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-sonnet-4-6"
OPENCLAW_LIVE_CLI_BACKEND_COMMAND="/full/path/to/claude"
OPENCLAW_LIVE_CLI_BACKEND_ARGS='["-p","--output-format","json"]'
OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1 to send a real image attachment (paths are injected into the prompt). Docker recipes default this off unless explicitly requested.
OPENCLAW_LIVE_CLI_BACKEND_IMAGE_ARG="--image" to pass image file paths as CLI args instead of prompt injection.
OPENCLAW_LIVE_CLI_BACKEND_IMAGE_MODE="repeat" (or "list") to control how image args are passed when IMAGE_ARG is set.
OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1 to send a second turn and validate resume flow.
OPENCLAW_LIVE_CLI_BACKEND_MODEL_SWITCH_PROBE=1 to opt into the Claude Sonnet -> Opus same-session continuity probe when the selected model supports a switch target. Docker recipes default this off for aggregate reliability.
OPENCLAW_LIVE_CLI_BACKEND_MCP_PROBE=1 to opt into the MCP/tool loopback probe. Docker recipes default this off unless explicitly requested.
Example:
OPENCLAW_LIVE_CLI_BACKEND=1 \
OPENCLAW_LIVE_CLI_BACKEND_MODEL="claude-cli/claude-sonnet-4-6" \
pnpm test:live src/gateway/gateway-cli-backend.live.test.ts
Cheap Gemini MCP config smoke:
OPENCLAW_LIVE_TEST=1 \
pnpm test:live src/agents/cli-runner/bundle-mcp.gemini.live.test.ts
This does not ask Gemini to generate a response. It writes the same system
settings OpenClaw gives Gemini, then runs gemini --debug mcp list to prove a
saved transport: "streamable-http" server is normalized to Gemini's HTTP MCP
shape and can connect to a local streamable-HTTP MCP server.
Docker recipe:
pnpm test:docker:live-cli-backend
Single-provider Docker recipes:
pnpm test:docker:live-cli-backend:claude
pnpm test:docker:live-cli-backend:claude-subscription
pnpm test:docker:live-cli-backend:gemini
Notes:
scripts/test-live-cli-backend-docker.sh.
node user.
@anthropic-ai/claude-code or @google/gemini-cli) into a cached writable prefix at OPENCLAW_DOCKER_CLI_TOOLS_DIR (default: ~/.cache/openclaw/docker-cli-tools).
pnpm test:docker:live-cli-backend:claude-subscription requires portable Claude Code subscription OAuth through either ~/.claude/.credentials.json with claudeAiOauth.subscriptionType or CLAUDE_CODE_OAUTH_TOKEN from claude setup-token. It first proves direct claude -p in Docker, then runs two Gateway CLI-backend turns without preserving Anthropic API-key env vars. This subscription lane disables the Claude MCP/tool and image probes by default because Claude currently routes third-party app usage through extra-usage billing instead of normal subscription plan limits.
cron tool call verified through the gateway CLI.
src/infra/push-apns-http2.live.test.ts
403 InvalidProviderToken response comes back through the proxy path.
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_APNS_REACHABILITY=1 pnpm test:live src/infra/push-apns-http2.live.test.ts
OPENCLAW_LIVE_APNS_TIMEOUT_MS=30000
/acp spawn ... --bind here)
src/gateway/gateway-acp-bind.live.test.ts
/acp spawn <agent> --bind here
pnpm test:live src/gateway/gateway-acp-bind.live.test.ts
OPENCLAW_LIVE_ACP_BIND=1
claude,codex,gemini
pnpm test:live ...: claude
acpx
OPENCLAW_LIVE_ACP_BIND_AGENT=claude
OPENCLAW_LIVE_ACP_BIND_AGENT=codex
OPENCLAW_LIVE_ACP_BIND_AGENT=droid
OPENCLAW_LIVE_ACP_BIND_AGENT=gemini
OPENCLAW_LIVE_ACP_BIND_AGENT=opencode
OPENCLAW_LIVE_ACP_BIND_AGENTS=claude,codex,gemini
OPENCLAW_LIVE_ACP_BIND_AGENT_COMMAND='npx -y @agentclientprotocol/claude-agent-acp@<version>'
OPENCLAW_LIVE_ACP_BIND_CODEX_MODEL=gpt-5.5
OPENCLAW_LIVE_ACP_BIND_OPENCODE_MODEL=opencode/kimi-k2.6
OPENCLAW_LIVE_ACP_BIND_REQUIRE_TRANSCRIPT=1
OPENCLAW_LIVE_ACP_BIND_REQUIRE_CRON=1
OPENCLAW_LIVE_ACP_BIND_PARENT_MODEL=openai/gpt-5.5
chat.send surface with admin-only synthetic originating-route fields so tests can attach message-channel context without pretending to deliver externally.
OPENCLAW_LIVE_ACP_BIND_AGENT_COMMAND is unset, the test uses the embedded acpx plugin's built-in agent registry for the selected ACP harness agent.
OPENCLAW_LIVE_ACP_BIND_REQUIRE_CRON=1 to make that post-bind cron probe strict.
Example:
OPENCLAW_LIVE_ACP_BIND=1 \
OPENCLAW_LIVE_ACP_BIND_AGENT=claude \
pnpm test:live src/gateway/gateway-acp-bind.live.test.ts
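To run the same bind smoke for several agents one at a time, the example above can be generated in a loop. This is a sketch only: it prints one command per agent rather than executing it, and the helper name `acp_bind_cmds` is hypothetical. The agent names are the ones listed in this section.

```shell
# Print one ACP bind smoke command per agent name passed as an argument.
acp_bind_cmds() {
  for agent in "$@"; do
    printf '%s\n' "OPENCLAW_LIVE_ACP_BIND=1 OPENCLAW_LIVE_ACP_BIND_AGENT=$agent pnpm test:live src/gateway/gateway-acp-bind.live.test.ts"
  done
}

acp_bind_cmds claude codex gemini
```

Running agents separately keeps one failing agent from hiding the results of the others.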
Docker recipe:
pnpm test:docker:live-acp-bind
Single-agent Docker recipes:
pnpm test:docker:live-acp-bind:claude
pnpm test:docker:live-acp-bind:codex
pnpm test:docker:live-acp-bind:droid
pnpm test:docker:live-acp-bind:gemini
pnpm test:docker:live-acp-bind:opencode
Docker notes:
scripts/test-live-acp-bind-docker.sh.
claude, codex, then gemini.
OPENCLAW_LIVE_ACP_BIND_AGENTS=claude, OPENCLAW_LIVE_ACP_BIND_AGENTS=codex, OPENCLAW_LIVE_ACP_BIND_AGENTS=droid, OPENCLAW_LIVE_ACP_BIND_AGENTS=gemini, or OPENCLAW_LIVE_ACP_BIND_AGENTS=opencode to narrow the matrix.
@anthropic-ai/claude-code, @openai/codex, Factory Droid via https://app.factory.ai/cli, @google/gemini-cli, or opencode-ai) if missing. The ACP backend itself is the embedded acpx/runtime package from the official acpx plugin.
~/.factory for settings, forwards FACTORY_API_KEY, and requires that API key because local Factory OAuth/keyring auth is not portable into the container. It uses ACPX's built-in droid exec --output-format acp registry entry.
OPENCODE_CONFIG_CONTENT default model from OPENCLAW_LIVE_ACP_BIND_OPENCODE_MODEL (default opencode/kimi-k2.6), and pnpm test:docker:live-acp-bind:opencode requires a bound assistant transcript instead of accepting the generic post-bind skip.
acpx CLI calls are only a manual/workaround path for comparing behavior outside the Gateway. The Docker ACP bind smoke exercises OpenClaw's embedded acpx runtime backend.
agent method:
codex plugin
openai/gpt-5.5, which routes OpenAI agent turns through Codex by default
openai/gpt-5.5 with the Codex harness selected
/codex status and /codex models through the same gateway command path
src/gateway/gateway-codex-harness.live.test.ts
OPENCLAW_LIVE_CODEX_HARNESS=1
openai/gpt-5.5
OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1
OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1
OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1
agentRuntime.id: "codex" so a broken Codex harness cannot pass by silently falling back to PI.
OPENAI_API_KEY for non-Codex probes when applicable, plus optional copied ~/.codex/auth.json and ~/.codex/config.toml.
Local recipe:
OPENCLAW_LIVE_CODEX_HARNESS=1 \
OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1 \
OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1 \
OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1 \
OPENCLAW_LIVE_CODEX_HARNESS_MODEL=openai/gpt-5.5 \
pnpm test:live -- src/gateway/gateway-codex-harness.live.test.ts
Docker recipe:
pnpm test:docker:live-codex-harness
Docker notes:
scripts/test-live-codex-harness-docker.sh.
OPENAI_API_KEY, copies Codex CLI auth files when present, installs @openai/codex into a writable mounted npm prefix, stages the source tree, then runs only the Codex-harness live test.
OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0 or OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0 or OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0 when you need a narrower debug run.
Narrow, explicit allowlists are fastest and least flaky:
Single model, direct (no gateway):
OPENCLAW_LIVE_MODELS="openai/gpt-5.5" pnpm test:live src/agents/models.profiles.live.test.ts
Single model, gateway smoke:
OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
Tool calling across several providers:
OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,google/gemini-3-flash-preview,deepseek/deepseek-v4-flash,zai/glm-5.1,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
Google focus (Gemini API key + Antigravity):
OPENCLAW_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
OPENCLAW_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
Google adaptive thinking smoke:
pnpm openclaw qa manual --provider-mode live-frontier --model google/gemini-3.1-pro-preview --alt-model google/gemini-3.1-pro-preview --message '/think adaptive Reply exactly: GEMINI_ADAPTIVE_OK' --timeout-ms 180000
pnpm openclaw qa manual --provider-mode live-frontier --model google/gemini-2.5-flash --alt-model google/gemini-2.5-flash --message '/think adaptive Reply exactly: GEMINI25_ADAPTIVE_OK' --timeout-ms 180000
Notes:
google/... uses the Gemini API (API key).
google-antigravity/... uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
google-gemini-cli/... uses the local Gemini CLI on your machine (separate auth + tooling quirks).
gemini binary; it has its own auth and can behave differently (streaming/tool support/version skew).
There is no fixed "CI model list" (live is opt-in), but these are the recommended models to cover regularly on a dev machine with keys.
This is the "common models" run we expect to keep working:
openai/gpt-5.5
openai-codex/gpt-5.5
anthropic/claude-opus-4-6 (or anthropic/claude-sonnet-4-6)
google/gemini-3.1-pro-preview and google/gemini-3-flash-preview (avoid older Gemini 2.x models)
google-antigravity/claude-opus-4-6-thinking and google-antigravity/gemini-3-flash
deepseek/deepseek-v4-flash and deepseek/deepseek-v4-pro
zai/glm-5.1
minimax/MiniMax-M2.7
Run gateway smoke with tools + image:
OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,google/gemini-3.1-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-flash,deepseek/deepseek-v4-flash,zai/glm-5.1,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
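Long comma allowlists like the one above are easy to mistype. A minimal sketch, assuming you prefer to list models one per argument and join them in the shell; `join_models` is a hypothetical helper, and the three model IDs are taken from the list above.

```shell
# Join all arguments with commas to produce the comma allowlist format.
join_models() {
  out=""
  for m in "$@"; do
    # ${out:+$out,} adds a separating comma only after the first item.
    out="${out:+$out,}$m"
  done
  printf '%s\n' "$out"
}

OPENCLAW_LIVE_GATEWAY_MODELS="$(join_models \
  openai/gpt-5.5 \
  anthropic/claude-opus-4-6 \
  google/gemini-3-flash-preview)"
echo "OPENCLAW_LIVE_GATEWAY_MODELS=$OPENCLAW_LIVE_GATEWAY_MODELS"
```

Export the variable, or prefix it to the `pnpm test:live` command as in the example above, so the test process inherits it.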
Pick at least one per provider family:
openai/gpt-5.5
anthropic/claude-opus-4-6 (or anthropic/claude-sonnet-4-6)
google/gemini-3-flash-preview (or google/gemini-3.1-pro-preview)
deepseek/deepseek-v4-flash
zai/glm-5.1
minimax/MiniMax-M2.7
Optional additional coverage (nice to have):
xai/grok-4.3 (or latest available)
mistral/… (pick one "tools" capable model you have enabled)
cerebras/… (if you have access)
lmstudio/… (local; tool calling depends on API mode)
Include at least one image-capable model in OPENCLAW_LIVE_GATEWAY_MODELS (Claude/Gemini/OpenAI vision-capable variants, etc.) to exercise the image probe.
If you have keys enabled, we also support testing via:
openrouter/... (hundreds of models; use openclaw models scan to find tool+image capable candidates)
opencode/... for Zen and opencode-go/... for Go (auth via OPENCODE_API_KEY / OPENCODE_ZEN_API_KEY)
More providers you can include in the live matrix (if you have creds/config):
openai, openai-codex, anthropic, google, google-vertex, google-antigravity, google-gemini-cli, zai, openrouter, opencode, opencode-go, xai, groq, cerebras, mistral, github-copilot
models.providers (custom endpoints): minimax (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)
Live tests discover credentials the same way the CLI does. Practical implications:
If the CLI works, live tests should find the same keys.
If a live test says "no creds", debug the same way you'd debug openclaw models list / model selection.
Per-agent auth profiles: ~/.openclaw/agents/<agentId>/agent/auth-profiles.json (this is what "profile keys" means in the live tests)
Config: ~/.openclaw/openclaw.json (or OPENCLAW_CONFIG_PATH)
Legacy state dir: ~/.openclaw/credentials/ (copied into the staged live home when present, but not the main profile-key store)
Live local runs copy the active config, per-agent auth-profiles.json files, legacy credentials/, and supported external CLI auth dirs into a temp test home by default; staged live homes skip workspace/ and sandboxes/, and agents.*.workspace / agentDir path overrides are stripped so probes stay off your real host workspace.
If you want to rely on env keys, export them before local tests or use the
Docker runners below with an explicit OPENCLAW_PROFILE_FILE.
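When a live test reports "no creds", it can help to see which of the locations above actually exist. The following is a rough sketch under the assumption that `OPENCLAW_CONFIG_PATH`, when set, simply replaces the default config path; the `config_path` helper is hypothetical and the script only inspects the filesystem.

```shell
# Resolve the config path: OPENCLAW_CONFIG_PATH if set, otherwise the default location.
config_path() {
  printf '%s\n' "${OPENCLAW_CONFIG_PATH:-$HOME/.openclaw/openclaw.json}"
}

# Report which credential sources are present on this machine.
# If no per-agent profile exists, the glob stays literal and is reported as missing.
for p in "$(config_path)" \
         "$HOME/.openclaw/credentials" \
         "$HOME"/.openclaw/agents/*/agent/auth-profiles.json; do
  if [ -e "$p" ]; then
    echo "found:   $p"
  else
    echo "missing: $p"
  fi
done
```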
extensions/deepgram/audio.live.test.ts
DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live extensions/deepgram/audio.live.test.ts
extensions/byteplus/live.test.ts
BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live extensions/byteplus/live.test.ts
BYTEPLUS_CODING_MODEL=ark-code-latest
extensions/comfy/comfy.live.test.ts
OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts
music_generate paths
plugins.entries.comfy.config.<capability> is configured
test/image-generation.runtime.live.test.ts
pnpm test:live test/image-generation.runtime.live.test.ts
pnpm test:live:media image
auth-profiles.json do not mask real shell credentials
<provider>:generate
<provider>:edit when the provider declares edit support
deepinfra
fal
google
minimax
openai
openrouter
vydra
xai
OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="openai,google,openrouter,xai"
OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="deepinfra"
OPENCLAW_LIVE_IMAGE_GENERATION_MODELS="openai/gpt-image-2,google/gemini-3.1-flash-image-preview,openrouter/google/gemini-3.1-flash-image-preview,xai/grok-imagine-image"
OPENCLAW_LIVE_IMAGE_GENERATION_CASES="google:flash-generate,google:pro-edit,openrouter:generate,xai:default-generate,xai:default-edit"
OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 to force profile-store auth and ignore env-only overrides
For the shipped CLI path, add an infer smoke after the provider/runtime live test passes:
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_INFER_CLI_TEST=1 pnpm test:live -- test/image-generation.infer-cli.live.test.ts
openclaw infer image providers --json
openclaw infer image generate \
--model google/gemini-3.1-flash-image-preview \
--prompt "Minimal flat test image: one blue square on a white background, no text." \
--output ./openclaw-infer-image-smoke.png \
--json
This covers CLI argument parsing, config/default-agent resolution, bundled plugin activation, the shared image-generation runtime, and the live provider request. Plugin dependencies are expected to be present before runtime load.
extensions/music-generation-providers.live.test.ts
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts
pnpm test:live:media music
auth-profiles.json do not mask real shell credentials
generate with prompt-only input
edit when the provider declares capabilities.edit.enabled
google: generate, edit
minimax: generate
comfy: separate Comfy live file, not this shared sweep
OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS="google,minimax"
OPENCLAW_LIVE_MUSIC_GENERATION_MODELS="google/lyria-3-clip-preview,minimax/music-2.6"
OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 to force profile-store auth and ignore env-only overrides
extensions/video-generation-providers.live.test.ts
OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts
pnpm test:live:media video
OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS (180000 by default)
--video-providers fal or OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="fal" to run it explicitly
auth-profiles.json do not mask real shell credentials
generate by default
OPENCLAW_LIVE_VIDEO_GENERATION_FULL_MODES=1 to also run declared transform modes when available:
imageToVideo when the provider declares capabilities.imageToVideo.enabled and the selected provider/model accepts buffer-backed local image input in the shared sweep
videoToVideo when the provider declares capabilities.videoToVideo.enabled and the selected provider/model accepts buffer-backed local video input in the shared sweep
imageToVideo providers in the shared sweep:
vydra because bundled veo3 is text-only and bundled kling requires a remote image URL
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_VYDRA_VIDEO=1 pnpm test:live -- extensions/vydra/vydra.live.test.ts
veo3 text-to-video plus a kling lane that uses a remote image URL fixture by default
videoToVideo live coverage:
runway only when the selected model is runway/gen4_aleph
videoToVideo providers in the shared sweep:
alibaba, qwen, xai because those paths currently require remote http(s) / MP4 reference URLs
google because the current shared Gemini/Veo lane uses local buffer-backed input and that path is not accepted in the shared sweep
openai because the current shared lane lacks org-specific video inpaint/remix access guarantees
OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="deepinfra,google,openai,runway"
OPENCLAW_LIVE_VIDEO_GENERATION_MODELS="google/veo-3.1-fast-generate-preview,openai/sora-2,runway/gen4_aleph"
OPENCLAW_LIVE_VIDEO_GENERATION_SKIP_PROVIDERS="" to include every provider in the default sweep, including FAL
OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS=60000 to reduce each provider operation cap for an aggressive smoke run
OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1 to force profile-store auth and ignore env-only overrides
pnpm test:live:media
scripts/test-live.mjs, so heartbeat and quiet-mode behavior stay consistent
pnpm test:live:media
pnpm test:live:media image video --providers openai,google,minimax
pnpm test:live:media video --video-providers openai,runway --all-providers
pnpm test:live:media music --quiet