docs/help/testing-live.md
For quick start, QA runners, unit/integration suites, and Docker flows, see Testing. This page covers the live (network-touching) test suites: model matrix, CLI backends, ACP, and media-provider live tests, plus credential handling.
Source ~/.profile before ad hoc live checks so provider keys and local tool
paths match your shell:
source ~/.profile
Safe media smoke:
pnpm openclaw infer tts convert --local --json \
--text "OpenClaw live smoke." \
--output /tmp/openclaw-live-smoke.mp3
Safe voice-call readiness smoke:
pnpm openclaw voicecall setup --json
pnpm openclaw voicecall smoke --to "+15555550123"
voicecall smoke is a dry run unless --yes is also present. Use --yes only
when you intentionally want to place a real notify call. For Twilio, Telnyx, and
Plivo, a successful readiness check requires a public webhook URL; local-only
loopback/private fallbacks are rejected by design.
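Because the dry run is the default, it is easy to script a guarded wrapper around the real call. A minimal sketch, assuming a hypothetical `VOICECALL_CONFIRM` opt-in variable (only the dry-run/`--yes` behavior comes from the docs above):

```shell
# Hypothetical guard: only append --yes (real notify call) behind an explicit opt-in.
TO="+15555550123"
CMD="pnpm openclaw voicecall smoke --to $TO"
if [ "${VOICECALL_CONFIRM:-0}" = "1" ]; then
  CMD="$CMD --yes"   # this would place a real call
fi
echo "$CMD"          # print the command; swap echo for eval to actually run it
```

Printing instead of executing keeps the wrapper safe to run while you are still wiring up provider credentials.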
Android node capabilities:

- File: `src/gateway/android-node.capabilities.live.test.ts`
- Run: `pnpm android:test:integration`
- Covers `node.invoke` validation for the selected Android node.
- Select the node with `OPENCLAW_ANDROID_NODE_ID` or `OPENCLAW_ANDROID_NODE_NAME`.
- Gateway connection: `OPENCLAW_ANDROID_GATEWAY_URL` / `OPENCLAW_ANDROID_GATEWAY_TOKEN` / `OPENCLAW_ANDROID_GATEWAY_PASSWORD`.

Live tests are split into two layers so we can isolate failures:
Layer 1: direct model completions.

- File: `src/agents/models.profiles.live.test.ts`
- Uses `getApiKeyForModel` to select models you have creds for.
- Run with `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly).
- Set `OPENCLAW_LIVE_MODELS=modern` (or `all`, alias for `modern`) to actually run this suite; otherwise it skips to keep `pnpm test:live` focused on gateway smoke.
- `OPENCLAW_LIVE_MODELS=modern` runs the modern allowlist (Opus/Sonnet 4.6+, GPT-5.2 + Codex, Gemini 3, DeepSeek V4, GLM 4.7, MiniMax M2.7, Grok 4.3).
- `OPENCLAW_LIVE_MODELS=all` is an alias for the modern allowlist.
- `OPENCLAW_LIVE_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,..."` (comma allowlist).
- `OPENCLAW_LIVE_MAX_MODELS=0` for an exhaustive modern sweep, or a positive number for a smaller cap.
- `OPENCLAW_LIVE_TEST_TIMEOUT_MS` for the whole direct-model test timeout (default: 60 minutes).
- `OPENCLAW_LIVE_MODEL_CONCURRENCY` to override concurrency.
- `OPENCLAW_LIVE_PROVIDERS="google,google-antigravity,google-gemini-cli"` (comma allowlist).
- `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to enforce profile-store keys only.

Layer 2: gateway smoke (tools + image).

- File: `src/gateway/gateway-models.profiles.live.test.ts`
- Runs each model through an `agent:dev:*` session (model override per run).
- Read probe: the test writes a nonce file in the workspace and asks the agent to read it and echo the nonce back.
- Exec+read probe: the test asks the agent to exec-write a nonce into a temp file, then read it back with `cat <CODE>`.
- Probe helpers live in `src/gateway/gateway-models.profiles.live.test.ts` and `src/gateway/live-image-probe.ts`.
- Run with `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly).
- `OPENCLAW_LIVE_GATEWAY_MODELS=all` is an alias for the modern allowlist.
- `OPENCLAW_LIVE_GATEWAY_MODELS="provider/model"` (or comma list) to narrow.
- `OPENCLAW_LIVE_GATEWAY_MAX_MODELS=0` for an exhaustive modern sweep, or a positive number for a smaller cap.
- `OPENCLAW_LIVE_GATEWAY_PROVIDERS="google,google-antigravity,google-gemini-cli,openai,anthropic,zai,minimax"` (comma allowlist).
- Tool stress: read probe + exec+read probe.
- Image probe (`src/gateway/live-image-probe.ts`): sends agent attachments `[{ mimeType: "image/png", content: "<base64>" }]` through the gateway `images[]` path (`src/gateway/server-methods/agent.ts` + `src/gateway/chat-attachments.ts`) and expects the model to `cat` + report the code (OCR tolerance: minor mistakes allowed).

To check which models are visible:

openclaw models list
openclaw models list --json
CLI backend live test:

- File: `src/gateway/gateway-cli-backend.live.test.ts`
- Exercises the `cli-backend.ts` definition.
- Run with `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly).
- Requires `OPENCLAW_LIVE_CLI_BACKEND=1`.
- Default model: `claude-cli/claude-sonnet-4-6`.
- `OPENCLAW_LIVE_CLI_BACKEND_MODEL="codex-cli/gpt-5.5"` to select a different backend model.
- `OPENCLAW_LIVE_CLI_BACKEND_COMMAND="/full/path/to/codex"` to point at a specific binary.
- `OPENCLAW_LIVE_CLI_BACKEND_ARGS='["exec","--json","--color","never","--sandbox","read-only","--skip-git-repo-check"]'`
- `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1` to send a real image attachment (paths are injected into the prompt). Docker recipes default this off unless explicitly requested.
- `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_ARG="--image"` to pass image file paths as CLI args instead of prompt injection.
- `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_MODE="repeat"` (or `"list"`) to control how image args are passed when `IMAGE_ARG` is set.
- `OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1` to send a second turn and validate resume flow.
- `OPENCLAW_LIVE_CLI_BACKEND_MODEL_SWITCH_PROBE=1` to opt into the Claude Sonnet -> Opus same-session continuity probe when the selected model supports a switch target. Docker recipes default this off for aggregate reliability.
- `OPENCLAW_LIVE_CLI_BACKEND_MCP_PROBE=1` to opt into the MCP/tool loopback probe. Docker recipes default this off unless explicitly requested.

Example:
OPENCLAW_LIVE_CLI_BACKEND=1 \
OPENCLAW_LIVE_CLI_BACKEND_MODEL="codex-cli/gpt-5.5" \
pnpm test:live src/gateway/gateway-cli-backend.live.test.ts
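The optional probes stack as plain env vars. A sketch that composes (and prints, rather than executes) a fuller invocation with the image and resume probes enabled — the flags are the ones documented above, the composition helper itself is just illustrative:

```shell
# Compose a CLI-backend run with optional probes enabled; print it for review.
ENV_VARS="OPENCLAW_LIVE_CLI_BACKEND=1"
ENV_VARS="$ENV_VARS OPENCLAW_LIVE_CLI_BACKEND_MODEL=codex-cli/gpt-5.5"
ENV_VARS="$ENV_VARS OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1"
ENV_VARS="$ENV_VARS OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1"
echo "$ENV_VARS pnpm test:live src/gateway/gateway-cli-backend.live.test.ts"
```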
Cheap Gemini MCP config smoke:
OPENCLAW_LIVE_TEST=1 \
pnpm test:live src/agents/cli-runner/bundle-mcp.gemini.live.test.ts
This does not ask Gemini to generate a response. It writes the same system
settings OpenClaw gives Gemini, then runs gemini --debug mcp list to prove a
saved transport: "streamable-http" server is normalized to Gemini's HTTP MCP
shape and can connect to a local streamable-HTTP MCP server.
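For orientation, a saved server entry of the kind being normalized might look like the fragment below. The field names here are illustrative, not the exact OpenClaw schema; the `transport: "streamable-http"` value and the local streamable-HTTP target are the parts the test exercises.

```json
{
  "transport": "streamable-http",
  "url": "http://127.0.0.1:3920/mcp"
}
```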
Docker recipe:
pnpm test:docker:live-cli-backend
Single-provider Docker recipes:
pnpm test:docker:live-cli-backend:claude
pnpm test:docker:live-cli-backend:claude-subscription
pnpm test:docker:live-cli-backend:codex
pnpm test:docker:live-cli-backend:gemini
Notes:
- Runner: `scripts/test-live-cli-backend-docker.sh`.
- The container runs as the `node` user.
- Installs the requested live CLI (`@anthropic-ai/claude-code`, `@openai/codex`, or `@google/gemini-cli`) into a cached writable prefix at `OPENCLAW_DOCKER_CLI_TOOLS_DIR` (default: `~/.cache/openclaw/docker-cli-tools`).
- `pnpm test:docker:live-cli-backend:claude-subscription` requires portable Claude Code subscription OAuth through either `~/.claude/.credentials.json` with `claudeAiOauth.subscriptionType` or `CLAUDE_CODE_OAUTH_TOKEN` from `claude setup-token`. It first proves direct `claude -p` in Docker, then runs two Gateway CLI-backend turns without preserving Anthropic API-key env vars. This subscription lane disables the Claude MCP/tool and image probes by default because Claude currently routes third-party app usage through extra-usage billing instead of normal subscription plan limits.
- A `cron` tool call is verified through the gateway CLI.

APNs reachability:

- File: `src/infra/push-apns-http2.live.test.ts`
- Verifies a `403 InvalidProviderToken` response comes back through the proxy path.
- Run: `OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_APNS_REACHABILITY=1 pnpm test:live src/infra/push-apns-http2.live.test.ts`
- `OPENCLAW_LIVE_APNS_TIMEOUT_MS=30000` to adjust the timeout.

ACP bind (`/acp spawn <agent> --bind here`):

- File: `src/gateway/gateway-acp-bind.live.test.ts`
- Exercises `/acp spawn <agent> --bind here`.
- Run: `pnpm test:live src/gateway/gateway-acp-bind.live.test.ts`
- Requires `OPENCLAW_LIVE_ACP_BIND=1`.
- Default agent matrix: `claude,codex,gemini`; a plain `pnpm test:live ...` run defaults to `claude`.
- Backend: the embedded `acpx` runtime.
- `OPENCLAW_LIVE_ACP_BIND_AGENT=claude` (also `codex`, `droid`, `gemini`, or `opencode`) to pick a single agent.
- `OPENCLAW_LIVE_ACP_BIND_AGENTS=claude,codex,gemini` to set the matrix.
- `OPENCLAW_LIVE_ACP_BIND_AGENT_COMMAND='npx -y @agentclientprotocol/claude-agent-acp@<version>'` to override the agent command.
- `OPENCLAW_LIVE_ACP_BIND_CODEX_MODEL=gpt-5.5`
- `OPENCLAW_LIVE_ACP_BIND_OPENCODE_MODEL=opencode/kimi-k2.6`
- `OPENCLAW_LIVE_ACP_BIND_REQUIRE_TRANSCRIPT=1` to require a bound assistant transcript.
- `OPENCLAW_LIVE_ACP_BIND_REQUIRE_CRON=1` to make the post-bind cron probe strict.
- `OPENCLAW_LIVE_ACP_BIND_PARENT_MODEL=openai/gpt-5.5`
- The test drives the `chat.send` surface with admin-only synthetic originating-route fields so tests can attach message-channel context without pretending to deliver externally.
- When `OPENCLAW_LIVE_ACP_BIND_AGENT_COMMAND` is unset, the test uses the embedded acpx plugin's built-in agent registry for the selected ACP harness agent.

Example:
OPENCLAW_LIVE_ACP_BIND=1 \
OPENCLAW_LIVE_ACP_BIND_AGENT=claude \
pnpm test:live src/gateway/gateway-acp-bind.live.test.ts
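When isolating a failing harness, it can help to sweep the matrix one agent at a time. A small sketch that prints one single-agent invocation per agent (the loop is a hypothetical convenience; the env vars and test path are the documented ones):

```shell
# Print a single-agent ACP bind invocation for each harness agent.
AGENTS="claude codex gemini"
for agent in $AGENTS; do
  echo "OPENCLAW_LIVE_ACP_BIND=1 OPENCLAW_LIVE_ACP_BIND_AGENT=$agent pnpm test:live src/gateway/gateway-acp-bind.live.test.ts"
done
```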
Docker recipe:
pnpm test:docker:live-acp-bind
Single-agent Docker recipes:
pnpm test:docker:live-acp-bind:claude
pnpm test:docker:live-acp-bind:codex
pnpm test:docker:live-acp-bind:droid
pnpm test:docker:live-acp-bind:gemini
pnpm test:docker:live-acp-bind:opencode
Docker notes:
- Runner: `scripts/test-live-acp-bind-docker.sh`.
- Default matrix order: `claude`, `codex`, then `gemini`.
- Use `OPENCLAW_LIVE_ACP_BIND_AGENTS=claude`, `OPENCLAW_LIVE_ACP_BIND_AGENTS=codex`, `OPENCLAW_LIVE_ACP_BIND_AGENTS=droid`, `OPENCLAW_LIVE_ACP_BIND_AGENTS=gemini`, or `OPENCLAW_LIVE_ACP_BIND_AGENTS=opencode` to narrow the matrix.
- The runner sources `~/.profile`, stages the matching CLI auth material into the container, then installs the requested live CLI (`@anthropic-ai/claude-code`, `@openai/codex`, Factory Droid via https://app.factory.ai/cli, `@google/gemini-cli`, or `opencode-ai`) if missing. The ACP backend itself is the embedded `acpx/runtime` package from the official acpx plugin.
- The droid lane stages `~/.factory` for settings, forwards `FACTORY_API_KEY`, and requires that API key because local Factory OAuth/keyring auth is not portable into the container. It uses ACPX's built-in `droid exec --output-format acp` registry entry.
- The opencode lane sets the `OPENCODE_CONFIG_CONTENT` default model from `OPENCLAW_LIVE_ACP_BIND_OPENCODE_MODEL` (default `opencode/kimi-k2.6`) after sourcing `~/.profile`, and `pnpm test:docker:live-acp-bind:opencode` requires a bound assistant transcript instead of accepting the generic post-bind skip.
- Direct `acpx` CLI calls are only a manual/workaround path for comparing behavior outside the Gateway. The Docker ACP bind smoke exercises OpenClaw's embedded acpx runtime backend.

Codex harness (gateway `agent` method):

- Uses the `codex` plugin with `OPENCLAW_AGENT_RUNTIME=codex`.
- Default model: `openai/gpt-5.5` with the Codex harness forced.
- Exercises `/codex status` and `/codex models` through the same gateway command path.
- File: `src/gateway/gateway-codex-harness.live.test.ts`
- Requires `OPENCLAW_LIVE_CODEX_HARNESS=1`.
- Optional probes: `OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1`, `OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1`, `OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1`.
- Asserts `agentRuntime.id: "codex"` so a broken Codex harness cannot pass by silently falling back to PI.
- Creds: `OPENAI_API_KEY` for non-Codex probes when applicable, plus optional copied `~/.codex/auth.json` and `~/.codex/config.toml`.

Local recipe:
source ~/.profile
OPENCLAW_LIVE_CODEX_HARNESS=1 \
OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=1 \
OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=1 \
OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=1 \
OPENCLAW_LIVE_CODEX_HARNESS_MODEL=openai/gpt-5.5 \
pnpm test:live -- src/gateway/gateway-codex-harness.live.test.ts
Docker recipe:
source ~/.profile
pnpm test:docker:live-codex-harness
Docker notes:
- Runner: `scripts/test-live-codex-harness-docker.sh`.
- Sources `~/.profile`, passes `OPENAI_API_KEY`, copies Codex CLI auth files when present, installs `@openai/codex` into a writable mounted npm prefix, stages the source tree, then runs only the Codex-harness live test.
- Set `OPENCLAW_LIVE_CODEX_HARNESS_IMAGE_PROBE=0`, `OPENCLAW_LIVE_CODEX_HARNESS_MCP_PROBE=0`, or `OPENCLAW_LIVE_CODEX_HARNESS_GUARDIAN_PROBE=0` when you need a narrower debug run.

Narrow, explicit allowlists are fastest and least flaky:
Single model, direct (no gateway):

OPENCLAW_LIVE_MODELS="openai/gpt-5.5" pnpm test:live src/agents/models.profiles.live.test.ts

Single model, gateway smoke:

OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts

Tool calling across several providers:

OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,google/gemini-3-flash-preview,deepseek/deepseek-v4-flash,zai/glm-5.1,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts

Google focus (Gemini API key + Antigravity):

OPENCLAW_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
OPENCLAW_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts

Google adaptive thinking smoke:
source ~/.profile
pnpm openclaw qa manual --provider-mode live-frontier --model google/gemini-3.1-pro-preview --alt-model google/gemini-3.1-pro-preview --message '/think adaptive Reply exactly: GEMINI_ADAPTIVE_OK' --timeout-ms 180000
pnpm openclaw qa manual --provider-mode live-frontier --model google/gemini-2.5-flash --alt-model google/gemini-2.5-flash --message '/think adaptive Reply exactly: GEMINI25_ADAPTIVE_OK' --timeout-ms 180000

Notes:
- `google/...` uses the Gemini API (API key).
- `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
- `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks). It shells out to the `gemini` binary; it has its own auth and can behave differently (streaming/tool support/version skew).

There is no fixed “CI model list” (live is opt-in), but these are the recommended models to cover regularly on a dev machine with keys.
This is the “common models” run we expect to keep working:
- `openai/gpt-5.5`
- `openai-codex/gpt-5.5`
- `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-6`)
- `google/gemini-3.1-pro-preview` and `google/gemini-3-flash-preview` (avoid older Gemini 2.x models)
- `google-antigravity/claude-opus-4-6-thinking` and `google-antigravity/gemini-3-flash`
- `deepseek/deepseek-v4-flash` and `deepseek/deepseek-v4-pro`
- `zai/glm-5.1`
- `minimax/MiniMax-M2.7`

Run gateway smoke with tools + image:
OPENCLAW_LIVE_GATEWAY_MODELS="openai/gpt-5.5,openai-codex/gpt-5.5,anthropic/claude-opus-4-6,google/gemini-3.1-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-6-thinking,google-antigravity/gemini-3-flash,deepseek/deepseek-v4-flash,zai/glm-5.1,minimax/MiniMax-M2.7" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts
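A long allowlist like the one above is easier to maintain as a list joined at run time. A sketch, using one pick per provider family from the recommendations above (the join helper itself is just a convenience, not part of the test suite):

```shell
# Build OPENCLAW_LIVE_GATEWAY_MODELS from one model per provider family.
MODELS="openai/gpt-5.5
anthropic/claude-opus-4-6
google/gemini-3-flash-preview
deepseek/deepseek-v4-flash
zai/glm-5.1
minimax/MiniMax-M2.7"
ALLOWLIST=$(printf '%s' "$MODELS" | tr '\n' ',')   # newline-separated -> comma allowlist
echo "OPENCLAW_LIVE_GATEWAY_MODELS=\"$ALLOWLIST\" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts"
```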
Pick at least one per provider family:
- `openai/gpt-5.5`
- `anthropic/claude-opus-4-6` (or `anthropic/claude-sonnet-4-6`)
- `google/gemini-3-flash-preview` (or `google/gemini-3.1-pro-preview`)
- `deepseek/deepseek-v4-flash`
- `zai/glm-5.1`
- `minimax/MiniMax-M2.7`

Optional additional coverage (nice to have):
- `xai/grok-4.3` (or latest available)
- `mistral/…` (pick one “tools”-capable model you have enabled)
- `cerebras/…` (if you have access)
- `lmstudio/…` (local; tool calling depends on API mode)

Include at least one image-capable model in `OPENCLAW_LIVE_GATEWAY_MODELS` (Claude/Gemini/OpenAI vision-capable variants, etc.) to exercise the image probe.
If you have keys enabled, we also support testing via:
- `openrouter/...` (hundreds of models; use `openclaw models scan` to find tool+image capable candidates)
- `opencode/...` for Zen and `opencode-go/...` for Go (auth via `OPENCODE_API_KEY` / `OPENCODE_ZEN_API_KEY`)

More providers you can include in the live matrix (if you have creds/config):

- Built-in: `openai`, `openai-codex`, `anthropic`, `google`, `google-vertex`, `google-antigravity`, `google-gemini-cli`, `zai`, `openrouter`, `opencode`, `opencode-go`, `xai`, `groq`, `cerebras`, `mistral`, `github-copilot`
- `models.providers` (custom endpoints): `minimax` (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)

Live tests discover credentials the same way the CLI does. Practical implications:
- If the CLI works, live tests should find the same keys.
- If a live test says “no creds”, debug the same way you’d debug `openclaw models list` / model selection.
- Per-agent auth profiles: `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` (this is what “profile keys” means in the live tests)
- Config: `~/.openclaw/openclaw.json` (or `OPENCLAW_CONFIG_PATH`)
- Legacy state dir: `~/.openclaw/credentials/` (copied into the staged live home when present, but not the main profile-key store)
By default, live local runs copy the active config, per-agent `auth-profiles.json` files, legacy `credentials/`, and supported external CLI auth dirs into a temp test home. Staged live homes skip `workspace/` and `sandboxes/`, and `agents.*.workspace` / `agentDir` path overrides are stripped so probes stay off your real host workspace.
If you want to rely on env keys (e.g. exported in your ~/.profile), run local tests after source ~/.profile, or use the Docker runners below (they can mount ~/.profile into the container).
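Before blaming a test for missing creds, it helps to confirm which provider-style keys your current shell actually exports. A quick check (the `_API_KEY`/`_TOKEN` naming is an assumption about how your own keys are named):

```shell
# List exported variables that look like provider credentials (names only, no values).
source ~/.profile 2>/dev/null || true
env | grep -E '_(API_KEY|TOKEN)=' | cut -d= -f1 | sort
```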
Deepgram:

- File: `extensions/deepgram/audio.live.test.ts`
- Run: `DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live extensions/deepgram/audio.live.test.ts`

BytePlus:

- File: `extensions/byteplus/live.test.ts`
- Run: `BYTEPLUS_API_KEY=... BYTEPLUS_LIVE_TEST=1 pnpm test:live extensions/byteplus/live.test.ts`
- Override the coding model with `BYTEPLUS_CODING_MODEL=ark-code-latest`.

Comfy:

- File: `extensions/comfy/comfy.live.test.ts`
- Run: `OPENCLAW_LIVE_TEST=1 COMFY_LIVE_TEST=1 pnpm test:live -- extensions/comfy/comfy.live.test.ts`
- Covers `music_generate` paths when `plugins.entries.comfy.config.<capability>` is configured.

Image generation (shared runtime sweep):

- File: `test/image-generation.runtime.live.test.ts`
- Run: `pnpm test:live test/image-generation.runtime.live.test.ts` (or `pnpm test:live:media image`)
- Source your shell env (`~/.profile`) before probing so stale entries in `auth-profiles.json` do not mask real shell credentials.
- Probes `<provider>:generate`, plus `<provider>:edit` when the provider declares edit support.
- Default provider sweep: `deepinfra`, `fal`, `google`, `minimax`, `openai`, `openrouter`, `vydra`, `xai`.
- `OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="openai,google,openrouter,xai"` (or e.g. `OPENCLAW_LIVE_IMAGE_GENERATION_PROVIDERS="deepinfra"`) to narrow.
- `OPENCLAW_LIVE_IMAGE_GENERATION_MODELS="openai/gpt-image-2,google/gemini-3.1-flash-image-preview,openrouter/google/gemini-3.1-flash-image-preview,xai/grok-imagine-image"`
- `OPENCLAW_LIVE_IMAGE_GENERATION_CASES="google:flash-generate,google:pro-edit,openrouter:generate,xai:default-generate,xai:default-edit"`
- `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides.

For the shipped CLI path, add an infer smoke after the provider/runtime live test passes:
OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_INFER_CLI_TEST=1 pnpm test:live -- test/image-generation.infer-cli.live.test.ts
openclaw infer image providers --json
openclaw infer image generate \
--model google/gemini-3.1-flash-image-preview \
--prompt "Minimal flat test image: one blue square on a white background, no text." \
--output ./openclaw-infer-image-smoke.png \
--json
This covers CLI argument parsing, config/default-agent resolution, bundled plugin activation, the shared image-generation runtime, and the live provider request. Plugin dependencies are expected to be present before runtime load.
Music generation (shared sweep):

- File: `extensions/music-generation-providers.live.test.ts`
- Run: `OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/music-generation-providers.live.test.ts` (or `pnpm test:live:media music`)
- Source your shell env (`~/.profile`) before probing so stale entries in `auth-profiles.json` do not mask real shell credentials.
- Probes `generate` with prompt-only input, plus `edit` when the provider declares `capabilities.edit.enabled`.
- Coverage: `google`: generate, edit; `minimax`: generate; `comfy`: separate Comfy live file, not this shared sweep.
- `OPENCLAW_LIVE_MUSIC_GENERATION_PROVIDERS="google,minimax"`
- `OPENCLAW_LIVE_MUSIC_GENERATION_MODELS="google/lyria-3-clip-preview,minimax/music-2.6"`
- `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides.

Video generation (shared sweep):

- File: `extensions/video-generation-providers.live.test.ts`
- Run: `OPENCLAW_LIVE_TEST=1 pnpm test:live -- extensions/video-generation-providers.live.test.ts` (or `pnpm test:live:media video`)
- Per-provider operation cap: `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS` (180000 by default).
- FAL is skipped by default; pass `--video-providers fal` or `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="fal"` to run it explicitly.
- Source your shell env (`~/.profile`) before probing so stale entries in `auth-profiles.json` do not mask real shell credentials.
- Probes `generate` by default.
- `OPENCLAW_LIVE_VIDEO_GENERATION_FULL_MODES=1` to also run declared transform modes when available:
  - `imageToVideo` when the provider declares `capabilities.imageToVideo.enabled` and the selected provider/model accepts buffer-backed local image input in the shared sweep
  - `videoToVideo` when the provider declares `capabilities.videoToVideo.enabled` and the selected provider/model accepts buffer-backed local video input in the shared sweep
- Skipped `imageToVideo` providers in the shared sweep:
  - `vydra`, because bundled veo3 is text-only and bundled kling requires a remote image URL. Use `OPENCLAW_LIVE_TEST=1 OPENCLAW_LIVE_VYDRA_VIDEO=1 pnpm test:live -- extensions/vydra/vydra.live.test.ts` for veo3 text-to-video plus a kling lane that uses a remote image URL fixture by default.
- `videoToVideo` live coverage: `runway` only, when the selected model is `runway/gen4_aleph`.
- Skipped `videoToVideo` providers in the shared sweep:
  - `alibaba`, `qwen`, `xai`, because those paths currently require remote http(s) / MP4 reference URLs
  - `google`, because the current shared Gemini/Veo lane uses local buffer-backed input and that path is not accepted in the shared sweep
  - `openai`, because the current shared lane lacks org-specific video inpaint/remix access guarantees
- `OPENCLAW_LIVE_VIDEO_GENERATION_PROVIDERS="deepinfra,google,openai,runway"`
- `OPENCLAW_LIVE_VIDEO_GENERATION_MODELS="google/veo-3.1-fast-generate-preview,openai/sora-2,runway/gen4_aleph"`
- `OPENCLAW_LIVE_VIDEO_GENERATION_SKIP_PROVIDERS=""` to include every provider in the default sweep, including FAL.
- `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS=60000` to reduce each provider operation cap for an aggressive smoke run.
- `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to force profile-store auth and ignore env-only overrides.

Media runner (`pnpm test:live:media`):

- Sources `~/.profile` and wraps `scripts/test-live.mjs`, so heartbeat and quiet-mode behavior stay consistent.
- Examples:
  - `pnpm test:live:media`
  - `pnpm test:live:media image video --providers openai,google,minimax`
  - `pnpm test:live:media video --video-providers openai,runway --all-providers`
  - `pnpm test:live:media music --quiet`
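For a scripted pass over all three media lanes, a trivial loop over the runner works. A sketch (the loop is a hypothetical convenience; lane names come from the recipes above, and `echo` is used so the sketch prints rather than executes each run):

```shell
# Print one media-runner invocation per lane; swap echo for the real command to execute.
LANES="image music video"
for lane in $LANES; do
  echo "pnpm test:live:media $lane --quiet"
done
```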