.agents/skills/openclaw-qa-testing/SKILL.md
Use this skill for qa-lab / qa-channel work. Repo-local QA only.
Key references:
- docs/concepts/qa-e2e-automation.md
- docs/help/testing.md
- docs/channels/qa-channel.md
- qa/README.md
- qa/scenarios/index.md
- extensions/qa-lab/src/suite.ts
- extensions/qa-lab/src/character-eval.ts

Models: `openai/gpt-5.4`, `openai/gpt-5.4-pro`, `openai/gpt-5.4-mini`.
Provider modes: `mock-openai`, `live-frontier`.

Full live-frontier suite run:

```
OPENCLAW_LIVE_OPENAI_KEY="${OPENAI_API_KEY}" \
pnpm openclaw qa suite \
  --provider-mode live-frontier \
  --model openai/gpt-5.4 \
  --alt-model openai/gpt-5.4 \
  --output-dir .artifacts/qa-e2e/run-all-live-frontier-<tag>
```
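The run writes a JSON summary and a Markdown report (paths below). A quick way to skim the summary, assuming `jq` is installed; the field names here are illustrative, so inspect the real JSON shape first:

```
# Field names are illustrative; check the actual summary structure first.
jq '.scenarios[] | {id, status}' \
  .artifacts/qa-e2e/run-all-live-frontier-<tag>/qa-suite-summary.json
```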
Outputs:
- .artifacts/qa-e2e/run-all-live-frontier-<tag>/qa-suite-summary.json
- .artifacts/qa-e2e/run-all-live-frontier-<tag>/qa-suite-report.md

Watch the run logs for the openclaw-qa listen port; the report is served at http://127.0.0.1:<port>.

For local QA-lab OpenTelemetry validation, use:
```
pnpm qa:otel:smoke
```
This starts a local OTLP/HTTP trace receiver, runs the `otel-trace-smoke`
scenario through qa-channel, decodes the emitted protobuf spans, and verifies
the exported trace names and privacy contract. It does not require Opik,
Langfuse, or external collector credentials.
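For orientation, OTLP/HTTP receivers like this one accept protobuf trace exports as POSTs to the standard `/v1/traces` path. A hand-rolled poke at the smoke receiver could look like the following; the port comes from the run logs, and `spans.pb` is a hypothetical pre-encoded payload:

```
# Manual POST to an OTLP/HTTP receiver; spans.pb is a placeholder payload.
curl -s -X POST "http://127.0.0.1:<port>/v1/traces" \
  -H "content-type: application/x-protobuf" \
  --data-binary @spans.pb
```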
`pnpm openclaw qa matrix` defaults to the full `all` profile. Use explicit
profiles for faster CI/release proof runs:
```
OPENCLAW_QA_MATRIX_NO_REPLY_WINDOW_MS=3000 \
pnpm openclaw qa matrix --profile fast --fail-fast
```
Profiles:
- `fast`: release-critical transport contract, excluding the generated-image and deep E2EE recovery inventory.
- `transport`, `media`, `e2ee-smoke`, `e2ee-deep`, `e2ee-cli`: sharded full Matrix coverage.

QA-Lab - All Lanes uses the explicit `fast` Matrix profile on scheduled runs. Manual dispatch keeps `matrix_profile=all` as the default and always shards that full Matrix selection (see the dispatch sketch below).

Use `op` only inside tmux for QA secret lookup in this repo:

```
op account list
```
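A manual full-Matrix dispatch might look like the following; the workflow name is assumed from the lane label above, so confirm it against the repo's workflows before use:

```
# Workflow name assumed from the "QA-Lab - All Lanes" label; verify first.
gh workflow run "QA-Lab - All Lanes" --repo openclaw/openclaw --ref main \
  -f matrix_profile=all
```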
Telegram E2E secrets live in 1Password:
- OpenClaw vault, Telegram E2E item: `OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN`, `OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN`, `OPENCLAW_QA_PROVIDER_MODE`, `OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC`.
- OpenClaw vault: `OPENCLAW_QA_CONVEX_SITE_URL`, `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER`, `OPENCLAW_QA_CONVEX_SECRET_CI`.
- Private vault, "OPENCLAW QA, Convex, Telegram" item: `OPENCLAW_QA_TELEGRAM_GROUP_ID` may be stored separately from the Telegram E2E item.

The pool URL comes from `OpenClaw/OPENCLAW_QA_CONVEX_SITE_URL`; if that is stale or unclear, ask for the active pool URL before running.

```
OPENCLAW_QA_TELEGRAM_GROUP_ID="..." \
OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN="..." \
OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN="..." \
OPENCLAW_QA_PROVIDER_MODE="mock-openai" \
OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC="openclaw@beta" \
pnpm test:docker:npm-telegram-live
```
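To pull one of the tokens above from 1Password inside tmux, `op read` works; the `op://` path below assumes the item's field names match the env var names, which may not hold:

```
# op:// path is vault/item/field; adjust to the real field names.
op read "op://OpenClaw/Telegram E2E/OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN"
```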
`scripts/e2e/npm-telegram-live-runner.ts` reads `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE`; `OPENCLAW_QA_PROVIDER_MODE` is consumed by that wrapper. If you only set `OPENCLAW_QA_PROVIDER_MODE`, map it explicitly to `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE` before running the Docker lane. `pnpm test:docker:npm-telegram-live` last passed with:
- `OPENCLAW_QA_CREDENTIAL_SOURCE=convex`
- `OPENCLAW_QA_CREDENTIAL_ROLE=maintainer`
- `OPENCLAW_QA_CONVEX_SITE_URL`
- `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER`
- `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE=mock-openai`

If `op signin` blocks, prefer dispatching the manual GitHub lane, because the qa-live-shared environment already has Convex CI credentials:

```
gh workflow run "NPM Telegram Beta E2E" --repo openclaw/openclaw --ref main \
-f [email protected] \
-f [email protected] \
-f provider_mode=mock-openai
```
`gh run view --json artifacts` is not supported; list artifacts with:

```
gh api repos/openclaw/openclaw/actions/runs/<run-id>/artifacts
```
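The response nests the list under an `artifacts` key; picking out names and IDs and downloading one uses the standard GitHub REST endpoints:

```
# List artifact names and IDs for a run.
gh api repos/openclaw/openclaw/actions/runs/<run-id>/artifacts \
  --jq '.artifacts[] | {name, id}'
# Download a single artifact as a zip by its ID.
gh api repos/openclaw/openclaw/actions/artifacts/<artifact-id>/zip > artifact.zip
```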
Use `qa character-eval` for style/persona/vibe checks across multiple live models:
```
pnpm openclaw qa character-eval \
  --model openai/gpt-5.4,thinking=xhigh \
  --model openai/gpt-5.2,thinking=xhigh \
  --model openai/gpt-5,thinking=xhigh \
  --model anthropic/claude-opus-4-6,thinking=high \
  --model anthropic/claude-sonnet-4-6,thinking=high \
  --model zai/glm-5.1,thinking=high \
  --model moonshot/kimi-k2.5,thinking=high \
  --model google/gemini-3.1-pro-preview,thinking=high \
  --judge-model openai/gpt-5.4,thinking=xhigh,fast \
  --judge-model anthropic/claude-opus-4-6,thinking=high \
  --concurrency 16 \
  --judge-concurrency 16 \
  --output-dir .artifacts/qa-e2e/character-eval-<tag>
```
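A trimmed variant, mixing the per-model fast toggles covered in the notes below (the model picks here are arbitrary):

```
# ,fast=false slows one candidate; the judge keeps ,fast.
pnpm openclaw qa character-eval \
  --model openai/gpt-5.4,thinking=xhigh,fast=false \
  --model anthropic/claude-opus-4-6,thinking=high \
  --judge-model openai/gpt-5.4,thinking=xhigh,fast \
  --output-dir .artifacts/qa-e2e/character-eval-<tag>
```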
Notes:
- Model refs use `provider/model,thinking=<level>[,fast|,no-fast|,fast=<bool>]` for both `--model` and `--judge-model`. Prefer this inline form over `--model-thinking`; keep that flag as legacy compatibility only.
- Default candidates when no `--model` is passed: openai/gpt-5.4, openai/gpt-5.2, openai/gpt-5, anthropic/claude-opus-4-6, anthropic/claude-sonnet-4-6, zai/glm-5.1, moonshot/kimi-k2.5, and google/gemini-3.1-pro-preview.
- Default thinking level is `high`, with `xhigh` for OpenAI models that support it. `--thinking <level>` and `--model-thinking <provider/model=level>` remain compatibility shims.
- Append `,fast`, `,no-fast`, or `,fast=false` to toggle fast mode for one model (as in the trimmed run above); use `--fast` only to force fast mode for every candidate.
- Default judge models: `openai/gpt-5.4,thinking=xhigh,fast` and `anthropic/claude-opus-4-6,thinking=high`.
- Use `--concurrency <n>` and `--judge-concurrency <n>` to override when local gateways or provider limits need a gentler lane.
- Character scenarios live in qa/scenarios/.
- Seed the character SOUL.md and a blank IDENTITY.md in the scenario flow. Use SOUL.md + IDENTITY.md together only when intentionally testing how the normal OpenClaw identity combines with the character.
- Open with SOUL.md, then run normal user turns such as chat, workspace help, and small file tasks; do not ask "how would you react?" or tell the model it is in an eval.

Use model refs shaped like `codex-cli/<codex-model>` whenever QA should exercise Codex as a model backend.
Examples:
```
pnpm openclaw qa suite \
  --provider-mode live-frontier \
  --model codex-cli/<codex-model> \
  --alt-model codex-cli/<codex-model> \
  --scenario <scenario-id> \
  --output-dir .artifacts/qa-e2e/codex-<tag>
```
```
pnpm openclaw qa manual \
  --model codex-cli/<codex-model> \
  --message "Reply exactly: CODEX_OK"
```
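If a run needs to point Codex CLI at a specific auth/config home (see the notes below), pinning `CODEX_HOME` inline on the command is enough; the path here is illustrative:

```
# CODEX_HOME path is illustrative; point it at a real Codex CLI config dir.
CODEX_HOME="$HOME/.codex" \
pnpm openclaw qa manual \
  --model codex-cli/<codex-model> \
  --message "Reply exactly: CODEX_OK"
```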
Codex notes:
- The lab preserves `CODEX_HOME` so Codex CLI auth/config works while keeping `HOME` and `OPENCLAW_HOME` sandboxed.
- Check `CODEX_HOME`, `~/.profile`, and gateway child logs before changing scenario assertions.
- You can add `codex-cli/<codex-model>` as another candidate in `qa character-eval`; the report should label it as an opaque model name.

Key paths:
- qa/
- extensions/qa-lab/src/suite.ts
- extensions/qa-lab/src/lab-server.ts
- extensions/qa-lab/src/gateway-child.ts
- extensions/qa-channel/
- repo/... inside the seeded workspace
- qa/scenarios/ (keep qa/scenarios/index.md aligned)
- .artifacts/qa-e2e/