qa/frontier-harness-plan.md
Use this when tuning the harness on frontier models before the small-model pass.
Run this subset first on every harness tweak:
- approval-turn-tool-followthrough
- model-switch-tool-continuity
- source-docs-discovery-report

Longer spot-check after that:

- compaction-retry-mutating-tool
- subagent-handoff

GPT baseline:
```shell
pnpm openclaw qa suite \
  --provider-mode live-frontier \
  --model openai/gpt-5.5 \
  --alt-model openai/gpt-5.5 \
  --fast \
  --scenario approval-turn-tool-followthrough \
  --scenario model-switch-tool-continuity \
  --scenario source-docs-discovery-report
```
Claude sweep:
```shell
pnpm openclaw qa suite \
  --provider-mode live-frontier \
  --model anthropic/claude-sonnet-4-6 \
  --alt-model anthropic/claude-opus-4-6 \
  --scenario approval-turn-tool-followthrough \
  --scenario model-switch-tool-continuity \
  --scenario source-docs-discovery-report
```
Gemini sweep:
```shell
pnpm openclaw qa suite \
  --provider-mode live-frontier \
  --model <google-pro-model-ref> \
  --alt-model <google-pro-model-ref> \
  --scenario approval-turn-tool-followthrough \
  --scenario model-switch-tool-continuity \
  --scenario source-docs-discovery-report
```
Use the QA Lab runner catalog or `openclaw models list --all` to pick the current Google Pro ref.
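The three sweeps above differ only in the model pair and the `--fast` flag, so they can be driven from one helper. A minimal dry-run sketch (it `echo`es each command instead of executing it, so you can eyeball the flags first; drop the `echo` to run for real, and note that `<google-pro-model-ref>` stays a placeholder until you pick the ref):

```shell
# Dry-run sketch: prints the three live-frontier sweep commands.
# Remove the `echo` inside run_sweep to actually execute them.
SCENARIOS="--scenario approval-turn-tool-followthrough --scenario model-switch-tool-continuity --scenario source-docs-discovery-report"

run_sweep() {
  # $1 = --model, $2 = --alt-model, $3 = extra flags (e.g. --fast), may be empty
  echo pnpm openclaw qa suite \
    --provider-mode live-frontier \
    --model "$1" --alt-model "$2" $3 $SCENARIOS
}

run_sweep openai/gpt-5.5 openai/gpt-5.5 --fast                      # GPT baseline
run_sweep anthropic/claude-sonnet-4-6 anthropic/claude-opus-4-6 ""  # Claude sweep
run_sweep "<google-pro-model-ref>" "<google-pro-model-ref>" ""      # Gemini sweep
```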
Run this after the executable subset, not before:
> read QA_KICKOFF_TASK.md, tell me what feels half-baked about this qa mission, and keep it to two short sentences
GPT manual lane:
```shell
pnpm openclaw qa manual \
  --provider-mode live-frontier \
  --model openai/gpt-5.5 \
  --alt-model openai/gpt-5.5 \
  --fast \
  --message "read QA_KICKOFF_TASK.md, tell me what feels half-baked about this qa mission, and keep it to two short sentences"
```
Claude manual lane:
```shell
pnpm openclaw qa manual \
  --provider-mode live-frontier \
  --model anthropic/claude-sonnet-4-6 \
  --alt-model anthropic/claude-opus-4-6 \
  --message "read QA_KICKOFF_TASK.md, tell me what feels half-baked about this qa mission, and keep it to two short sentences"
```
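Both manual lanes share the same `--message` prompt, so the same dry-run trick applies. A sketch under the same assumption (the `echo` prints each command rather than running it):

```shell
# Dry-run sketch for the two manual lanes; both reuse one prompt.
PROMPT="read QA_KICKOFF_TASK.md, tell me what feels half-baked about this qa mission, and keep it to two short sentences"

run_manual() {
  # $1 = --model, $2 = --alt-model, $3 = extra flags (may be empty)
  echo pnpm openclaw qa manual \
    --provider-mode live-frontier \
    --model "$1" --alt-model "$2" $3 \
    --message "$PROMPT"
}

run_manual openai/gpt-5.5 openai/gpt-5.5 --fast                      # GPT manual lane
run_manual anthropic/claude-sonnet-4-6 anthropic/claude-opus-4-6 ""  # Claude manual lane
```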
Score it on: