packages/omo-codex/plugin/skills/debugging/references/methodology/08-qa.md
Tests cover cases you thought of. Real usage covers the ones you didn't.
The single fastest way to ship a broken fix is to stop at "tests pass". Manual QA means interacting with the running system the way the user does, then comparing observed behavior to the original bug report.
Pick the row that matches the product. Do what it says. Do not substitute.
| Product type | QA means… |
|---|---|
| CLI tool | Open tmux, run the actual command end-to-end, capture output. Paste the session transcript into the journal. Include exit code, stdout, stderr, side-effect check (files created/modified). |
| HTTP API | Start the real server, hit endpoints with curl or httpie, inspect response status + body + headers. Hit the specific endpoint that reproduced the bug. If there's auth, use real auth. |
| Browser-served web app | Drive a real browser via Playwright CLI. See tools/playwright-cli.md. Navigate the exact page/flow that reproduced the bug. Capture screenshot + DOM + network evidence. Do not substitute with curl — browsers have state (cookies, localStorage, service workers, client-side JS, viewport-dependent CSS) that curl does not have. |
| Agent / LLM pipeline | Run the same user prompt that originally failed. Capture the full turn — tool calls, messages, usage counters. Confirm non-zero usage (zero usage = still failing silently, see silent-failure check below). |
| Background worker / job queue | Trigger the job through the normal entry point (API call, cron tick, message publish), tail the worker logs, observe completion state in the queue or DB. Don't just call the worker function directly — the trigger path matters. |
| MCP server | Invoke the tool via its actual client (Claude Desktop, Cursor, etc. if available) or mcp-cli, not just the HTTP probe endpoint. The MCP handshake itself is sometimes where bugs live. |
| Native binary | Re-run the exact command that crashed / misbehaved. If the input was a file, use the same file. If the bug was exploitable, confirm the exploit repro via pwntools (see tools/pwntools.md). Capture exit code, signal if any, core dump if generated. |
| Bundled-app binary (Bun SEA, Node SEA, Electron, etc.) | Re-run the exact command. If the operation requires paid quota / blocked network, capture the app's debug log (APP_DEBUG=1 APP_LOG_LEVEL=debug APP_LOG_FILE=/tmp/trace.log) which usually emits the assembled request before sending. See methodology/partial-runtime-evidence.md for combining partial signals into a defensible verification. |
| Long-running daemon | Start fresh, let it run for the amount of time the bug originally took to manifest (not less), capture resource usage (memory, fd, cpu) throughout. Short-running QA misses resource leaks and cumulative state bugs. |
Every QA run goes in the journal under "Findings":
### Manual QA — <product type> (<ISO timestamp>)
- Scenario: <one line describing what you did>
- Command: `<exact invocation>`
- Observed output:
<verbatim output, trimmed to relevant section>
- Expected output: <what correct behavior looks like>
- Fix verified: yes / no / partial — <details>
If any QA step shows partial or regressed behavior, this is not "mostly done" — it's incomplete. Return to Phase 6.
Regardless of product type, audit the fix against these silent-failure patterns. If the original bug was a silent failure, the same pattern may exist in adjacent code that you haven't tested yet.
ok: true but a sub-field contains an error token (e.g. stopReason: "error", status: "failed")usage.totalTokens === 0 on an LLM responsetry { ... } catch { /* swallowed */ } or except: passerror: null actually being error: "..." with falsy check)Check the runtime reference for additional patterns:
except, logging.exception that goes nowherevoid on async, swallowed .catch(() => {}).unwrap_or_default(), let _ = result, error variants discardedif err != nil { return err } that never reaches user output, recovered panics, buffered channels that block silentlyperror, alarm() / signal masksprocess.env.X baked at build time, dead code from tree-shaking failures, worker sub-bundles diverging from main bundleDon't fix it. This is out of scope for the current bug.
Note it in the journal under a "Follow-ups" section with:
Surface these to the user in the final message under "Next steps I didn't take".
"Fix verified" means: the exact original failing scenario, re-run, now produces the correct output. Not a similar scenario. Not a unit test of the fix. The original scenario.
If you can't re-run the original scenario (e.g. it required a specific data state that's gone), construct the closest equivalent and document the difference in the journal. Escalate to the user if the equivalent is materially different.