Agent Reproduce Align

Purpose

Use this skill when Qwen Code already has a candidate implementation and needs evidence-based parity with a selected reference agent: codex or claude-code. The goal is not byte-for-byte equality; it is matching the observable contract that matters for the feature.

Default target repo: the current working directory. Use a user-specified path only when the user explicitly provides one.

Reference Agent Selection

Use the same reference agent selected during $agent-reproduce-feature. If the earlier choice is unavailable, ask once and record the answer in the scenario or run notes.

Workflow

Re-state the parity target:
- feature name and trigger
- selected reference agent
- one baseline prompt or interaction script
- acceptable differences
- must-match fields
Run the reference agent and Qwen Code in separate capture directories with the same scenario.
Capture the selected reference agent's local state before and after the reference run when state may affect parity.
Normalize traces with scripts/normalize_trace.py.
Compare normalized traces with scripts/compare_traces.py.
Inspect differences in this order:
- reference-agent state changes that explain behavior
- missing tool/function names
- schema shape and required fields
- model settings and response mode
- prompt role/order differences that affect behavior
- terminal-visible output and exit status
Patch Qwen Code, rerun the smallest failing scenario, and repeat.
Preserve only redacted minimal fixtures in the repo.

Read references/alignment-workflow.md before the first comparison pass.

Common Commands

Normalize:

.qwen/skills/agent-reproduce-align/scripts/normalize_trace.py \
  .repro-runs/reference/http.jsonl \
  > .repro-runs/reference/normalized.json

Compare:

.qwen/skills/agent-reproduce-align/scripts/compare_traces.py \
  .repro-runs/reference/normalized.json \
  .repro-runs/qwen/normalized.json

Run a paired shell scenario:

REPRO_REFERENCE_AGENT=codex \
.qwen/skills/agent-reproduce-align/scripts/run_pair_capture.sh \
  .repro-runs/slash-help \
  "codex exec '/help'" \
  "npm test -- --runInBand"

For Claude Code, set REPRO_REFERENCE_AGENT=claude-code and replace the first command with the discovered Claude Code command. When REPRO_REFERENCE_AGENT is set, the paired runner writes reference/state-before, reference/state-after, and reference/state-diff. Use the paired runner only when shell quoting is simple. For interactive slash commands, run the two captures manually with tmux so each side can receive the same keystrokes. Use REPRO_REFERENCE_STATE_ROOT=/tmp/some-root only for tests or custom state directories.

Comparison Rules

Compare contracts before wording. Exact prompt text is usually implementation detail.
Treat absent schemas, wrong required fields, or wrong argument names as high-signal failures.
Treat output ordering as significant only when the user-visible workflow depends on it.
Do not chase provider-specific endpoints, model names, IDs, timestamps, token counts, or ephemeral headers unless the feature depends on them.
Do not chase every local state write. Treat state diffs as explanatory evidence unless the feature contract requires a particular config, memory, or permission side effect.
Stop when Qwen Code passes the user-visible scenario and the remaining trace differences are documented as intentional.

Done Criteria

Reference-agent and Qwen Code traces for the same scenario exist locally.
Reference-agent state diff exists or state capture is documented as irrelevant for the scenario.
The normalized comparison has no unexplained must-match differences.
Qwen Code tests or smoke commands cover the fixed behavior.
Any remaining mismatch is written down in the task notes or Qwen Code docs when it affects users.