.qwen/skills/e2e-testing/SKILL.md
How to run the Qwen Code CLI end-to-end, from building the bundle to inspecting raw API traffic. Use when unit tests are not enough and you need to verify behavior through the full pipeline (model API → tool validation → tool execution).
There are two ways to run the CLI:

- The installed `qwen` command — this matches what the user ran when they filed the issue.
- A local build (`npm run build && npm run bundle`), then run `node dist/cli.js` — this tests your local changes.

Run the CLI non-interactively with JSON output (`<qwen>` = `qwen` or `node dist/cli.js` per above):
<qwen> "your prompt here" \
--approval-mode yolo \
--output-format json \
2>/dev/null
The JSON output is a stream of objects. Key types:
type: "system" — init: tools, mcp_servers, model, permission_modetype: "assistant" — model output: content[].type is text, tool_use,
or thinkingtype: "user" — tool results: content[].type is tool_result with
is_errortype: "result" — final output with result text and usage statsPipe through jq to filter the verbose stream, e.g. extract tool-result errors:
```bash
... 2>/dev/null | jq 'select(.type=="user") | .message.content[] | select(.is_error)'
```
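The same pattern works for the other object types. For example, to list which tools the model attempted to call, assuming assistant objects nest blocks under `.message.content[]` the same way and that `tool_use` blocks carry a `name` field (an assumption, not stated above):

```bash
... 2>/dev/null | jq -r 'select(.type=="assistant") | .message.content[] | select(.type=="tool_use") | .name'
```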
When debugging model behavior (wrong tool arguments, schema issues), enable API logging to see the exact request/response payloads:
<qwen> "prompt" \
--approval-mode yolo \
--output-format json \
--openai-logging \
--openai-logging-dir /tmp/api-logs
Each API call produces a JSON file (can be 80KB+ due to full message history).
The bulk is in `request.messages` (conversation history). Trimmed structure:
```jsonc
{
  "request": {
    "model": "coder-model",
    "messages": [
      {
        "role": "system|user|assistant",
        "content": "...",
        "tool_calls?": []
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "tool_name",
          "description": "...",
          "parameters": { ... }   // schema sent to the model
        }
      }
    ]
  },
  "response": {
    "choices": [
      {
        "message": {
          "role": "assistant",
          "content": "...",       // text response (may be null)
          "tool_calls": [
            {
              "id": "call_...",
              "function": {
                "name": "tool_name",
                "arguments": "..." // raw JSON string from the model
              }
            }
          ]
        }
      }
    ]
  }
}
```
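The log files are plain JSON, so jq works on them directly. Two quick queries, assuming the directory and `.json` extension from the flags above (the glob is an assumption, not a guaranteed naming scheme):

```bash
# Which tools did the model actually call in each logged request?
jq -r '.response.choices[].message.tool_calls[]?.function.name' /tmp/api-logs/*.json

# What parameter schema was sent for a given tool?
jq '.request.tools[] | select(.function.name == "tool_name") | .function.parameters' /tmp/api-logs/*.json
```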
Use interactive testing via tmux when you need to verify TUI rendering, test keyboard interactions, or see what the user sees. Headless mode is simpler when you only need structured output.
```bash
tmux new-session -d -s test -x 200 -y 50 \
  "cd /tmp/test-dir && <qwen> --approval-mode yolo"
sleep 3  # wait for TUI to initialize
```
Split text and Enter with a short delay — sending them together can cause the TUI to swallow the submit:
```bash
tmux send-keys -t test "your prompt here"
sleep 0.5
tmux send-keys -t test Enter
```
Poll for the input prompt to reappear instead of blind sleeping:
```bash
for i in $(seq 1 60); do
  sleep 2
  tmux capture-pane -t test -p | grep -q "Type your message" && break
done

tmux capture-pane -t test -p -S -100   # -S -100 = 100 lines of scrollback
```
Limitations of this approach:

- `tmux send-keys` cannot reliably send all key combinations. `C-?`, `C-Shift-*`, and function keys with modifiers are unsupported or unreliable. For these, use the InteractiveSession harness in `integration-tests/interactive/` or test manually.
- `capture-pane` captures the final rendered frame, not intermediate states. Flicker, tearing, or brief blank frames cannot be detected this way.

Clean up when done:

```bash
tmux kill-session -t test
```
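Putting the steps above together, a minimal driver script could look like this. It is a sketch: the session name, working directory, prompt, and the roughly two-minute timeout are placeholders, and `<qwen>` stands for whichever command you chose earlier.

```bash
#!/usr/bin/env bash
SESSION=test
PROMPT="your prompt here"

# Start the TUI detached in a wide pane and give it time to initialize.
tmux new-session -d -s "$SESSION" -x 200 -y 50 \
  "cd /tmp/test-dir && <qwen> --approval-mode yolo"
sleep 3

# Send the prompt, then Enter separately so the submit isn't swallowed.
tmux send-keys -t "$SESSION" "$PROMPT"
sleep 0.5
tmux send-keys -t "$SESSION" Enter

# Poll until the input prompt reappears.
for i in $(seq 1 60); do
  sleep 2
  tmux capture-pane -t "$SESSION" -p | grep -q "Type your message" && break
done

# Capture the pane plus 100 lines of scrollback, then clean up.
tmux capture-pane -t "$SESSION" -p -S -100
tmux kill-session -t "$SESSION"
```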
For testing MCP tool behavior end-to-end, read `references/mcp-testing.md`. It covers the setup gotchas (config location, git repo requirement) and includes a reusable zero-dependency test server template in `scripts/mcp-test-server.js`.
Use `scripts/token-stats.py` to summarize token usage across recent API logs:

```bash
python3 .qwen/skills/e2e-testing/scripts/token-stats.py 20   # last 20 requests
```
Shows input, cached, and output tokens per request with cache hit rates. Useful for verifying prompt caching behavior or investigating unexpected token counts.
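If you only need a rough total without the script, jq over the same logs can do it. The `usage` field names below are an assumption based on the OpenAI-style response format, not fields shown in the trimmed structure above:

```bash
jq -s '{requests: length,
        input:  map(.response.usage.prompt_tokens // 0) | add,
        output: map(.response.usage.completion_tokens // 0) | add}' /tmp/api-logs/*.json
```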
Use `--approval-mode default` when testing permission rules. `yolo` bypasses rule evaluation entirely — it can't test whether a rule matches.
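A sketch of exercising a rule end-to-end, reusing the error filter from the headless section. The prompt is a placeholder, and the expectation that a blocked tool call shows up as an `is_error` tool result is my assumption, not something this skill guarantees:

```bash
<qwen> "delete a file outside the workspace" \
  --approval-mode default \
  --output-format json \
  2>/dev/null | jq 'select(.type=="user") | .message.content[] | select(.is_error)'
```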