plugins/ruflo-workflows/skills/gaia-debugging/SKILL.md
When a GAIA question fails, systematically diagnose the root cause and propose a targeted fix.
Use this skill when a GAIA `task_id` returns the wrong answer or times out.

| Code | Mode | Symptom | Fix direction |
|---|---|---|---|
| TG | Tool Gap | Agent lacks a required tool (no image OCR, no PDF reader) | Add tool to catalogue |
| RM | Reasoning Miss | Agent has the right data but draws wrong conclusion | Improve system prompt, add CoT instruction |
| EB | Extraction Bug | Answer is in the trace but the `FINAL_ANSWER:` extraction regex fails | Fix answer extraction pattern |
| LI | Loop Issue | Agent loops (repeats the same tool call) and hits the turn limit | Increase `--max-turns` or add loop detection |
| DS | Dataset Shift | Ground truth differs from what web currently shows | Flag for HAL dataset audit |
| AT | API Timeout | Tool call times out; agent never gets the result | Increase per-turn timeout |
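The table above can drive a rough first-pass triage. The helper below is hypothetical and is not part of gaia-bench. It assumes a result record with `error`, `turns`, `maxTurns`, `answer`, `expected`, and `trace` fields, which may not match the real output schema. It only flags the modes that can be guessed mechanically. TG and DS still need a human to read the trace.

```javascript
// Hypothetical triage helper. The record fields used here (error, turns,
// maxTurns, answer, expected, trace) are assumptions, not the gaia-bench schema.
function normalize(s) {
  return String(s ?? '').trim().toLowerCase();
}

function classifyFailure(result) {
  const error = normalize(result.error);
  const trace = normalize(result.trace);
  const expected = normalize(result.expected);

  // AT: a tool call or turn timed out.
  if (error.includes('timeout') || error.includes('timed out')) return 'AT';
  // LI: every allowed turn was used and no answer came out.
  if (!result.answer && result.turns >= result.maxTurns) return 'LI';
  // EB: the ground truth appears in the trace, but nothing was extracted.
  if (!result.answer && expected && trace.includes(expected)) return 'EB';
  // RM is only a guess here: a wrong answer can also come from a TG or DS
  // problem, which this heuristic cannot tell apart.
  if (result.answer && normalize(result.answer) !== expected) return 'RM';
  return 'UNKNOWN';
}
```

Treat the returned code as a starting hypothesis to confirm against the trace, not as a verdict.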
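```bash
# Find the result for the task_id in the latest run (TASK_ID must already be set)
RESULTS=~/.cache/ruflo/gaia/results-latest.json
node -e "
const fs = require('fs');
const [file, id] = process.argv.slice(1);
const r = JSON.parse(fs.readFileSync(file, 'utf8'));
const q = r.results.find(x => x.task_id === id);
console.log(q ? JSON.stringify(q, null, 2) : 'No result for task_id ' + id);
" "$RESULTS" "$TASK_ID"
```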
To see the trace, re-run just this task with JSON output:
```bash
node v3/@claude-flow/cli/bin/cli.js gaia-bench run \
--level 1 --limit 1 \
--task-id $TASK_ID \
--models claude-sonnet-4-6 \
--max-turns 20 \
--output json
```
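When you read the trace, a loop (LI) appears as the same tool call issued again and again. The minimal sketch below counts identical calls. It assumes the trace is an array of `{ tool, args }` entries. That shape is an assumption, and the real JSON may use different field names.

```javascript
// Sketch: count identical tool calls in a trace to spot an LI (loop) failure.
// Assumption: each trace entry is { tool, args }. Entries without a tool are skipped.
function findRepeatedCalls(trace, threshold = 2) {
  const counts = new Map();
  for (const step of trace) {
    if (!step.tool) continue; // plain reasoning turns, not tool calls
    // Same tool + same arguments = same call. JSON.stringify is key-order
    // sensitive, so this assumes args objects are built consistently.
    const key = `${step.tool}:${JSON.stringify(step.args ?? {})}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts]
    .filter(([, n]) => n >= threshold)
    .map(([call, count]) => ({ call, count }));
}
```

Any entry with a high count is a candidate for the tool-call deduplication guard described in the action table.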
| Failure | Action |
|---|---|
| TG — missing web_browse | Verify gaia-tools/index.ts exports web_browse; check tool registration |
| TG — missing image OCR | Register the image_describe tool; verify GOOGLE_AI_API_KEY is set |
| RM — reasoning | Add a system prompt instruction: "Before answering, list all facts you have gathered" |
| EB — extraction | Test the FINAL_ANSWER_RE regex against the trace manually |
| LI — loop | Add a tool-call deduplication guard in gaia-agent.ts |
| AT — timeout | Set DEFAULT_PER_TURN_TIMEOUT_MS higher or use --max-turns flag |
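For the EB row, you can script the regex check instead of eyeballing it. The pattern below is an assumed stand-in for illustration. Replace it with the actual `FINAL_ANSWER_RE` from the agent source before you draw conclusions.

```javascript
// Sketch for EB failures: does the extraction regex find the answer in the trace?
// FINAL_ANSWER_RE here is an ASSUMED pattern; paste in the real one to test it.
const FINAL_ANSWER_RE = /FINAL_ANSWER:\s*(.+)/gi;

function extractFinalAnswer(text, re = FINAL_ANSWER_RE) {
  // matchAll needs the global flag; it clones the regex, so lastIndex is not mutated.
  const matches = [...text.matchAll(re)];
  if (matches.length === 0) return null;
  // Agents often restate the answer; take the last occurrence.
  return matches[matches.length - 1][1].trim();
}
```

Run it against the raw trace text. A formatting variant such as `**FINAL_ANSWER:** 42` captures `** 42` with this pattern. That kind of mismatch is what an EB failure looks like.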
```bash
# Re-run the single question
node … gaia-bench run --task-id $TASK_ID --models $MODEL --output json
# If now passing, store the pattern
npx @claude-flow/cli@latest memory store \
--namespace gaia-debug-patterns \
--key "fix-$FAILURE_CODE-$(date +%Y%m%d)" \
--value "task_id=$TASK_ID, mode=$FAILURE_CODE, fix=$FIX_DESCRIPTION"
```
To check tool registration for the TG rows above, list the tools the default catalogue exposes:

```bash
node -e "
const { createDefaultToolCatalogue } = require('./v3/@claude-flow/cli/src/benchmarks/gaia-tools/index.js');
const cat = createDefaultToolCatalogue({});
console.log('Tools registered:', cat.definitions.map(t => t.name));
"
```

Expected: `web_search`, `file_read`, `web_browse`, `image_describe`, `python_exec`
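Comparing the printed list against the expected names is easy to automate. This is a small sketch: `missingTools` is a hypothetical helper, and its input is the array printed by the `node -e` check above.

```javascript
// Sketch: report which expected GAIA tools are absent from the catalogue.
const EXPECTED_TOOLS = ['web_search', 'file_read', 'web_browse', 'image_describe', 'python_exec'];

function missingTools(registered, expected = EXPECTED_TOOLS) {
  const have = new Set(registered);
  // Preserve the expected order so the report is stable across runs.
  return expected.filter(name => !have.has(name));
}
```

A non-empty result points to a TG failure. An empty one shows only that the names are registered, not that each tool works.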
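After resolving a debugging session, store the finding:

```bash
# Double quotes so the shell expands the variables; single quotes would store the literal text $TASK_ID
npx @claude-flow/cli@latest memory store \
--namespace gaia-debug-patterns \
--key "session-$(date +%Y%m%d-%H%M)" \
--value "{\"task_id\":\"$TASK_ID\",\"failure_mode\":\"$CODE\",\"fix\":\"$FIX\",\"verified\":true}"
```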
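If you build the `--value` JSON by hand in the shell, the result is invalid JSON whenever `$FIX` contains a double quote or backslash. The sketch below lets `JSON.stringify` do the escaping instead. It reads the same `TASK_ID`, `CODE`, and `FIX` variables from the environment, so they must be exported. `buildDebugRecord` is a hypothetical helper name.

```javascript
// Hypothetical helper: build the memory --value payload with JSON.stringify,
// so quotes or backslashes in FIX are escaped correctly.
// Reads TASK_ID, CODE and FIX from the environment (export them first).
function buildDebugRecord(env = process.env) {
  return JSON.stringify({
    task_id: env.TASK_ID,
    failure_mode: env.CODE,
    fix: env.FIX,
    verified: true,
  });
}
```

Print `buildDebugRecord()` from a small Node script and pass that output to `--value`.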
Search for similar past failures:

```bash
npx @claude-flow/cli@latest memory search \
--namespace gaia-debug-patterns \
--query "extraction bug final answer regex"
```