plugins/ruflo-workflows/commands/gaia.md
Dispatch GAIA benchmark operations. All subcommands are thin wrappers over the
gaia-bench CLI command shipped in @claude-flow/cli.
| Command | Purpose |
|---|---|
/gaia run | Execute a benchmark run against one or more models |
/gaia submit | Package and sign results for HAL leaderboard submission |
/gaia leaderboard | Fetch and display current HAL scores + our positioning |
/gaia validate | Pre-submit checks: TypeScript clean, dataset accessible, env keys present |
/gaia history | Show measured runs stored in the gaia-runs namespace |
/gaia cost | Report cumulative API spend and project cost for next configurations |
/gaia validate
/gaia run --level=1 --limit=10 --models=haiku
/gaia submit --results=~/.cache/ruflo/gaia/results-latest.json
| Variable | Purpose |
|---|---|
ANTHROPIC_API_KEY | Anthropic model inference |
HF_TOKEN | Hugging Face dataset download |
GOOGLE_AI_API_KEY | Gemini model support |
GOOGLE_CUSTOM_SEARCH_API_KEY | Google Custom Search tool |
GOOGLE_CUSTOM_SEARCH_CX | Custom Search Engine ID |
If any required variable is missing the command will instruct you how to set it (env export or GCP secret).
This dispatcher is intentionally benchmark-agnostic. Future benchmarks (SWE-bench, WebArena, HumanEval) can be added as additional subcommands without modifying this file.