/gaia — GAIA Benchmark Dispatcher

plugins/ruflo-workflows/commands/gaia.md

Dispatch GAIA benchmark operations. All subcommands are thin wrappers over the gaia-bench CLI command shipped in @claude-flow/cli.

Subcommands

| Command | Purpose |
| --- | --- |
| /gaia run | Execute a benchmark run against one or more models |
| /gaia submit | Package and sign results for HAL leaderboard submission |
| /gaia leaderboard | Fetch and display current HAL scores + our positioning |
| /gaia validate | Pre-submit checks: TypeScript clean, dataset accessible, env keys present |
| /gaia history | Show measured runs stored in the gaia-runs namespace |
| /gaia cost | Report cumulative API spend and project cost for next configurations |
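
To illustrate the "thin wrapper" idea, here is a minimal sketch of how a slash command could be turned into an argument vector for gaia-bench. The subcommand names come from the table above; the helper names (`isSubcommand`, `toGaiaBenchArgv`) are hypothetical, and the mapping onto `gaia-bench <subcommand> <flags...>` is an assumption, since the real CLI's argument layout is not documented here.

```typescript
// Subcommand names are taken from the table above.
const SUBCOMMANDS = ["run", "submit", "leaderboard", "validate", "history", "cost"] as const;
type Subcommand = (typeof SUBCOMMANDS)[number];

// Type guard: true only for one of the six documented subcommands.
function isSubcommand(value: string): value is Subcommand {
  return (SUBCOMMANDS as readonly string[]).indexOf(value) !== -1;
}

// ASSUMPTION: each subcommand forwards to `gaia-bench <subcommand> <flags...>`.
// The actual invocation used by the dispatcher may differ.
function toGaiaBenchArgv(input: string): string[] {
  const [slash, sub, ...flags] = input.trim().split(/\s+/);
  if (slash !== "/gaia") {
    throw new Error(`not a /gaia command: ${input}`);
  }
  if (sub === undefined || !isSubcommand(sub)) {
    throw new Error(`unknown subcommand: ${sub ?? "(none)"}`);
  }
  // Flags are passed through untouched; the wrapper adds no logic of its own.
  return ["gaia-bench", sub, ...flags];
}
```

Under these assumptions, `toGaiaBenchArgv("/gaia run --level=1 --limit=10 --models=haiku")` yields an array starting with `gaia-bench` and `run`, followed by the three flags unchanged. An unrecognized subcommand is rejected before anything is executed.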

Quick start

/gaia validate
/gaia run --level=1 --limit=10 --models=haiku
/gaia submit --results=~/.cache/ruflo/gaia/results-latest.json

Environment variables resolved

| Variable | Purpose |
| --- | --- |
| ANTHROPIC_API_KEY | Anthropic model inference |
| HF_TOKEN | Hugging Face dataset download |
| GOOGLE_AI_API_KEY | Gemini model support |
| GOOGLE_CUSTOM_SEARCH_API_KEY | Google Custom Search tool |
| GOOGLE_CUSTOM_SEARCH_CX | Custom Search Engine ID |

If any required variable is missing, the command prints instructions for setting it (via an environment export or a GCP secret).
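
The missing-variable check could look roughly like the sketch below. The variable names and purposes are copied from the table above; everything else is an assumption. In particular, the source does not say which variables are required for which subcommand, so this sketch treats all five as required by default, and the hint format and the helper names (`missingEnvVars`, `exportHints`) are hypothetical rather than the command's actual output.

```typescript
// Variable names and purposes are copied from the table above.
const GAIA_ENV_VARS: Record<string, string> = {
  ANTHROPIC_API_KEY: "Anthropic model inference",
  HF_TOKEN: "Hugging Face dataset download",
  GOOGLE_AI_API_KEY: "Gemini model support",
  GOOGLE_CUSTOM_SEARCH_API_KEY: "Google Custom Search tool",
  GOOGLE_CUSTOM_SEARCH_CX: "Custom Search Engine ID",
};

// Returns the names that are unset or blank in the given environment.
// ASSUMPTION: all listed variables are required unless a narrower list is passed.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = Object.keys(GAIA_ENV_VARS),
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Builds one shell `export` hint per missing variable (hypothetical format).
function exportHints(missing: string[]): string[] {
  return missing.map(
    (name) => `export ${name}=<value>  # ${GAIA_ENV_VARS[name] ?? "purpose not documented"}`,
  );
}
```

In a Node.js process, passing `process.env` as the first argument checks the real environment. Keeping the check as a pure function over a plain object makes it easy to test without touching actual credentials.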

Extensibility

This dispatcher is intentionally benchmark-agnostic. Future benchmarks (SWE-bench, WebArena, HumanEval) can be added as additional subcommands without modifying this file.