plugins/ruflo-workflows/commands/gaia-leaderboard.md
Display the current HAL GAIA leaderboard and compare with stored ruflo runs.
- `/gaia leaderboard`
- `/gaia leaderboard --level=1 --top=20`
- `/gaia leaderboard --level=2`

| Flag | Default | Description |
|---|---|---|
| `--level` | 1 | Show scores for this GAIA level (1, 2, or 3) |
| `--top` | 20 | How many leaderboard entries to display |
| `--show-ours` | on | Overlay our stored run results in the table |
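
The flag defaults above can be sketched as a small parser. This is only an illustration in Python, not the command's actual implementation: the function name `parse_leaderboard_flags` is hypothetical, and the `--no-show-ours` off switch that `BooleanOptionalAction` generates is an assumption, since the table documents only the default of "on".

```python
import argparse


def parse_leaderboard_flags(argv):
    """Parse /gaia leaderboard flags using the documented defaults."""
    parser = argparse.ArgumentParser(prog="/gaia leaderboard")
    # GAIA has three difficulty levels; default is level 1.
    parser.add_argument("--level", type=int, choices=[1, 2, 3], default=1)
    # Number of leaderboard entries to display.
    parser.add_argument("--top", type=int, default=20)
    # Assumption: --no-show-ours is a hypothetical way to turn the overlay off.
    parser.add_argument("--show-ours", action=argparse.BooleanOptionalAction,
                        default=True)
    return parser.parse_args(argv)


args = parse_leaderboard_flags(["--level=2"])
print(args.level, args.top, args.show_ours)  # 2 20 True
```

Restricting `--level` with `choices` rejects values outside 1 to 3 at parse time, before any leaderboard fetch would be attempted.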

Leaderboard source: https://huggingface.co/spaces/gaia-benchmark/leaderboard

Stored runs: `gaia-runs` memory namespace.

| Rank | System | L1% | L2% | L3% | Overall |
|---|---|---|---|---|---|
| 1 | HAL (Sonnet 4.5) | 74.6 | 55.2 | 31.4 | 60.1 |
| 2 | GPT-4o (OpenAI) | 71.3 | 51.8 | 28.9 | 56.6 |
| ... | | | | | |
| -- | ruflo (this session) | 20.8 | -- | -- | 20.8* |

\* denotes a partial run (L1 only, 53/300 questions).

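
The overlay shown above (ranked leaderboard rows, then our own run with `--` for missing levels and `*` for a partial run) could be rendered roughly as follows. This is a minimal sketch under stated assumptions: the `Entry` dataclass, `fmt`, and `render` are hypothetical names, ranked entries are assumed to always carry a rank, and the column widths are illustrative rather than the command's real layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    system: str
    overall: float
    scores: Dict[int, float] = field(default_factory=dict)  # GAIA level -> percent
    rank: Optional[int] = None  # None marks one of our own overlaid runs
    partial: bool = False       # True when only part of the benchmark was run


def fmt(value: Optional[float]) -> str:
    """Format a score to one decimal place, or '--' when it is missing."""
    return "--" if value is None else f"{value:.1f}"


def render(entries: List[Entry], ours: Optional[Entry] = None,
           top: int = 20, show_ours: bool = True) -> str:
    # Keep the best `top` ranked entries, then append our run if requested.
    rows = sorted(entries, key=lambda e: e.rank)[:top]
    if show_ours and ours is not None:
        rows.append(ours)
    lines = [f"{'Rank':<4}  {'System':<22}  {'L1%':>5}  {'L2%':>5}  {'L3%':>5}  {'Overall':>7}"]
    for e in rows:
        rank = "--" if e.rank is None else str(e.rank)
        levels = "  ".join(f"{fmt(e.scores.get(lvl)):>5}" for lvl in (1, 2, 3))
        overall = fmt(e.overall) + ("*" if e.partial else "")
        lines.append(f"{rank:<4}  {e.system:<22}  {levels}  {overall:>7}")
    return "\n".join(lines)


board = [
    Entry("HAL (Sonnet 4.5)", 60.1, {1: 74.6, 2: 55.2, 3: 31.4}, rank=1),
    Entry("GPT-4o (OpenAI)", 56.6, {1: 71.3, 2: 51.8, 3: 28.9}, rank=2),
]
ours = Entry("ruflo (this session)", 20.8, {1: 20.8}, partial=True)
print(render(board, ours))
```

Keeping our run outside the ranked list, rather than inserting it by score, avoids implying a rank for a partial run that covers only 53 questions.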
| System | L1 | Source |
|---|---|---|
| HAL Sonnet 4.5 | 74.6% | Princeton HAL reference, 300 Q |
| ruflo iter 23 | 20.8% | 53 Q, post-SOTA web_search |
| ruflo iter 15 | 9.4% | 53 Q, broken web_search |
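
The arithmetic behind these rows can be checked with a short sketch. The helper `correct_count` is a hypothetical name; it recovers the whole number of correct answers that is consistent with a percentage rounded to one decimal place. The counts it prints are inferred from the percentages, not taken from run logs.

```python
def correct_count(percent, total):
    """Return the unique number of correct answers matching `percent`, else None."""
    matches = [k for k in range(total + 1)
               if round(100 * k / total, 1) == percent]
    return matches[0] if len(matches) == 1 else None


iter15, iter23, hal = 9.4, 20.8, 74.6

print(round(iter23 - iter15, 1))  # 11.4 points gained from iter 15 to iter 23
print(round(hal - iter23, 1))     # 53.8 points behind the HAL reference
print(correct_count(20.8, 53))    # 11
print(correct_count(9.4, 53))     # 5
```

Because the ruflo runs cover 53 questions while the HAL reference covers 300, the 53.8-point gap is indicative only; on 53 questions a single answer shifts the score by roughly 1.9 points.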

Leaderboard: https://huggingface.co/spaces/gaia-benchmark/leaderboard

List stored runs: `npx @claude-flow/cli@latest memory list --namespace gaia-runs`