plugins/ruflo-workflows/commands/gaia-submit.md
Build a submission-ready package from a completed benchmark run and sign it with the ruflo Ed25519 witness manifest.
/gaia submit
/gaia submit --results=~/.cache/ruflo/gaia/results-latest.json
/gaia submit --results=./my-results.json --dry-run
| Flag | Default | Description |
|---|---|---|
| --results | ~/.cache/ruflo/gaia/results-latest.json | Path to the JSON results file from /gaia run |
| --run-id | auto (from git SHA) | Short identifier embedded in the package directory name |
| --dry-run | off | Build and validate the package but do not write it to disk |
| --no-sign | off | Skip Ed25519 signing (not recommended for leaderboard submissions) |
submission-<date>-<short-sha>/
├── results.jsonl — one JSON object per question (HAL-compatible)
├── trajectories.jsonl — full agent trajectory per question
├── metadata.json — model, harness version, tool catalogue, cost
├── manifest.md.json — Ed25519-signed witness manifest
└── README.md — human-readable summary + comparison vs HAL baseline
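As a quick sanity check on the layout above, the following minimal sketch lists any expected file that is absent from a package directory. The file names are copied from the tree; the function name `missing_package_files` is hypothetical and not part of the ruflo tooling.

```python
from pathlib import Path

# File names copied from the package layout above.
EXPECTED_FILES = (
    "results.jsonl",
    "trajectories.jsonl",
    "metadata.json",
    "manifest.md.json",
    "README.md",
)


def missing_package_files(package_dir):
    """Return the expected file names that are absent from package_dir."""
    root = Path(package_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every file in the layout is present; it says nothing about whether the contents are valid.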
{
"task_id": "e1fc63a2-da7a-432f-be78-7c4a95598703",
"model_answer": "4",
"reasoning_trace": "[full agent trace]",
"tools_used": ["web_search", "python_exec"],
"turns": 5,
"wall_seconds": 12.4
}
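A record of this shape can be checked before packaging. The sketch below infers field names and types from the example record; whether every field is mandatory for the leaderboard is an assumption, and `record_problems` is a hypothetical helper name.

```python
import json

# Field names and types inferred from the example record above.
# Treating all of them as required is an assumption.
RECORD_FIELDS = {
    "task_id": str,
    "model_answer": str,
    "reasoning_trace": str,
    "tools_used": list,
    "turns": int,
    "wall_seconds": (int, float),
}


def record_problems(line):
    """Return a list of human-readable problems found in one JSONL line."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    for name, expected in RECORD_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            problems.append(f"wrong type for field: {name}")
    return problems
```

Running it over each line of results.jsonl and collecting the non-empty results gives a simple pre-submission report.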
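1. Locate the results file: default to ~/.cache/ruflo/gaia/results-latest.json; ask if multiple candidates exist.
2. Validate the results schema: it must contain level, model, summary, and a results array.
3. Convert results[] → HAL-compatible results.jsonl (one JSON object per line).
4. Build trajectories.jsonl from any trajectory fields in the results.
5. Write metadata.json: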
{
"submitted_at": "<ISO-8601>",
"harness": "[email protected] / @claude-flow/[email protected]",
"model": "<model-id>",
"gaia_level": 1,
"tool_catalogue": ["web_search","file_read","web_browse","image_describe","python_exec"],
"total_questions": 53,
"pass_rate": 0.208,
"est_cost_usd": 1.23,
"adrs": ["ADR-133","ADR-135","ADR-136"],
"git_sha": "<short-sha>"
}
6. Sign the package: node plugins/ruflo-core/scripts/witness/sign.mjs submission-<id>/
7. Write README.md with a pass-rate table comparing the run to HAL baselines.
8. With --dry-run, print the package tree and manifest hash without writing anything to disk.

After generating the package:
zip -r submission-$(date +%Y%m%d).zip submission-<date>-<sha>/
# Upload at https://huggingface.co/spaces/gaia-benchmark/leaderboard
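The validation, conversion, and pass-rate parts of the packaging procedure described earlier can be sketched in a few lines. This is an illustration under stated assumptions, not the command's actual implementation: the required keys come from the schema check, but the per-question `correct` field and all function names are hypothetical.

```python
import json

# Top-level keys named in the schema validation step.
REQUIRED_KEYS = ("level", "model", "summary", "results")


def validate_results(doc):
    """Raise ValueError if the results document lacks a required key."""
    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if missing:
        raise ValueError("results file is missing keys: " + ", ".join(missing))
    if not isinstance(doc["results"], list):
        raise ValueError("'results' must be an array")


def to_jsonl(records):
    """Serialise records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


def pass_rate(records, passed_key="correct"):
    """Fraction of records marked as passed, rounded to 3 decimals.

    passed_key is a hypothetical field name; the real results file may
    record correctness differently.
    """
    if not records:
        return 0.0
    passed = sum(1 for record in records if record.get(passed_key) is True)
    return round(passed / len(records), 3)
```

For instance, 11 passes out of 53 questions gives 11 / 53 ≈ 0.208, which would be consistent with the total_questions and pass_rate values in the example metadata.json.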