skills/arbor/references/arbor-upstream.md
This skill normally runs HTR natively — Claude is the coordinator and subagents are executors. That's the recommended path: no extra install, no separate API keys, and you stay in the loop to read evidence between cycles.
But the paper's authors also ship a full implementation as a CLI. Use it instead when the user explicitly wants to run the published system (e.g. to reproduce paper results), wants Arbor to run fully unattended for many hours via its own live dashboard, or wants its built-in report/web-UI tooling.
Source: https://github.com/RUC-NLPIR/Arbor
Requires Python ≥ 3.10 and Git.

    git clone https://github.com/RUC-NLPIR/Arbor.git
    cd Arbor
    python -m venv .venv && source .venv/bin/activate
    pip install -e .
    arbor doctor   # verify install, PATH, git, API keys
    arbor setup    # writes ~/.arbor/config.yaml (provider, model, base URL, keys)

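
Before cloning, the two stated prerequisites (Python ≥ 3.10 and Git) can be checked with a few lines of standard-library Python. This is only a minimal sketch: the function name `check_prerequisites` is made up for illustration, and it does not replace `arbor doctor`, which is the tool's own check once it is installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the stated requirement: Python >= 3.10


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready to install.

    `version` and `which` are injectable only so the check can be exercised
    without depending on the machine it runs on.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found; "
            f"need >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    if which("git") is None:
        problems.append("git not found on PATH")
    return problems


if __name__ == "__main__":
    for issue in check_prerequisites():
        print("missing:", issue)
```

Run it with the interpreter you intend to use for the virtual environment, since the version check applies to whichever Python executes the script.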
Supported backends: Anthropic, OpenAI / OpenAI-compatible Responses API, and LiteLLM (DeepSeek, Gemini, Qwen, vLLM, Ollama, local gateways). Keys can also be set via environment variables.
- … E_dev / E_test).
- … `research_config.yaml` (task description; coordinator settings: max cycles, depth, merge thresholds; executor max turns; UI mode). See `examples/research_config.example.yaml` in the repo.
- … `arbor`
- … `.arbor/sessions/` with `REPORT.md`, the event log, and results. Re-render a past session's report with `arbor report <session>`.

| Command | Purpose |
|---|---|
| `arbor` | Start an interactive research session |
| `arbor setup` | Configure provider / model / keys |
| `arbor doctor` | Diagnose install, PATH, git, API keys |
| `arbor report <session>` | Re-render reports for a past session |
| `arbor version` | Print installed version |
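
To find past sessions worth re-rendering, a short script can scan the sessions directory for reports. This is a sketch under an assumption the text above does not confirm: that each session is its own subdirectory of `.arbor/sessions/` with `REPORT.md` directly inside it. The helper name `sessions_with_reports` is hypothetical, and the directory names it returns are not guaranteed to be the identifiers that `arbor report <session>` expects.

```python
from pathlib import Path


def sessions_with_reports(sessions_dir):
    """Return the names of session directories that contain a REPORT.md, sorted.

    Assumes one subdirectory per session (an illustration, not a documented layout).
    """
    root = Path(sessions_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "REPORT.md").is_file()
    )


if __name__ == "__main__":
    for name in sessions_with_reports(".arbor/sessions"):
        print(name)
```

If the real layout differs, only the `REPORT.md` lookup inside the generator needs to change.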
The implementation lives under src/ (a src-layout; the CLI installs as
arbor). The package directories are:
- `core/`: ReAct loop, tools, LLM providers, context management
- `coordinator/`: coordinator agent, the tree, orchestrator, coordinator tools
- `executor/`: executor agent and CLI
- `cli/`: intake, live dashboard, setup, doctor, config
- `events/`: typed event bus and payloads
- `report/`, `webui/`: report generation and read-only run monitor
- `search_agent/`: the minimal ReAct search harness (the M_0 for the BrowseComp / search-agent tasks)
- `plugins/`: domain plugins (e.g. `mle_kaggle.yaml`)
- `skills/`: on-demand markdown playbooks

(Top-level `src/` also has `dashboard.py`, `run.py`, `review.py`.)
Naming note: the paper and this skill call the persistent state the hypothesis tree; the tool's code and dashboard call the same structure the Idea Tree. They are the same thing. The depth convention also matches the native skill: root/depth 0 = objective + global insights, depth 1 = research directions, depth 2+ = concrete tested methods.
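
The depth convention in the naming note can be made concrete with a small tree sketch. The `Node` class, `role_for_depth`, and `walk` are hypothetical names for illustration only and are not Arbor's actual classes; the only thing taken from the text is the mapping from depth to role.

```python
from dataclasses import dataclass, field


def role_for_depth(depth):
    """Map a node depth to its role under the convention in the naming note."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if depth == 0:
        return "objective + global insights"
    if depth == 1:
        return "research direction"
    return "concrete tested method"  # depth 2 and deeper


@dataclass
class Node:
    """Hypothetical tree node for illustration; not Arbor's implementation."""
    label: str
    depth: int = 0
    children: list = field(default_factory=list)

    def add_child(self, label):
        # A child always sits exactly one level below its parent.
        child = Node(label, self.depth + 1)
        self.children.append(child)
        return child

    @property
    def role(self):
        return role_for_depth(self.depth)


def walk(node):
    """Yield (depth, role, label) for every node, parents before children."""
    yield node.depth, node.role, node.label
    for child in node.children:
        yield from walk(child)


if __name__ == "__main__":
    root = Node("task objective")
    direction = root.add_child("a research direction")
    direction.add_child("a tested method")
    for depth, role, label in walk(root):
        print("  " * depth + f"[{role}] {label}")
```

Deriving the role from depth rather than storing it on each node keeps the two from drifting apart when nodes are added.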
arbor tool.