optional-skills/research/darwinian-evolver/SKILL.md
Run Imbue's darwinian_evolver — an LLM-driven evolutionary search loop — to optimize a prompt, regex, SQL query, or small code snippet against a fitness function.
Status: thin wrapper around the upstream tool. The skill installs it, walks the
agent through writing a Problem definition (organism + evaluator + mutator),
and drives the loop via the upstream CLI or a small custom Python driver.
License: the upstream tool is AGPL-3.0. The skill ONLY ever invokes it
via the upstream CLI or a subprocess/uv run call (mere aggregation). Do NOT
import upstream classes into Hermes itself.
Do not use this when:
- git, uv (or pip)
- OPENROUTER_API_KEY, ANTHROPIC_API_KEY, or OPENAI_API_KEY

The skill ships a small parrot_openrouter.py driver that uses OPENROUTER_API_KEY
via the OpenAI SDK, so any model on OpenRouter works. The upstream CLI itself
hardcodes Anthropic and needs ANTHROPIC_API_KEY.
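
As a rough illustration of the prerequisites above, the following sketch checks which tools are on PATH and which entry point the available keys allow. The helper names `pick_driver` and `missing_tools` are hypothetical, and the preference order (OpenRouter first) is an assumption, not something the skill prescribes.

```python
import os
import shutil


def pick_driver(env=None):
    """Return the entry point the available API keys allow, or None.

    Hypothetical helper; the OpenRouter-first ordering is an assumption.
    """
    env = os.environ if env is None else env
    if env.get("OPENROUTER_API_KEY"):
        # The shipped driver talks to OpenRouter through the OpenAI SDK.
        return "parrot_openrouter.py"
    if env.get("ANTHROPIC_API_KEY"):
        # The upstream CLI hardcodes Anthropic.
        return "upstream CLI"
    if env.get("OPENAI_API_KEY"):
        # No shipped driver reads this key directly; a custom driver is needed.
        return "custom driver"
    return None


def missing_tools(tools=("git", "uv")):
    """List the required command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("driver:", pick_driver())
    print("missing tools:", missing_tools())
```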
Run via the terminal tool:
mkdir -p ~/.hermes/cache/darwinian-evolver && cd ~/.hermes/cache/darwinian-evolver
[ -d darwinian_evolver ] || git clone --depth 1 https://github.com/imbue-ai/darwinian_evolver.git
cd darwinian_evolver && uv sync
Verify:
cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver \
&& uv run darwinian_evolver --help | head -5
Tiny smoke test (requires ANTHROPIC_API_KEY):
cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver
uv run darwinian_evolver parrot \
--num_iterations 2 \
--num_parents_per_iteration 2 \
--mutator_concurrency 2 --evaluator_concurrency 2 \
--output_dir /tmp/parrot_demo
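
If the smoke test needs to be launched from Python rather than a shell, a subprocess call keeps the upstream tool at arm's length, in line with the license note. This is a minimal sketch: `build_cli_command` and `run_smoke_test` are hypothetical helpers, and only the flags shown in the command above are used.

```python
import subprocess
from pathlib import Path

# Checkout location used throughout this document.
DE_DIR = Path.home() / ".hermes/cache/darwinian-evolver/darwinian_evolver"


def build_cli_command(problem, output_dir, num_iterations=2,
                      num_parents=2, concurrency=2):
    """Assemble the argv for the upstream CLI (hypothetical helper)."""
    return [
        "uv", "run", "darwinian_evolver", problem,
        "--num_iterations", str(num_iterations),
        "--num_parents_per_iteration", str(num_parents),
        "--mutator_concurrency", str(concurrency),
        "--evaluator_concurrency", str(concurrency),
        "--output_dir", str(output_dir),
    ]


def run_smoke_test():
    """Run the parrot problem as a child process and return its exit code."""
    cmd = build_cli_command("parrot", "/tmp/parrot_demo")
    # The upstream CLI needs ANTHROPIC_API_KEY in the inherited environment.
    return subprocess.run(cmd, cwd=DE_DIR, check=False).returncode
```

Passing the arguments as a list avoids shell quoting issues, and `check=False` leaves the decision about a non-zero exit code to the caller.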
Outputs:
- /tmp/parrot_demo/snapshots/iteration_N.pkl: pickled population per iteration
- /tmp/parrot_demo/<jsonl>: per-iteration JSON log (path printed at end)

Open ~/.hermes/cache/darwinian-evolver/darwinian_evolver/darwinian_evolver/lineage_visualizer.html
in a browser and load the JSON log to see the evolutionary tree.
The skill ships scripts/parrot_openrouter.py — same parrot problem, but the
LLM call goes through OpenRouter so any provider works.
# From wherever the skill is installed:
SKILL_DIR=~/.hermes/skills/research/darwinian-evolver
DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver
cd "$DE_DIR" && \
EVOLVER_MODEL='openai/gpt-4o-mini' \
uv run --with openai python "$SKILL_DIR/scripts/parrot_openrouter.py" \
--num_iterations 3 --num_parents_per_iteration 2 \
--output_dir /tmp/parrot_or
Inspect the result with scripts/show_snapshot.py:
uv run --with openai python "$SKILL_DIR/scripts/show_snapshot.py" \
/tmp/parrot_or/snapshots/iteration_3.pkl
Expected output: 7 evolved prompt templates ranked by score, with the best
landing around 0.6–0.8 (the seed Say {{ phrase }} scored 0.000).
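
For orientation, here is a sketch of the two-step unpickling that reading a snapshot involves, assuming the layout this document describes: a dict whose population_snapshot entry holds further pickled bytes. The inner payload below is a made-up list of (score, template) tuples built in memory so the example runs on its own; real snapshots contain upstream objects, and `load_nested_snapshot` is a hypothetical helper, not the shipped show_snapshot.py.

```python
import pickle


def load_nested_snapshot(raw: bytes):
    """Unpickle the outer dict, then the bytes stored under population_snapshot."""
    outer = pickle.loads(raw)
    return pickle.loads(outer["population_snapshot"])


# Made-up stand-in for a snapshot file; scores and templates are illustrative only.
inner = [(0.0, "seed template"), (0.7, "evolved template")]
fake_snapshot = pickle.dumps({"population_snapshot": pickle.dumps(inner)})

population = load_nested_snapshot(fake_snapshot)
best = max(population, key=lambda entry: entry[0])
print(best)
```

Only unpickle files produced by your own runs: `pickle.loads` can execute arbitrary code from untrusted input.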
The skill ships templates/custom_problem_template.py — copy, edit, run.
Three things you must define:
1. Organism: a Pydantic BaseModel subclass holding the artifact being
   evolved (prompt_template: str, regex_pattern: str, sql_query: str,
   code_block: str, etc.). Add a run(*args) method that exercises it.
2. Evaluator: .evaluate(organism) -> EvaluationResult(score=..., trainable_failure_cases=[...], holdout_failure_cases=[...], is_viable=True).
   - score is in [0, 1]. Higher is better.
   - trainable_failure_cases: what the mutator sees. Include enough
     context (input, expected, actual) for the LLM to diagnose.
   - holdout_failure_cases: kept out of the mutator's view. Use these
     to detect overfitting.
   - is_viable=True unless the organism is completely broken (raises,
     returns None, etc.). A 0-score viable organism is fine; it just gets
     down-weighted in parent selection.
3. Mutator: .mutate(organism, failure_cases, learning_log_entries) -> list[Organism].
   Typically: build an LLM prompt that includes the current organism + a
   failure case + an ask to propose a fix; parse the LLM's response; return
   a new Organism. Return [] on parse failure; the loop handles it.
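
To make the three roles concrete, here is a self-contained sketch for a regex problem. It deliberately does not import the upstream package: `EvaluationResult` and `RegexOrganism` are local dataclass stand-ins that only mirror the field and method names listed above (upstream expects a Pydantic BaseModel), the test cases are invented, and `llm` is any callable from prompt to text. Treat it as an illustration of the shape, not as the upstream API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class EvaluationResult:  # stand-in mirroring the fields named in this document
    score: float
    trainable_failure_cases: list = field(default_factory=list)
    holdout_failure_cases: list = field(default_factory=list)
    is_viable: bool = True


@dataclass
class RegexOrganism:  # stand-in; upstream expects a Pydantic BaseModel subclass
    regex_pattern: str

    def run(self, text: str) -> bool:
        return re.fullmatch(self.regex_pattern, text) is not None


# Invented cases: (input, should_match) for an ISO-date matcher.
TRAIN = [("2024-01-31", True), ("31/01/2024", False), ("2024-1-3", False)]
HOLDOUT = [("1999-12-01", True), ("abcd-ef-gh", False)]


class RegexEvaluator:
    def evaluate(self, organism):
        try:
            re.compile(organism.regex_pattern)
        except re.error:
            # A pattern that does not compile is completely broken.
            return EvaluationResult(score=0.0, is_viable=False)

        def failures(cases):
            out = []
            for text, expected in cases:
                actual = organism.run(text)
                if actual != expected:
                    out.append({"input": text, "expected": expected, "actual": actual})
            return out

        train_fail, hold_fail = failures(TRAIN), failures(HOLDOUT)
        total = len(TRAIN) + len(HOLDOUT)
        score = 1 - (len(train_fail) + len(hold_fail)) / total
        return EvaluationResult(score, train_fail, hold_fail, True)


class RegexMutator:
    def __init__(self, llm):
        self.llm = llm  # hypothetical callable: prompt text -> reply text

    def mutate(self, organism, failure_cases, learning_log_entries):
        prompt = (
            f"Current pattern: {organism.regex_pattern}\n"
            f"Failing cases: {failure_cases}\n"
            "Reply with only a corrected regex."
        )
        reply = self.llm(prompt).strip()
        try:
            re.compile(reply)
        except re.error:
            return []  # parse failure: return no children
        return [RegexOrganism(regex_pattern=reply)]
```

A quick check with a canned reply in place of a real model: `RegexMutator(lambda p: r"\d{4}-\d{2}-\d{2}").mutate(RegexOrganism(".*"), [], [])` returns one child, which the evaluator above scores higher than the `.*` seed.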
Then write a driver script that wires Problem(initial_organism, evaluator, [mutators])
into EvolveProblemLoop and iterates over loop.run(num_iterations=N) — the
shipped scripts/parrot_openrouter.py is the reference.
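
Because the loop is consumed by iterating over loop.run(...), it behaves like any Python generator: nothing executes until the first item is requested. The sketch below demonstrates that with `FakeLoop`, a hypothetical stand-in class written for this example; it is not the upstream EvolveProblemLoop and the dict it yields is invented.

```python
class FakeLoop:
    """Stand-in for an evolution loop whose run() is a generator."""

    def __init__(self):
        self.iterations_done = 0

    def run(self, num_iterations):
        for i in range(1, num_iterations + 1):
            self.iterations_done = i  # the "work" happens only while iterating
            yield {"iteration": i}


loop = FakeLoop()
pending = loop.run(num_iterations=3)  # creates the generator; no work yet
print(loop.iterations_done)

snapshots = [snap for snap in pending]  # iterating is what drives the loop
print(loop.iterations_done, len(snapshots))
```

A driver that calls run() without a for loop (or list(...)) around it will exit immediately with no iterations performed.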
| flag | default | when to change |
|---|---|---|
| --num_iterations | 5 | bump to 10–20 once you trust the evaluator |
| --num_parents_per_iteration | 4 | drop to 2 for cheap exploration |
| --mutator_concurrency | 10 | drop to 2–4 to avoid rate limits |
| --evaluator_concurrency | 10 | same; evaluator hits the LLM too |
| --batch_size | 1 | raise to 3–5 once your mutator handles multiple failures |
| --verify_mutations | off | turn on once mutator is wasteful (>10× cost saving on later runs per Imbue) |
| --midpoint_score | p75 | leave alone unless scores cluster |
| --sharpness | 10 | leave alone |
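
One way to build intuition for the last two flags is to picture parent selection as a sigmoid weighting over scores, with the midpoint at the 75th percentile and the sharpness controlling how steep the curve is. The sketch below is an assumption about how such a weighting could behave, not a transcription of the upstream code; `parent_weights` is a hypothetical function and any additional terms upstream may apply are omitted.

```python
import math
import statistics


def parent_weights(scores, sharpness=10.0, midpoint=None):
    """Sigmoid weight per score; midpoint defaults to the 75th percentile.

    Assumed model for illustration only.
    """
    if midpoint is None:
        if len(scores) > 1:
            # quantiles(n=4) returns the three quartile cut points; index 2 is p75.
            midpoint = statistics.quantiles(scores, n=4)[2]
        else:
            midpoint = scores[0]
    return [1.0 / (1.0 + math.exp(-sharpness * (s - midpoint))) for s in scores]


if __name__ == "__main__":
    scores = [0.0, 0.2, 0.5, 0.8]
    for score, weight in zip(scores, parent_weights(scores)):
        print(f"score={score:.1f} weight={weight:.4f}")
```

Under this model a 0-score organism keeps a small but non-zero weight, and when all scores cluster near the midpoint the weights flatten toward 0.5, which is the situation where adjusting the midpoint could matter.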
- Initial organism must be viable: set is_viable=True in your
  EvaluationResult even on a 0-score seed. The loop refuses non-viable
  organisms because they imply the loop has nothing to evolve from.
- Wrap LLM calls in try/except and return f"<LLM_ERROR: {e}>"; the
  evolver will just score that organism 0 and move on.
- loop.run() is a generator: calling it doesn't run anything until
  you iterate. Use for snap in loop.run(num_iterations=N):.
- iteration_N.pkl contains a dict with
  population_snapshot (more pickled bytes). To unpickle you must have the
  Organism class importable under the same dotted path it was pickled at.
- uv run darwinian_evolver <problem>
  reaches for ANTHROPIC_API_KEY and uses Claude Sonnet. To use any other
  provider, write a driver like parrot_openrouter.py.
- Do not write from darwinian_evolver import ... inside Hermes core.
  Custom driver scripts under ~/.hermes/skills/... are user-side and fine.
- pip install darwinian-evolver will pull the wrong
  thing. Always install from the GitHub repo.

After install + a parrot run, exit code 0 from this is sufficient:
DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver
ls "$DE_DIR/darwinian_evolver/lineage_visualizer.html" >/dev/null && \
cd "$DE_DIR" && uv run darwinian_evolver --help >/dev/null && \
echo "darwinian-evolver: OK"