website/docs/user-guide/features/mixture-of-agents.md
Mixture of Agents is a virtual model provider. Each named MoA preset appears as a selectable model under the moa provider.
When you select a MoA preset, the preset's aggregator is the acting model. It is the model that writes the assistant response and emits tool calls. Reference models run first and provide analysis for the aggregator to use.
Use MoA when a hard task benefits from multiple model perspectives but still needs Hermes' normal agent loop: tool calls, follow-up iterations, interrupts, transcript persistence, and the same session context as any other message.
You can select a preset through the normal model picker surfaces:
```
/model default --provider moa
/model review --provider moa
```
MoA presets are selectable on every Hermes surface, because MoA is a normal provider in the model system:
- `/model`: `/model <preset> --provider moa`, or `/model --provider moa` for the default preset. A bare `/model <preset>` also works when the name exactly matches a configured preset.
- `hermes model` and the Dashboard model picker: a Mixture of Agents provider row appears with your preset names as its models.
- MoA presets section; selecting one (`MoA: <preset>`) switches the active model to that preset. The Desktop settings panel also creates and edits presets.

Configured presets therefore show up wherever you would pick any other model.
/moa is one-shot convenience sugar. It runs a single prompt through the default MoA preset, then restores whatever model you were on:
```
/moa design and implement a migration plan for this flaky test cluster
```
Hermes temporarily switches to the default MoA preset for that one turn, sends the prompt, then restores your previous model afterward. The whole argument is the prompt — /moa no longer interprets it as a preset name.
```
/moa
```
Bare /moa (no prompt) just prints usage.
To switch to a MoA preset for the rest of the session, select it from the model picker — MoA presets appear under a Mixture of Agents provider in every model-selection surface (see above). /moa is deliberately not a model switch, so a normal prompt can never accidentally change your model.
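The switch-then-restore behaviour of `/moa` resembles a scoped override. The sketch below illustrates that pattern only; `Session` and `one_shot_model` are hypothetical names, not Hermes' actual classes or functions.

```python
from contextlib import contextmanager


class Session:
    """Hypothetical stand-in for a chat session that tracks the active model."""

    def __init__(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model


@contextmanager
def one_shot_model(session: Session, provider: str, model: str):
    """Temporarily switch the session's model, restoring it afterwards."""
    previous = (session.provider, session.model)
    session.provider, session.model = provider, model
    try:
        yield session
    finally:
        # Restore even if the turn raises, so the switch never leaks.
        session.provider, session.model = previous


session = Session("openrouter", "anthropic/claude-opus-4.8")
with one_shot_model(session, "moa", "default"):
    active_during = (session.provider, session.model)
active_after = (session.provider, session.model)
```

Inside the `with` block the session points at the `moa` provider; after it, the previous model is active again, which is why a one-shot prompt cannot change your model by accident.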
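For each main model call when provider `moa` is selected, Hermes runs the preset's reference models first, then hands their analysis to the aggregator, which writes the assistant response and emits any tool calls.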
Because MoA is selected through the normal model system, it composes automatically with /goal, gateway sessions, TUI sessions, and Desktop chat.
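The ordering this page describes (reference models first, then the aggregator as the acting model) can be sketched roughly as follows. This is an illustration under assumptions, not Hermes' implementation: `run_moa_turn`, the `call_model` callable, and the way reference output is joined into the aggregator's input are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (provider, model, prompt) -> text reply.
CallModel = Callable[[str, str, str], str]


def run_moa_turn(preset: Dict, prompt: str, call_model: CallModel) -> str:
    """Run one MoA model call: references first, then the aggregator."""
    analyses: List[str] = []
    for ref in preset["reference_models"]:
        analyses.append(call_model(ref["provider"], ref["model"], prompt))

    # Assumption: reference analysis is passed to the aggregator as extra text.
    notes = "\n\n".join(
        f"[reference {i + 1}]\n{text}" for i, text in enumerate(analyses)
    )
    aggregator_prompt = f"{prompt}\n\nReference analysis:\n{notes}"

    # The aggregator's reply is the assistant response for this call.
    agg = preset["aggregator"]
    return call_model(agg["provider"], agg["model"], aggregator_prompt)
```

Only the aggregator's output is returned to the agent loop, which matches the description of the aggregator as the model that writes the response and emits tool calls.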
You can configure named MoA presets from:
- `hermes moa configure [name]`
- `config.yaml`

The config stores explicit provider/model pairs, so you can mix providers and use multiple models from the same provider:
```yaml
moa:
  default_preset: default
  presets:
    default:
      reference_models:
        - provider: openai-codex
          model: gpt-5.5
        - provider: openrouter
          model: deepseek/deepseek-v4-pro
      aggregator:
        provider: openrouter
        model: anthropic/claude-opus-4.8
      reference_temperature: 0.6
      aggregator_temperature: 0.4
      max_tokens: 4096
      enabled: true
```
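The explicit provider/model pairs in a preset can be read back as `provider:model` labels. A minimal sketch, assuming the preset has already been loaded into a plain dict with the shape of the config above; `preset_members` is a hypothetical helper, not part of Hermes.

```python
from typing import Dict, List

# The default preset from the config, as a plain dict (YAML loading is out of scope).
DEFAULT_PRESET: Dict = {
    "reference_models": [
        {"provider": "openai-codex", "model": "gpt-5.5"},
        {"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"},
    ],
    "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
}


def preset_members(preset: Dict) -> List[str]:
    """Return 'provider:model' labels: references first, aggregator last."""
    entries = list(preset["reference_models"]) + [preset["aggregator"]]
    return [f"{e['provider']}:{e['model']}" for e in entries]


labels = preset_members(DEFAULT_PRESET)
```

Because each entry carries its own provider, the same provider (here `openrouter`) can appear more than once with different models.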
Default preset:

- `openai-codex:gpt-5.5` (reference)
- `openrouter:deepseek/deepseek-v4-pro` (reference)
- `openrouter:anthropic/claude-opus-4.8` (aggregator)

```bash
hermes moa list
hermes moa configure          # update the default preset
hermes moa configure review   # create or update a named preset
hermes moa delete review
```
On HermesBench, a two-model MoA preset — claude-opus-4.8 aggregating over a gpt-5.5 reference — outscores either model run on its own:
| Model | HermesBench score |
|---|---|
| Opus aggregator (opus-4.8 + gpt-5.5 reference), MoA | 0.8202 |
| anthropic/claude-opus-4.8 | 0.7607 |
| openai/gpt-5.5 | 0.7412 |
The MoA configuration beats its strongest component (opus-4.8) by 0.0595, about 6 points on a 100-point scale. This suggests that aggregating a second perspective lifts quality on hard tasks rather than just averaging the two.
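The comparison can be checked directly from the table. This short sketch only restates the published scores; the variable names are illustrative.

```python
# HermesBench scores copied from the table.
moa_score = 0.8202
single_scores = {
    "anthropic/claude-opus-4.8": 0.7607,
    "openai/gpt-5.5": 0.7412,
}

best_single = max(single_scores.values())
mean_single = sum(single_scores.values()) / len(single_scores)

# Gain over the strongest component, and over a plain average of the two.
gain_over_best = round(moa_score - best_single, 4)   # about 0.0595
gain_over_mean = round(moa_score - mean_single, 4)   # about 0.0693
```

Since the MoA score exceeds the best single model and not just the mean of the two, the result is not explained by averaging alone.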
MoA is built so the main conversation's prompt cache is never broken. Selecting a MoA preset is a normal model selection: it does not mutate past context, swap toolsets, or rebuild the system prompt mid-conversation. Your conversation history, system prompt, and tool schema stay byte-stable, so the cached prefix every other model relies on is preserved exactly as it would be for a plain model. Switching to or away from a MoA preset costs the same cache invalidation as any other /model switch — no more.
Both internal call types, the reference calls and the aggregator call, cache normally.
So MoA does not sacrifice prompt caching on either call type. Its only real cost is the extra reference calls per iteration — you pay for multiple model perspectives, not for broken caches. The long-lived conversation prefix shared with the rest of Hermes is fully intact.
`hermes tools`; there is no `moa` toolset to enable.

`enabled: false` on a preset disables the reference fan-out for that preset: the aggregator acts alone, exactly as if you selected it as a plain model. This is the per-preset off switch surfaced in the dashboard and desktop settings.
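The cost of a preset, and the effect of the off switch, can be estimated by counting model calls per agent-loop iteration. This is a rough sketch: `model_calls_per_iteration` is a hypothetical helper, and treating a missing `enabled` key as enabled is an assumption.

```python
from typing import Dict


def model_calls_per_iteration(preset: Dict) -> int:
    """Count model calls for one agent-loop iteration under a MoA preset."""
    # Assumption: a missing `enabled` key is treated as enabled.
    if not preset.get("enabled", True):
        return 1  # aggregator only, same as selecting it as a plain model
    return len(preset["reference_models"]) + 1  # references + aggregator


preset = {
    "reference_models": [
        {"provider": "openai-codex", "model": "gpt-5.5"},
        {"provider": "openrouter", "model": "deepseek/deepseek-v4-pro"},
    ],
    "aggregator": {"provider": "openrouter", "model": "anthropic/claude-opus-4.8"},
    "enabled": True,
}

calls_enabled = model_calls_per_iteration(preset)
calls_disabled = model_calls_per_iteration({**preset, "enabled": False})
```

With two reference models, each iteration makes three model calls while the preset is enabled and one when it is disabled.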