Honcho Memory Provider

plugins/memory/honcho/README.md

AI-native cross-session user modeling with multi-pass dialectic reasoning, session summaries, bidirectional peer tools, and persistent conclusions.

Honcho docs: https://docs.honcho.dev/v3/guides/integrations/hermes

Requirements

  • pip install honcho-ai
  • Honcho API key from app.honcho.dev, or a self-hosted instance

Setup

```bash
hermes memory setup honcho   # configure Honcho directly (works on a fresh install)
hermes memory setup          # generic picker, choose Honcho from the list
```

Or manually:

```bash
hermes config set memory.provider honcho
echo "HONCHO_API_KEY=***" >> ~/.hermes/.env
```

hermes honcho setup also works, but only after Honcho is the active memory provider — the honcho subcommand is registered for the active provider only. On a fresh install, use hermes memory setup honcho.

Architecture Overview

Two-Layer Context Injection

Context is injected into the user message at API-call time (not the system prompt) to preserve prompt caching. Only a static mode header goes in the system prompt. The injected block is wrapped in <memory-context> fences with a system note clarifying it's background data, not new user input.

Two independent layers, each on its own cadence:

Layer 1 — Base context (refreshed every contextCadence turns):

  1. SESSION SUMMARY — from session.context(summary=True), placed first
  2. User Representation — Honcho's evolving model of the user
  3. User Peer Card — key facts snapshot
  4. AI Self-Representation — Honcho's model of the AI peer
  5. AI Identity Card — AI peer facts

Layer 2 — Dialectic supplement (fired every dialecticCadence turns): Multi-pass .chat() reasoning about the user, appended after base context.

Both layers are joined, then truncated to fit contextTokens budget via _truncate_to_budget (tokens × 4 chars, word-boundary safe).
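
The truncation step can be pictured with a minimal sketch. This is not the actual _truncate_to_budget implementation; the function name, the chars_per_token parameter, and the choice to split only on spaces are assumptions made for illustration.

```python
def truncate_to_budget(text: str, context_tokens: int, chars_per_token: int = 4) -> str:
    """Trim text to about context_tokens tokens without splitting a word.

    Hypothetical stand-in for _truncate_to_budget: the budget is
    approximated as tokens x 4 characters, as described above.
    """
    limit = context_tokens * chars_per_token
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut landed inside a word, back up to the previous space.
    if not text[limit].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head
    return cut.rstrip()
```

With a budget of 2 tokens (8 characters), "alpha beta gamma" is cut back to "alpha" rather than "alpha be".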

Cold Start vs Warm Session Prompts

Dialectic pass 0 automatically selects its prompt based on session state:

  • Cold (no base context cached): "Who is this person? What are their preferences, goals, and working style? Focus on facts that would help an AI assistant be immediately useful."
  • Warm (base context exists): "Given what's been discussed in this session so far, what context about this user is most relevant to the current conversation? Prioritize active context over biographical facts."

Not configurable — determined automatically.
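
As a rough illustration of the selection rule, the sketch below picks the pass 0 prompt from whether base context is cached. The function name and the treatment of an empty string as "no cache" are assumptions; only the two prompt texts come from this README.

```python
from typing import Optional

COLD_PROMPT = (
    "Who is this person? What are their preferences, goals, and working style? "
    "Focus on facts that would help an AI assistant be immediately useful."
)
WARM_PROMPT = (
    "Given what's been discussed in this session so far, what context about this "
    "user is most relevant to the current conversation? Prioritize active context "
    "over biographical facts."
)


def select_pass0_prompt(cached_base_context: Optional[str]) -> str:
    """Return the warm prompt when base context is cached, else the cold one."""
    # Assumption: None or an empty string both mean "nothing cached yet".
    return WARM_PROMPT if cached_base_context else COLD_PROMPT
```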

Dialectic Depth (Multi-Pass Reasoning)

dialecticDepth (1–3, clamped) controls how many .chat() calls fire per dialectic cycle:

| Depth | Passes | Behavior |
| --- | --- | --- |
| 1 | single .chat() | Base query only (cold or warm prompt) |
| 2 | audit + synthesis | Pass 0 result is self-audited; pass 1 does targeted synthesis. Conditional bail-out if pass 0 returns strong signal (>300 chars or structured with bullets/sections >100 chars) |
| 3 | audit + synthesis + reconciliation | Pass 2 reconciles contradictions across prior passes into a final synthesis |
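
The depth 2 bail-out can be read as a simple heuristic on the pass 0 result. The sketch below is one possible interpretation, not the provider's code: the function name and the regular expression used to detect bullets or section headings are assumptions.

```python
import re

# Assumption: "structured" means at least one line starts with a bullet,
# a numbered item, or a markdown-style heading.
_STRUCTURE = re.compile(r"^\s*(?:[-*•]|\d+\.|#+)\s+", re.MULTILINE)


def is_strong_signal(pass0_result: str) -> bool:
    """True when pass 0 looks complete enough to skip the later passes."""
    text = pass0_result.strip()
    if len(text) > 300:
        return True
    return bool(_STRUCTURE.search(text)) and len(text) > 100
```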

Proportional Reasoning Levels

When dialecticDepthLevels is not set, each pass uses a proportional level relative to dialecticReasoningLevel (the "base"):

| Depth | Pass levels |
| --- | --- |
| 1 | [base] |
| 2 | [minimal, base] |
| 3 | [minimal, base, low] |

Override with dialecticDepthLevels: an explicit array of reasoning level strings per pass.
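
To make the defaults concrete, here is a small sketch that derives the per-pass levels from the depth and base level. The function name is hypothetical, and padding a too-short dialecticDepthLevels array with the proportional defaults is an assumption, since this README does not say how short arrays are handled.

```python
from typing import List, Optional


def resolve_pass_levels(depth: int, base: str,
                        overrides: Optional[List[str]] = None) -> List[str]:
    """Return one reasoning level per dialectic pass."""
    depth = max(1, min(3, depth))  # dialecticDepth is clamped to 1-3
    defaults = {
        1: [base],
        2: ["minimal", base],
        3: ["minimal", base, "low"],
    }[depth]
    if not overrides:
        return defaults
    # Assumption: explicit levels win; missing entries fall back to defaults.
    return [overrides[i] if i < len(overrides) else defaults[i]
            for i in range(depth)]
```

For example, depth 3 with a base of "medium" yields ["minimal", "medium", "low"].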

Three Orthogonal Dialectic Knobs

| Knob | Controls | Type |
| --- | --- | --- |
| dialecticCadence | How often: minimum turns between dialectic firings | int |
| dialecticDepth | How many: passes per firing (1–3) | int |
| dialecticReasoningLevel | How hard: reasoning ceiling per .chat() call | string |

Input Sanitization

run_conversation strips leaked <memory-context> blocks from user input before processing. When saveMessages persists a turn that included injected context, the block can reappear in subsequent turns via message history. The sanitizer removes <memory-context> blocks plus associated system notes.
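
A sanitizer of this kind can be sketched with a single regular expression. This is an illustration only: the function name is hypothetical, and because the exact wording of the system note is not given here, the sketch removes only the fenced block itself.

```python
import re

# Non-greedy match so several leaked blocks are removed independently.
_MEMORY_BLOCK = re.compile(
    r"<memory-context>.*?</memory-context>\s*",
    re.DOTALL | re.IGNORECASE,
)


def strip_memory_context(user_input: str) -> str:
    """Remove leaked <memory-context> blocks from a user message."""
    return _MEMORY_BLOCK.sub("", user_input).strip()
```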

Tools

Five bidirectional tools. All accept an optional peer parameter ("user" or "ai", default "user").

| Tool | LLM call? | Description |
| --- | --- | --- |
| honcho_profile | No | Peer card: key facts snapshot |
| honcho_search | No | Semantic search over stored context (800 tokens default, 2000 max) |
| honcho_context | No | Full session context: summary, representation, card, messages |
| honcho_reasoning | Yes | LLM-synthesized answer via dialectic .chat() |
| honcho_conclude | No | Write a persistent fact/conclusion about the user |

Tool visibility depends on recallMode: hidden in context mode, always present in tools and hybrid.

Config Resolution

Config is read from the first file that exists:

| Priority | Path | Scope |
| --- | --- | --- |
| 1 | $HERMES_HOME/honcho.json | Profile-local (isolated Hermes instances) |
| 2 | ~/.hermes/honcho.json | Default profile (shared host blocks) |
| 3 | ~/.honcho/config.json | Global (cross-app interop) |

Host key is derived from the active Hermes profile: hermes (default) or hermes_<profile>.

For every key, resolution order is: host block > root > env var > default.
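
The per-key resolution order can be sketched as a short lookup chain. The function below is a hypothetical helper, not the provider's loader; it assumes the parsed config is a plain dict with an optional "hosts" mapping, as in the JSON examples in this README.

```python
import os
from typing import Any, Optional


def resolve_key(config: dict, host_key: str, key: str,
                env_var: Optional[str] = None, default: Any = None) -> Any:
    """Resolve one setting: host block > root > env var > default."""
    host_block = config.get("hosts", {}).get(host_key, {})
    if key in host_block:
        return host_block[key]
    if key in config:
        return config[key]
    if env_var and os.environ.get(env_var):
        return os.environ[env_var]
    return default
```

With a root-level "recallMode": "hybrid" and a hermes_coder host block that sets "recallMode": "tools", the coder profile resolves to "tools" while the default profile resolves to "hybrid".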

Full Configuration Reference

Identity & Connection

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | | API key. Falls back to HONCHO_API_KEY env var |
| baseUrl | string | | Base URL for self-hosted Honcho. Local URLs auto-skip API key auth |
| environment | string | "production" | SDK environment mapping |
| enabled | bool | auto | Master toggle. Auto-enables when apiKey or baseUrl present |
| workspace | string | host key | Honcho workspace ID. Shared environment: all profiles in the same workspace can see the same user identity and related memories |
| peerName | string | | User peer identity |
| aiPeer | string | host key | AI peer identity |

Identity Mapping (Gateway Multi-User)

In gateway deployments (Telegram, Discord, Slack, etc.) each user arrives with a platform-native runtime ID (Telegram UID, Discord snowflake, Slack user ID). These three keys control how those runtime IDs map to Honcho peers. The resolver is config-driven and deterministic: it performs no automatic merging or runtime inference.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| pinUserPeer | bool | false | When true, every gateway runtime user collapses to peerName. Intended for single-operator deployments where you want all your platforms (and any other users) to share one peer. Also accepted as pinPeerName |
| pinPeerName | bool | false | Alias for pinUserPeer; same effect |
| userPeerAliases | object | {} | Map of runtime IDs to peer IDs ({"86701400": "eri"}). Many-to-one is the intended pattern: alias all your runtime IDs to one peer name. One-to-many is not supported; one runtime ID resolves to exactly one peer |
| runtimePeerPrefix | string | "" | Prepended to unknown runtime IDs to namespace them (e.g. "telegram_" → telegram_86701400). Used only when no alias matches. Prevents collisions between platforms whose runtime IDs share the same shape |

Resolver ladder (first match wins):

1. pinUserPeer / pinPeerName=true → return peerName (ignore runtime ID)
2. userPeerAliases[runtime_id]   → return aliased peer
3. userPeerAliases[runtime_id_alt] → check alt-ID too (Telegram UID + username, etc.)
4. runtimePeerPrefix + runtime_id → namespaced peer, with sha256 collision escalation
5. raw sanitized runtime_id      → fallback peer
6. peerName                      → no runtime ID at all (CLI/TUI)
7. session-key fallback          → no config either
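
The ladder above can be approximated in a few lines. This is a simplified sketch under stated assumptions: the function name and the sanitization rule (non-alphanumeric characters become underscores) are hypothetical, host/root layering is ignored, and the sha256 collision escalation in step 4 is omitted.

```python
import re
from typing import Optional


def resolve_user_peer(cfg: dict, runtime_id: Optional[str] = None,
                      runtime_id_alt: Optional[str] = None,
                      session_key: Optional[str] = None) -> Optional[str]:
    """Map a gateway runtime ID to a Honcho user peer (first match wins)."""
    peer_name = cfg.get("peerName")
    # Step 1: pinned, so ignore the runtime ID entirely.
    if (cfg.get("pinUserPeer") or cfg.get("pinPeerName")) and peer_name:
        return peer_name
    # Steps 2-3: explicit aliases for the primary or alternate runtime ID.
    aliases = cfg.get("userPeerAliases", {})
    for candidate in (runtime_id, runtime_id_alt):
        if candidate is not None and str(candidate) in aliases:
            return aliases[str(candidate)]
    # Steps 4-5: namespaced (or raw) sanitized runtime ID.
    if runtime_id is not None:
        clean = re.sub(r"[^A-Za-z0-9_-]", "_", str(runtime_id))
        return cfg.get("runtimePeerPrefix", "") + clean
    # Steps 6-7: no runtime ID (CLI/TUI), then the session-key fallback.
    return peer_name or session_key
```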

Why no pinAiPeer? The AI peer is already pinned by construction — aiPeer is the only AI-side identity setting and the resolver never overrides it. Only the user-side peer has the runtime-vs-config tension that pinUserPeer resolves.

Host vs root semantics. All three keys are accepted at both root and hosts.<host> levels. Host-level wins. For maps and prefixes, host-level replaces the root value as a whole (not merge), so a host can intentionally own its identity universe or wipe it with userPeerAliases: {} / runtimePeerPrefix: "".

Deployment shapes (hermes memory setup honcho asks one prompt to set these):

  • Single-operator (pinUserPeer: true). All gateway users → peerName. Recommended for personal use where you connect Hermes to your own Telegram/Discord/etc.
  • Multi-user gateway (pinUserPeer: false, optional runtimePeerPrefix). Each runtime user → own peer. Recommended for bots serving many humans.
  • Hybrid (pinUserPeer: false, userPeerAliases mapping the operator's runtime IDs to peerName). Multi-user gateway where you are routed to peerName but other users stay distinct.

Migrating single → multi. Flipping pinUserPeer from true to false does not migrate data. Memory accumulated under peerName while pinned stays there; runtime users now resolve to fresh, empty peers. To preserve your own continuity, use the hybrid shape — alias your runtime IDs back to peerName so your turns keep landing on the pooled history while other users get their own peers. The setup wizard offers this path automatically when it detects a single → multi transition.

Memory & Recall

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| recallMode | string | "hybrid" | "hybrid" (auto-inject + tools), "context" (auto-inject only, tools hidden), "tools" (tools only, no injection). Legacy "auto" → "hybrid" |
| observationMode | string | "directional" | Preset: "directional" (all on) or "unified" (shared pool). Use observation object for granular control |
| observation | object | | Per-peer observation config (see Observation section) |
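
The three recall modes reduce to two independent switches: whether context is auto-injected and whether the tools are exposed. The sketch below shows that mapping; the function name and the returned dict keys are hypothetical.

```python
def recall_behavior(mode: str) -> dict:
    """Translate a recallMode value into injection and tool-visibility flags."""
    # The legacy value "auto" is treated as "hybrid".
    mode = {"auto": "hybrid"}.get(mode, mode)
    table = {
        "hybrid": (True, True),    # auto-inject + tools
        "context": (True, False),  # auto-inject only, tools hidden
        "tools": (False, True),    # tools only, no injection
    }
    if mode not in table:
        raise ValueError(f"unknown recallMode: {mode!r}")
    inject, tools = table[mode]
    return {"inject_context": inject, "expose_tools": tools}
```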

Write Behavior

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| writeFrequency | string/int | "async" | "async" (background), "turn" (sync per turn), "session" (batch on end), or integer N (every N turns) |
| saveMessages | bool | true | Persist messages to Honcho API |

Session Resolution

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| sessionStrategy | string | "per-directory" | "per-directory", "per-session", "per-repo" (git root), "global" |
| sessionPeerPrefix | bool | false | Prepend peer name to session keys |
| sessions | object | {} | Manual directory-to-session-name mappings |

Session Name Resolution

The Honcho session name determines which conversation bucket memory lands in. Resolution follows a priority chain — first match wins:

| Priority | Source | Example session name |
| --- | --- | --- |
| 1 | Manual map (sessions config) | "myproject-main" |
| 2 | /title command (mid-session rename) | "refactor-auth" |
| 3 | Gateway session key (Telegram, Discord, etc.) | "agent-main-telegram-dm-8439114563" |
| 4 | per-session strategy | Hermes session ID (20260415_a3f2b1) |
| 5 | per-repo strategy | Git root directory name (hermes-agent) |
| 6 | per-directory strategy | Current directory basename (src) |
| 7 | global strategy | Workspace name (hermes) |

Gateway platforms always resolve via priority 3 (per-chat isolation) regardless of sessionStrategy. The strategy setting only affects CLI sessions.

If sessionPeerPrefix is true, the peer name is prepended: eri-hermes-agent.

What each strategy produces

  • per-directory — basename of $PWD. Opening hermes in ~/code/myapp and ~/code/other gives two separate sessions. Same directory = same session across runs.
  • per-repo — git root directory name. All subdirectories within a repo share one session. Falls back to per-directory if not inside a git repo.
  • per-session — Hermes session ID (timestamp + hex). Every hermes invocation starts a fresh Honcho session. Falls back to per-directory if no session ID is available.
  • global — workspace name. One session for everything. Memory accumulates across all directories and runs.
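
The priority chain and the strategy fallbacks can be summarized in one function. This is a sketch, not the provider's resolver: all parameter names are hypothetical, the caller is assumed to supply the git root and session ID, and applying sessionPeerPrefix to every source (including manual maps) is an assumption.

```python
import os
from typing import Optional


def resolve_session_name(cwd: str, strategy: str = "per-directory", *,
                         sessions: Optional[dict] = None,
                         title: Optional[str] = None,
                         gateway_key: Optional[str] = None,
                         session_id: Optional[str] = None,
                         git_root: Optional[str] = None,
                         workspace: str = "hermes",
                         peer_name: Optional[str] = None,
                         peer_prefix: bool = False) -> str:
    """Pick the Honcho session name; the first matching source wins."""
    sessions = sessions or {}
    if cwd in sessions:                      # 1. manual map
        name = sessions[cwd]
    elif title:                              # 2. /title rename
        name = title
    elif gateway_key:                        # 3. gateway session key
        name = gateway_key
    elif strategy == "per-session" and session_id:
        name = session_id
    elif strategy == "per-repo" and git_root:
        name = os.path.basename(git_root.rstrip("/"))
    elif strategy == "global":
        name = workspace
    else:                                    # per-directory, also the fallback
        name = os.path.basename(cwd.rstrip("/"))
    if peer_prefix and peer_name:
        name = f"{peer_name}-{name}"
    return name
```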

Multi-Profile Pattern

Multiple Hermes profiles can share one workspace while maintaining separate AI identities. Config resolution is host block > root > env var > default — host blocks inherit from root, so shared settings only need to be declared once:

```json
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "yourname",
  "hosts": {
    "hermes": {
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "sessionStrategy": "per-directory"
    },
    "hermes_coder": {
      "aiPeer": "coder",
      "recallMode": "tools",
      "sessionStrategy": "per-repo"
    }
  }
}
```

Both profiles see the same user (yourname) in the same shared environment (hermes), but each AI peer builds its own observations, conclusions, and behavior patterns. The coder's memory stays code-oriented; the main agent's stays broad.

Host key is derived from the active Hermes profile: hermes (default) or hermes_<profile> (e.g. hermes -p coder -> host key hermes_coder). Older hermes.<profile> host blocks are still read for compatibility and are migrated when the CLI writes profile-scoped Honcho config.

Dialectic & Reasoning

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| dialecticDepth | int | 1 | Passes per dialectic cycle (1–3, clamped). 1=single query, 2=audit+synthesis, 3=audit+synthesis+reconciliation |
| dialecticDepthLevels | array | | Optional array of reasoning level strings per pass. Overrides proportional defaults. Example: ["minimal", "low", "medium"] |
| dialecticReasoningLevel | string | "low" | Base reasoning level for .chat(): "minimal", "low", "medium", "high", "max" |
| dialecticDynamic | bool | true | When true, model can override reasoning level per-call via honcho_reasoning tool. When false, always uses dialecticReasoningLevel |
| dialecticMaxChars | int | 600 | Max chars of dialectic result included in the injected memory context |
| dialecticMaxInputChars | int | 10000 | Max chars for dialectic query input to .chat(). Honcho cloud limit: 10k |

Token Budgets

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| contextTokens | int | SDK default | Token budget for context() API calls. Also gates prefetch truncation (tokens × 4 chars) |
| messageMaxChars | int | 25000 | Max chars per message sent via add_messages(). Exceeding this triggers chunking with [continued] markers. Honcho cloud limit: 25k |
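
Chunking against messageMaxChars might look like the sketch below. The function name is hypothetical, and placing the [continued] marker at the start of each follow-on chunk is an assumption; this README only says that markers are used.

```python
from typing import List


def chunk_message(content: str, max_chars: int = 25000,
                  marker: str = "[continued] ") -> List[str]:
    """Split an oversized message into chunks of at most max_chars each."""
    if max_chars <= len(marker):
        raise ValueError("max_chars must exceed the marker length")
    if len(content) <= max_chars:
        return [content]
    chunks = [content[:max_chars]]
    pos = max_chars
    step = max_chars - len(marker)  # leave room for the marker prefix
    while pos < len(content):
        chunks.append(marker + content[pos:pos + step])
        pos += step
    return chunks
```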

Cadence (Cost Control)

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| contextCadence | int | 1 | Minimum turns between base context refreshes (session summary + representation + card) |
| dialecticCadence | int | 1 | Minimum turns between dialectic .chat() firings |
| injectionFrequency | string | "every-turn" | "every-turn" or "first-turn" (inject context on the first user message only, skip from turn 2 onward) |
| reasoningLevelCap | string | | Hard cap on reasoning level: "minimal", "low", "medium", "high" |
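
The two cadence settings act as independent turn counters. The sketch below shows one way to read "minimum turns between" firings; the function names are hypothetical and the assumption that the first turn always fires is not stated in this README.

```python
from typing import List, Optional


def should_fire(turn: int, last_fired_turn: Optional[int], cadence: int) -> bool:
    """True when at least `cadence` turns have passed since the last firing."""
    if last_fired_turn is None:
        return True  # assumption: nothing fired yet, so fire immediately
    return turn - last_fired_turn >= max(1, cadence)


def firing_turns(cadence: int, total_turns: int) -> List[int]:
    """List the turns (1-based) on which a layer with this cadence fires."""
    fired: List[int] = []
    last: Optional[int] = None
    for turn in range(1, total_turns + 1):
        if should_fire(turn, last, cadence):
            fired.append(turn)
            last = turn
    return fired
```

Under these assumptions, over six turns contextCadence 2 refreshes base context on turns 1, 3, and 5, while dialecticCadence 3 fires the dialectic on turns 1 and 4.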

Observation (Granular)

Maps 1:1 to Honcho's per-peer SessionPeerConfig. When present, overrides observationMode preset.

```json
"observation": {
  "user": { "observeMe": true, "observeOthers": true },
  "ai":   { "observeMe": true, "observeOthers": true }
}
```

| Field | Default | Description |
| --- | --- | --- |
| user.observeMe | true | User peer self-observation (Honcho builds user representation) |
| user.observeOthers | true | User peer observes AI messages |
| ai.observeMe | true | AI peer self-observation (Honcho builds AI representation) |
| ai.observeOthers | true | AI peer observes user messages (enables cross-peer dialectic) |

Presets:

  • "directional" (default): all four true
  • "unified": user observeMe=true, AI observeOthers=true, rest false
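
The presets expand to the four flags as sketched below. The helper name is hypothetical, and merging an explicit observation object field by field on top of the preset is an assumption; the provider may instead replace the preset wholesale.

```python
import copy
from typing import Optional

OBSERVATION_PRESETS = {
    "directional": {
        "user": {"observeMe": True, "observeOthers": True},
        "ai": {"observeMe": True, "observeOthers": True},
    },
    "unified": {
        "user": {"observeMe": True, "observeOthers": False},
        "ai": {"observeMe": False, "observeOthers": True},
    },
}


def resolve_observation(mode: str = "directional",
                        observation: Optional[dict] = None) -> dict:
    """Expand observationMode, then apply any explicit per-peer flags."""
    resolved = copy.deepcopy(OBSERVATION_PRESETS[mode])
    for peer, flags in (observation or {}).items():
        resolved[peer].update(flags)
    return resolved
```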

Hardcoded Limits

| Limit | Value |
| --- | --- |
| Search tool max tokens | 2000 (hard cap), 800 (default) |
| Peer card fetch tokens | 200 |

Environment Variables

| Variable | Fallback for |
| --- | --- |
| HONCHO_API_KEY | apiKey |
| HONCHO_BASE_URL | baseUrl |
| HONCHO_ENVIRONMENT | environment |
| HERMES_HONCHO_HOST | Host key override |

CLI Commands

| Command | Description |
| --- | --- |
| hermes memory setup honcho | Configure Honcho directly (works on a fresh install) |
| hermes honcho setup | Interactive setup wizard (only registered once Honcho is the active provider; redirects to hermes memory setup) |
| hermes honcho status | Show resolved config for active profile |
| hermes honcho enable / disable | Toggle Honcho for active profile |
| hermes honcho mode <mode> | Change recall or observation mode |
| hermes honcho peer --user <name> | Update user peer name |
| hermes honcho peer --ai <name> | Update AI peer name |
| hermes honcho tokens --context <N> | Set context token budget |
| hermes honcho tokens --dialectic <N> | Set dialectic max chars |
| hermes honcho map <name> | Map current directory to a session name |
| hermes honcho sync | Create host blocks for all Hermes profiles |

Example Config

```json
{
  "apiKey": "***",
  "workspace": "hermes",
  "peerName": "username",
  "contextCadence": 2,
  "dialecticCadence": 3,
  "dialecticDepth": 2,
  "hosts": {
    "hermes": {
      "enabled": true,
      "aiPeer": "hermes",
      "recallMode": "hybrid",
      "observation": {
        "user": { "observeMe": true, "observeOthers": true },
        "ai": { "observeMe": true, "observeOthers": true }
      },
      "writeFrequency": "async",
      "sessionStrategy": "per-directory",
      "dialecticReasoningLevel": "low",
      "dialecticDepth": 2,
      "dialecticMaxChars": 600,
      "saveMessages": true
    },
    "hermes_coder": {
      "enabled": true,
      "aiPeer": "coder",
      "sessionStrategy": "per-repo",
      "dialecticDepth": 1,
      "dialecticDepthLevels": ["low"],
      "observation": {
        "user": { "observeMe": true, "observeOthers": false },
        "ai": { "observeMe": true, "observeOthers": true }
      }
    }
  },
  "sessions": {
    "/home/user/myproject": "myproject-main"
  }
}
```