website/docs/developer-guide/provider-runtime.md
Hermes has a shared provider runtime resolver used across:
Primary implementation:
- hermes_cli/runtime_provider.py: credential resolution, _resolve_custom_runtime()
- hermes_cli/auth.py: provider registry, resolve_provider()
- hermes_cli/model_switch.py: shared /model switch pipeline (CLI + gateway)
- agent/auxiliary_client.py: auxiliary model routing
- providers/: ABC + registry entry points (ProviderProfile, register_provider, get_provider_profile, list_providers)
- plugins/model-providers/<name>/: per-provider plugins (bundled) that declare api_mode, base_url, env_vars, and fallback_models, and register themselves into the registry on first access. User plugins at $HERMES_HOME/plugins/model-providers/<name>/ override bundled ones of the same name.

get_provider_profile() in providers/ returns a ProviderProfile for a given provider id. runtime_provider.py calls it at resolution time to get the canonical base_url, env_vars priority list, api_mode, and fallback_models, so that data does not have to be duplicated across files. Adding a new plugin under plugins/model-providers/<your-provider>/ (or $HERMES_HOME/plugins/model-providers/<your-provider>/) that calls register_provider() is enough for runtime_provider.py to pick it up; no branch is needed in the resolver itself.
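To make the registry idea concrete, here is a minimal, self-contained sketch of a profile registry. It is not the actual providers/ code: the dataclass fields mirror the attributes named above, but the function signatures, the override-by-name behavior of the dict, and the example-provider values are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderProfile:
    """Illustrative profile; the real class may carry more fields."""

    name: str
    api_mode: str
    base_url: str
    env_vars: tuple[str, ...] = ()         # checked in priority order
    fallback_models: tuple[str, ...] = ()


# Hypothetical in-memory registry keyed by provider id.
_REGISTRY: dict[str, ProviderProfile] = {}


def register_provider(profile: ProviderProfile) -> None:
    # A later registration under the same name replaces the earlier one,
    # which is one simple way a user plugin could override a bundled one.
    _REGISTRY[profile.name] = profile


def get_provider_profile(name: str) -> ProviderProfile | None:
    return _REGISTRY.get(name)


def list_providers() -> list[str]:
    return sorted(_REGISTRY)


# A made-up plugin registering itself (all values are placeholders).
register_provider(
    ProviderProfile(
        name="example-provider",
        api_mode="chat_completions",
        base_url="https://api.example.invalid/v1",
        env_vars=("EXAMPLE_API_KEY",),
        fallback_models=("example-model",),
    )
)
```

With this shape, a resolver only needs get_provider_profile("example-provider") to learn the base URL, API mode, and which environment variables to check, without a provider-specific branch.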
If you are trying to add a new first-class inference provider, read Adding Providers and the Model Provider Plugin guide alongside this page.
At a high level, provider resolution uses:
- config.yaml model/provider config

That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in hermes model.
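The effect of that ordering can be sketched as follows. This is a simplified illustration, not the resolver's code: the model.base_url config layout, the pick_base_url helper, and the use of the environment as a lower-priority source are all assumptions.

```python
from __future__ import annotations


def pick_base_url(saved_config: dict, environ: dict) -> tuple[str | None, str]:
    """Return (base_url, source), preferring the saved config over the shell."""
    model_cfg = saved_config.get("model") or {}
    saved = model_cfg.get("base_url")
    if saved:
        # The saved choice wins, even if a different URL is still exported.
        return saved, "config"
    exported = environ.get("OPENAI_BASE_URL")
    if exported:
        return exported, "env"
    return None, "default"
```

For example, with a saved endpoint and a leftover export in the shell, the saved endpoint is the one that gets used:

```python
url, source = pick_base_url(
    {"model": {"base_url": "https://saved.example.invalid/v1"}},
    {"OPENAI_BASE_URL": "https://stale.example.invalid/v1"},
)
```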
Current provider families include (see plugins/model-providers/ for the complete bundled set):
- gemini, google-gemini-cli
- alibaba, alibaba-coding-plan
- kimi-coding, kimi-coding-cn
- minimax, minimax-cn, minimax-oauth
- provider: custom, a first-class provider for any OpenAI-compatible endpoint
- custom_providers list in config.yaml

The runtime resolver returns data such as:
- provider
- api_mode
- base_url
- api_key
- source

This resolver is the main reason Hermes can share auth/runtime logic between:
- hermes chat

Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when multiple provider keys exist (e.g. OPENROUTER_API_KEY and OPENAI_API_KEY).
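One way to picture this kind of key scoping is a check on the endpoint's host before choosing a key. The sketch below is an illustration under assumptions, not the resolver's actual logic: api_key_for is a hypothetical helper, and it models only the two environment variables named above.

```python
from urllib.parse import urlparse


def api_key_for(base_url: str, environ: dict):
    """Pick the key that belongs to the endpoint's host (illustrative only)."""
    host = urlparse(base_url).hostname or ""
    if host == "openrouter.ai" or host.endswith(".openrouter.ai"):
        # The OpenRouter key is only ever paired with openrouter.ai hosts.
        return environ.get("OPENROUTER_API_KEY")
    # Any other (custom) endpoint gets the OpenAI-style key, never the
    # OpenRouter one.
    return environ.get("OPENAI_API_KEY")
```

The point of matching on the parsed hostname rather than on a substring of the URL is that a custom endpoint whose path or query happens to contain "openrouter.ai" still does not receive the OpenRouter key.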
Each provider's API key is scoped to its own base URL:
- OPENROUTER_API_KEY is only sent to openrouter.ai endpoints
- OPENAI_API_KEY is used for custom endpoints and as a fallback

Hermes also distinguishes between:
That distinction is especially important for:
- OPENAI_BASE_URL is not exported in the current shell

Anthropic is not just "via OpenRouter" anymore.
When provider resolution selects anthropic, Hermes uses:
- api_mode = anthropic_messages
- agent/anthropic_adapter.py for translation

Credential resolution for native Anthropic now prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:
- ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN values still work as explicit overrides

Codex uses a separate Responses API path:
- api_mode = codex_responses

Auxiliary tasks such as:
can use their own provider/model routing rather than the main conversational model.
When an auxiliary task is configured with provider main, Hermes resolves that through the same shared runtime path as normal chat. In practice that means:
- hermes model / config.yaml also work

Hermes supports a configured fallback provider chain: a list of (provider, model) entries tried in order when the primary model encounters errors. The legacy single-pair fallback_model dict is still accepted for backward compatibility (and migrated on first write).
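Accepting both shapes can be sketched as a small normalization step. This is a hedged illustration, not Hermes code: normalize_fallbacks is a hypothetical helper, and the assumption that each entry is a dict with provider and model keys (under a fallback_providers list or a legacy fallback_model dict) is made for the example.

```python
from __future__ import annotations


def normalize_fallbacks(config: dict) -> list[tuple[str, str]]:
    """Return an ordered list of (provider, model) pairs from either shape."""
    entries = config.get("fallback_providers")
    if not entries:
        # Legacy shape: a single {"provider": ..., "model": ...} dict.
        legacy = config.get("fallback_model")
        entries = [legacy] if isinstance(legacy, dict) else []

    chain: list[tuple[str, str]] = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        provider = (entry.get("provider") or "").strip()
        model = (entry.get("model") or "").strip()
        # Entries missing either value are skipped rather than half-applied.
        if provider and model:
            chain.append((provider, model))
    return chain
```

Normalizing once at load time means the rest of the code only has to handle one shape, a list that may be empty.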
Storage: AIAgent.__init__ stores the fallback_model dict and sets _fallback_activated = False.
Trigger points: _try_activate_fallback() is called from three places in the main retry loop in run_agent.py:
Activation flow (_try_activate_fallback):
1. Returns False immediately if already activated or not configured
2. Calls resolve_provider_client() from auxiliary_client.py to build a new client with proper auth
3. Determines api_mode: codex_responses for openai-codex, anthropic_messages for anthropic, chat_completions for everything else
4. Updates self.model, self.provider, self.base_url, self.api_mode, self.client, self._client_kwargs
5. Sets _fallback_activated = True, which prevents it from firing again

Config flow:
- cli.py reads CLI_CONFIG["fallback_model"] → passes to AIAgent(fallback_model=...)
- gateway/run.py._load_fallback_model() reads config.yaml → passes to AIAgent
- provider and model keys must be non-empty, or fallback is disabled
- tools/delegate_tool.py: subagents inherit the parent's provider but not the fallback config

Cron jobs do support fallback: run_job() reads fallback_providers (or legacy fallback_model) from config.yaml and passes it to AIAgent(fallback_model=...), matching the gateway's _load_fallback_model() pattern. See Cron Internals.
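The activation flow and the non-empty validation described in this section can be sketched with a toy state object. This is a simplified model under assumptions, not the AIAgent implementation: FallbackState and api_mode_for are hypothetical names, and client construction (the resolve_provider_client() step) is deliberately left out.

```python
from __future__ import annotations


def api_mode_for(provider: str) -> str:
    """Map a provider id to the API mode listed in the activation flow."""
    if provider == "openai-codex":
        return "codex_responses"
    if provider == "anthropic":
        return "anthropic_messages"
    return "chat_completions"


class FallbackState:
    """Toy stand-in for the agent fields that fallback activation touches."""

    def __init__(self, model: str, provider: str, fallback_model: dict | None = None):
        self.model = model
        self.provider = provider
        self.api_mode = api_mode_for(provider)
        self.fallback_model = fallback_model
        self._fallback_activated = False

    def try_activate_fallback(self) -> bool:
        fb = self.fallback_model or {}
        # Bail out if fallback already fired, or if provider/model is empty.
        if self._fallback_activated or not fb.get("provider") or not fb.get("model"):
            return False
        # The real flow would build a new authenticated client here.
        self.provider = fb["provider"]
        self.model = fb["model"]
        self.api_mode = api_mode_for(self.provider)
        self._fallback_activated = True  # one-shot guard
        return True
```

A short usage example: the first call switches the state over to the fallback, and the guard makes any later call a no-op.

```python
state = FallbackState("primary-model", "openrouter",
                      {"provider": "anthropic", "model": "fallback-model"})
first = state.try_activate_fallback()   # switches provider/model/api_mode
second = state.try_activate_fallback()  # guard prevents a second switch
```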
Fallback behavior is exercised across several suites:
- tests/run_agent/test_fallback_credential_isolation.py: credential isolation between primary and fallback
- tests/hermes_cli/test_fallback_cmd.py: the /fallback CLI command
- tests/gateway/test_fallback_eviction.py: gateway eviction of failed providers