docs/concepts/model-failover.md
OpenClaw handles failures in two stages:

1. Auth profile rotation within the current provider.
2. Model fallback to the next model in `agents.defaults.model.fallbacks`.

This doc explains the runtime rules and the data that backs them.
For a normal text run, OpenClaw evaluates candidates in this order:
<Steps>
  <Step title="Resolve session state">
    Resolve the active session model and auth-profile preference.
  </Step>
  <Step title="Build candidate chain">
    Build the model candidate chain from the current model selection and the fallback policy for that selection source. Configured defaults, cron job primaries, and auto-selected fallback models can use configured fallbacks; explicit user session selections are strict.
  </Step>
  <Step title="Try the current provider">
    Try the current provider with auth-profile rotation/cooldown rules.
  </Step>
  <Step title="Advance on failover-worthy errors">
    If that provider is exhausted with a failover-worthy error, move to the next model candidate.
  </Step>
  <Step title="Persist fallback override">
    Persist the selected fallback override before the retry starts so other session readers see the same provider/model the runner is about to use. The persisted model override is marked `modelOverrideSource: "auto"`.
  </Step>
  <Step title="Roll back narrowly on failure">
    If the fallback candidate fails, roll back only the fallback-owned session override fields when they still match that failed candidate.
  </Step>
  <Step title="Throw FallbackSummaryError if exhausted">
    If every candidate fails, throw a `FallbackSummaryError` with per-attempt detail and the soonest cooldown expiry when one is known.
  </Step>
</Steps>

This is intentionally narrower than "save and restore the whole session". The reply runner only persists the model-selection fields it owns for fallback:

- `providerOverride`
- `modelOverride`
- `modelOverrideSource`
- `authProfileOverride`
- `authProfileOverrideSource`
- `authProfileOverrideCompactionCount`

That prevents a failed fallback retry from overwriting newer unrelated session mutations such as manual `/model` changes or session rotation updates that happened while the attempt was running.
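The narrow rollback described above can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's implementation: the `SessionEntry` shape, the `rollbackFallbackOverride` helper, the `updatedAt` field, the `"auto" | "user"` values for `authProfileOverrideSource`, and the choice to clear fields (rather than restore earlier values) are all assumptions.

```typescript
// Hypothetical session shape: the fallback-owned fields named above, plus one
// unrelated field to show that the rollback leaves it alone.
interface SessionEntry {
  providerOverride?: string;
  modelOverride?: string;
  modelOverrideSource?: "auto" | "user";
  authProfileOverride?: string;
  authProfileOverrideSource?: "auto" | "user"; // assumed to mirror modelOverrideSource
  authProfileOverrideCompactionCount?: number;
  updatedAt?: number; // unrelated field, never touched by the rollback
}

interface FailedCandidate {
  provider: string;
  model: string;
}

// Clears the fallback-owned override fields only if the session still points
// at the candidate that just failed. Returns true when a rollback happened.
function rollbackFallbackOverride(entry: SessionEntry, failed: FailedCandidate): boolean {
  const stillOwned =
    entry.modelOverrideSource === "auto" &&
    entry.providerOverride === failed.provider &&
    entry.modelOverride === failed.model;
  if (!stillOwned) {
    return false; // a newer /model change or rotation update wins
  }
  delete entry.providerOverride;
  delete entry.modelOverride;
  delete entry.modelOverrideSource;
  // Auth-profile fields are cleared only when the fallback set them too.
  if (entry.authProfileOverrideSource === "auto") {
    delete entry.authProfileOverride;
    delete entry.authProfileOverrideSource;
    delete entry.authProfileOverrideCompactionCount;
  }
  return true;
}
```

Because the check compares against the failed candidate, a manual `/model` change that lands mid-attempt is left untouched.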
OpenClaw separates the selected provider/model from why it was selected. That source controls whether the fallback chain is allowed:
- `agents.defaults.model.primary` uses `agents.defaults.model.fallbacks`.
- `agents.list[].model` is strict unless that agent model object includes its own `fallbacks`. Use `fallbacks: []` to make the strict behavior explicit, or provide a non-empty list to opt that agent into model fallback.
- An automatic fallback writes `providerOverride`, `modelOverride`, and `modelOverrideSource: "auto"` before retrying. That auto override can keep walking the configured fallback chain and is cleared by `/new`, `/reset`, and `sessions.reset`.
- `/model`, the model picker, `session_status(model=...)`, and `sessions.patch` write `modelOverrideSource: "user"`. That is an exact session selection. If the selected provider/model fails before producing a reply, OpenClaw reports the failure instead of answering from an unrelated configured fallback.
- Older session entries may carry `modelOverride` without `modelOverrideSource`. OpenClaw treats those as user overrides so an explicit old selection is not silently converted into fallback behavior.
- A cron job's `payload.model` / `--model` is a job primary, not a user session override. It uses configured fallbacks unless the job provides `payload.fallbacks`; `payload.fallbacks: []` makes the cron run strict.

OpenClaw uses auth profiles for both API keys and OAuth tokens.
- Credentials live in `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` (legacy: `~/.openclaw/agent/auth-profiles.json`).
- Runtime state lives in `~/.openclaw/agents/<agentId>/agent/auth-state.json`.
- Config `auth.profiles` / `auth.order` are metadata + routing only (no secrets).
- OAuth credentials in `~/.openclaw/credentials/oauth.json` are imported into `auth-profiles.json` on first use.

More detail: OAuth
Credential types:
- `type: "api_key"` → `{ provider, key }`
- `type: "oauth"` → `{ provider, access, refresh, expires, email? }` (+ `projectId`/`enterpriseUrl` for some providers)

OAuth logins create distinct profiles so multiple accounts can coexist.

- `provider:default` when no email is available.
- `provider:<email>` (for example `google-antigravity:[email protected]`).

Profiles live in `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` under `profiles`.
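As a rough illustration, the two credential shapes and the profile ID rule could be typed like this. The type names, the `profileIdFor` helper, and the primitive field types (for example `expires` as a number) are assumptions for the sketch; only the field names come from the list above.

```typescript
// Field names follow the credential types listed above; the primitive types
// are assumptions made for this sketch.
type ApiKeyCredential = { type: "api_key"; provider: string; key: string };

type OAuthCredential = {
  type: "oauth";
  provider: string;
  access: string;
  refresh: string;
  expires: number; // assumed to be a timestamp
  email?: string;
  projectId?: string; // only for some providers
  enterpriseUrl?: string; // only for some providers
};

type StoredCredential = ApiKeyCredential | OAuthCredential;

// Hypothetical helper: derive the profile ID for an OAuth login.
function profileIdFor(provider: string, email?: string): string {
  return email ? `${provider}:${email}` : `${provider}:default`;
}
```

Keying the ID on the email is what lets two accounts for the same provider sit side by side under `profiles`.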
When a provider has multiple profiles, OpenClaw chooses an order like this:
<Steps>
  <Step title="Explicit config">
    `auth.order[provider]` (if set).
  </Step>
  <Step title="Configured profiles">
    `auth.profiles` filtered by provider.
  </Step>
  <Step title="Stored profiles">
    Entries in `auth-profiles.json` for the provider.
  </Step>
</Steps>

If no explicit order is configured, OpenClaw uses a round‑robin order:

- `usageStats.lastUsed` (oldest first, within each type).

OpenClaw pins the chosen auth profile per session to keep provider caches warm. It does not rotate on every request. The pinned profile is reused until:

- the session is reset (`/new` / `/reset`)

Manual selection via `/model …@<profileId>` sets a user override for that session and is not auto-rotated until a new session starts.
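The ordering rules above (explicit order first, otherwise round-robin by `usageStats.lastUsed` within each type) can be sketched as below. The `ProfileInfo` shape and `orderProfiles` helper are hypothetical, and two details are assumptions: types are grouped in first-seen order, and a profile that was never used counts as oldest.

```typescript
interface ProfileInfo {
  id: string; // e.g. "provider:default"
  type: "oauth" | "api_key";
  lastUsed?: number; // usageStats.lastUsed, if the profile has been used
}

// Returns profile IDs in the order they should be tried.
function orderProfiles(
  explicitOrder: string[] | undefined, // auth.order[provider]
  configured: ProfileInfo[], // auth.profiles filtered by provider
  stored: ProfileInfo[], // entries from auth-profiles.json
): string[] {
  if (explicitOrder && explicitOrder.length > 0) {
    return explicitOrder.slice();
  }
  const pool = configured.length > 0 ? configured : stored;
  // Assumption: keep types grouped in the order they first appear.
  const typeOrder: string[] = [];
  for (const profile of pool) {
    if (typeOrder.indexOf(profile.type) === -1) typeOrder.push(profile.type);
  }
  return pool
    .slice()
    .sort((a, b) => {
      const byType = typeOrder.indexOf(a.type) - typeOrder.indexOf(b.type);
      if (byType !== 0) return byType;
      // Oldest first; never-used profiles are treated as oldest (assumption).
      return (a.lastUsed ?? 0) - (b.lastUsed ?? 0);
    })
    .map((profile) => profile.id);
}
```

Session pinning sits on top of this: the order is only consulted when a session has no pinned profile or its pin has been cleared.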
If you have both an OAuth profile and an API key profile for the same provider, round‑robin can switch between them across messages unless pinned. To force a single profile:
- Set `auth.order[provider] = ["provider:profileId"]`, or
- Use `/model …` with a profile override (when supported by your UI/chat surface).

When a profile fails due to auth/rate-limit errors (or a timeout that looks like rate limiting), OpenClaw marks it in cooldown and moves to the next profile.
<AccordionGroup>
  <Accordion title="What lands in the rate-limit / timeout bucket">
    That rate-limit bucket is broader than plain `429`: it also includes provider messages such as `Too many concurrent requests`, `ThrottlingException`, `concurrency limit reached`, `workers_ai ... quota limit exceeded`, `throttled`, `resource exhausted`, and periodic usage-window limits such as `weekly/monthly limit reached`.

    Format/invalid-request errors (for example Cloud Code Assist tool call ID validation failures) are treated as failover-worthy and use the same cooldowns. OpenAI-compatible stop-reason errors such as `Unhandled stop reason: error`, `stop reason: error`, and `reason: error` are classified as timeout/failover signals.

    Generic server text can also land in that timeout bucket when the source matches a known transient pattern. For example, the bare pi-ai stream-wrapper message `An unknown error occurred` is treated as failover-worthy for every provider because pi-ai emits it when provider streams end with `stopReason: "aborted"` or `stopReason: "error"` without specific details. JSON `api_error` payloads with transient server text such as `internal server error`, `unknown error, 520`, `upstream error`, or `backend error` are also treated as failover-worthy timeouts.

    OpenRouter-specific generic upstream text such as bare `Provider returned error` is treated as timeout only when the provider context is actually OpenRouter. Generic internal fallback text such as `LLM request failed with an unknown error.` stays conservative and does not trigger failover by itself.
  </Accordion>
</AccordionGroup>

Rate-limit cooldowns can also be model-scoped:

- OpenClaw records `cooldownModel` for rate-limit failures when the failing model id is known.
- A sibling model on the same provider can still be tried when the cooldown is scoped to a different model.
- Billing/disabled windows still block the whole profile across models.
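To make the bucket rules concrete, here is a simplified classifier sketch built only from the example messages quoted above. The `classifyFailure` helper, the `FailureBucket` type, and the `"openrouter"` provider id are assumptions; a real classifier would look at structured payloads and status codes, not just message text.

```typescript
type FailureBucket = "rate_limit" | "timeout" | null;

// Patterns taken from the example messages quoted in this doc.
const RATE_LIMIT_PATTERNS: RegExp[] = [
  /too many concurrent requests/i,
  /throttlingexception/i,
  /concurrency limit reached/i,
  /workers_ai.*quota limit exceeded/i,
  /throttled/i,
  /resource exhausted/i,
  /(weekly|monthly) limit reached/i,
];

const TIMEOUT_PATTERNS: RegExp[] = [
  /reason: error/i, // also covers "stop reason: error" and "Unhandled stop reason: error"
  /^an unknown error occurred\.?$/i, // bare stream-wrapper message only
  // Simplification: the doc scopes these to JSON api_error payloads.
  /internal server error/i,
  /unknown error, 520/i,
  /upstream error/i,
  /backend error/i,
];

// Returns the failover bucket for a provider error, or null when the text
// should not trigger failover by itself.
function classifyFailure(message: string, provider: string, status?: number): FailureBucket {
  const text = message.trim();
  if (status === 429) return "rate_limit";
  if (RATE_LIMIT_PATTERNS.some((pattern) => pattern.test(text))) return "rate_limit";
  if (TIMEOUT_PATTERNS.some((pattern) => pattern.test(text))) return "timeout";
  // Provider-scoped matcher: only applies when the provider is OpenRouter.
  if (provider === "openrouter" && /^provider returned error$/i.test(text)) return "timeout";
  return null;
}
```

Keeping the OpenRouter matcher behind a provider check mirrors the rule that generic upstream text is only trusted in the context that owns it.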
Cooldowns use exponential backoff.

State is stored in `auth-state.json` under `usageStats`:

```json
{
  "usageStats": {
    "provider:profile": {
      "lastUsed": 1736160000000,
      "cooldownUntil": 1736160600000,
      "errorCount": 2
    }
  }
}
```
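A capped exponential backoff over `errorCount` might look like the sketch below. The `cooldownMs` and `recordFailure` helpers are hypothetical, and the base, factor, and cap values are illustrative assumptions rather than OpenClaw's actual schedule.

```typescript
interface ProfileUsageStats {
  lastUsed?: number;
  cooldownUntil?: number;
  errorCount?: number;
}

interface BackoffOptions {
  baseMs: number; // cooldown after the first failure
  factor: number; // multiplier applied per additional failure
  capMs: number; // upper bound on any single cooldown
}

// Illustrative values only; not taken from OpenClaw.
const ASSUMED_BACKOFF: BackoffOptions = { baseMs: 60000, factor: 5, capMs: 3600000 };

// Cooldown length for the Nth consecutive failure (1-based), capped.
function cooldownMs(errorCount: number, options: BackoffOptions): number {
  const exponent = Math.max(0, errorCount - 1);
  return Math.min(options.capMs, options.baseMs * Math.pow(options.factor, exponent));
}

// Returns updated stats after a failure at time `now` (epoch milliseconds).
function recordFailure(
  stats: ProfileUsageStats,
  now: number,
  options: BackoffOptions,
): ProfileUsageStats {
  const errorCount = (stats.errorCount ?? 0) + 1;
  return { ...stats, errorCount, cooldownUntil: now + cooldownMs(errorCount, options) };
}
```

With these assumed values, repeated failures grow the cooldown quickly while the cap keeps a flaky profile from being locked out indefinitely.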
Billing/credit failures (for example "insufficient credits" / "credit balance too low") are treated as failover-worthy, but they're usually not transient. Instead of a short cooldown, OpenClaw marks the profile as disabled (with a longer backoff) and rotates to the next profile/provider.
<Note>
  Not every billing-shaped response is `402`, and not every HTTP `402` lands here. OpenClaw keeps explicit billing text in the billing lane even when a provider returns `401` or `403` instead, but provider-specific matchers stay scoped to the provider that owns them (for example OpenRouter `403 Key limit exceeded`).

  Meanwhile, temporary `402` usage-window and organization/workspace spend-limit errors are classified as `rate_limit` when the message looks retryable (for example `weekly usage limit exhausted`, `daily limit reached, resets tomorrow`, or `organization spending limit exceeded`). Those stay on the short cooldown/failover path instead of the long billing-disable path.
</Note>
State is stored in `auth-state.json`:

```json
{
  "usageStats": {
    "provider:profile": {
      "disabledUntil": 1736178000000,
      "disabledReason": "billing"
    }
  }
}
```
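Putting the cooldown and disabled fields together, a per-profile availability check could be sketched like this. The `ProfileState` shape and `profileAvailability` helper are hypothetical, and storing `cooldownModel` next to `cooldownUntil` is an assumption about the state layout.

```typescript
interface ProfileState {
  cooldownUntil?: number;
  cooldownModel?: string; // assumed to sit next to cooldownUntil
  disabledUntil?: number;
  disabledReason?: string;
}

type Availability = "available" | "cooldown" | "disabled";

// Decides whether a profile can serve `model` at time `now` (epoch ms).
function profileAvailability(state: ProfileState, model: string, now: number): Availability {
  // Billing/disabled windows block the whole profile across models.
  if (state.disabledUntil !== undefined && state.disabledUntil > now) {
    return "disabled";
  }
  if (state.cooldownUntil !== undefined && state.cooldownUntil > now) {
    // A cooldown scoped to a different model does not block this one.
    if (state.cooldownModel !== undefined && state.cooldownModel !== model) {
      return "available";
    }
    return "cooldown";
  }
  return "available";
}
```

The disabled check runs first so that a model-scoped cooldown can never unblock a profile that is disabled for billing.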
Defaults:
If all profiles for a provider fail, OpenClaw moves to the next model in `agents.defaults.model.fallbacks`. This applies to auth failures, rate limits, and timeouts that exhausted profile rotation (other errors do not advance fallback). Provider errors that do not expose enough detail are still labeled precisely in fallback state:

- `empty_response` means the provider returned no usable message or status.
- `no_error_details` means the provider explicitly returned `Unknown error (no error details in response)`.
- `unclassified` means OpenClaw preserved the raw preview but no classifier matched it yet.

Overloaded and rate-limit errors are handled more aggressively than billing cooldowns. By default, OpenClaw allows one same-provider auth-profile retry, then switches to the next configured model fallback without waiting. Provider-busy signals such as `ModelNotReadyException` land in that overloaded bucket. Tune this with `auth.cooldowns.overloadedProfileRotations`, `auth.cooldowns.overloadedBackoffMs`, and `auth.cooldowns.rateLimitedProfileRotations`.

When a run starts from the configured default primary, a cron job primary, an agent primary with explicit fallbacks, or an auto-selected fallback override, OpenClaw can walk the matching configured fallback chain. Agent primaries without explicit fallbacks and explicit user selections (for example `/model ollama/qwen3.5:27b`, the model picker, `sessions.patch`, or one-off CLI provider/model overrides) are strict: if that provider/model is unreachable or fails before producing a reply, OpenClaw reports the failure instead of answering from an unrelated fallback.
OpenClaw builds the candidate list from the currently requested provider/model plus configured fallbacks.
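The selection-source policy can be sketched as a small chain builder. Everything named here (`SelectionSource`, `ModelSelection`, `buildCandidateChain`, `effectiveOverrideSource`) is hypothetical, and the sketch simplifies the auto case by always using the default fallbacks.

```typescript
// Hypothetical labels for where the current model selection came from.
type SelectionSource = "default" | "agent" | "cron" | "auto" | "user";

interface ModelSelection {
  source: SelectionSource;
  primary: string; // "provider/model"
  // Agent `fallbacks` or cron `payload.fallbacks`; undefined means "not set".
  ownFallbacks?: string[];
}

// A modelOverride without a recorded source counts as a user override.
function effectiveOverrideSource(
  modelOverride: string | undefined,
  modelOverrideSource: "auto" | "user" | undefined,
): "auto" | "user" | undefined {
  if (modelOverride === undefined) return undefined;
  return modelOverrideSource ?? "user";
}

// Builds the ordered candidate list: the requested model first, then any
// fallbacks the selection source is allowed to use.
function buildCandidateChain(selection: ModelSelection, defaultFallbacks: string[]): string[] {
  let fallbacks: string[];
  switch (selection.source) {
    case "user":
      fallbacks = []; // explicit session selections are strict
      break;
    case "agent":
      fallbacks = selection.ownFallbacks ?? []; // strict unless the agent opts in
      break;
    case "cron":
      fallbacks = selection.ownFallbacks ?? defaultFallbacks; // [] makes it strict
      break;
    default:
      fallbacks = defaultFallbacks; // "default" and "auto"
  }
  const chain = [selection.primary];
  for (const candidate of fallbacks) {
    if (chain.indexOf(candidate) === -1) chain.push(candidate);
  }
  return chain;
}
```

A strict selection always yields a one-element chain, which is why a failed `/model` choice surfaces as an error rather than a reply from another model.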
When every auth profile for a provider is already in cooldown, OpenClaw does not automatically skip that provider forever. It makes a per-candidate decision:
<AccordionGroup>
  <Accordion title="Per-candidate decisions">
    - Persistent auth failures skip the whole provider immediately.
    - Billing disables usually skip, but the primary candidate can still be probed on a throttle so recovery is possible without restarting.
    - The primary candidate may be probed near cooldown expiry, with a per-provider throttle.
    - Same-provider fallback siblings can be attempted despite cooldown when the failure looks transient (`rate_limit`, `overloaded`, or unknown). This is especially relevant when a rate limit is model-scoped and a sibling model may still recover immediately.
    - Transient cooldown probes are limited to one per provider per fallback run so a single provider does not stall cross-provider fallback.
  </Accordion>
</AccordionGroup>

Session model changes are shared state. The active runner, `/model` command, compaction/session updates, and live-session reconciliation all read or write parts of the same session entry.
That means fallback retries have to coordinate with live model switching:
- Explicit user selections are written by `/model`, `session_status(model=...)`, and `sessions.patch`.
- Auto-sourced fallback overrides follow `agents.defaults.model.fallbacks`.
- `/new`, `/reset`, and `sessions.reset` clear auto-sourced overrides and return the session to the configured default.
- `/status` shows the selected model and, when fallback state differs, the active fallback model and reason.

This prevents the classic race:
<Steps>
  <Step title="Primary fails">
    The selected primary model fails.
  </Step>
  <Step title="Fallback chosen in memory">
    Fallback candidate is chosen in memory.
  </Step>
  <Step title="Session store still says old primary">
    Session store still reflects the old primary.
  </Step>
  <Step title="Live reconciliation reads stale state">
    Live-session reconciliation reads the stale session state.
  </Step>
  <Step title="Retry snapped back">
    The retry gets snapped back to the old model before the fallback attempt starts.
  </Step>
</Steps>

The persisted fallback override closes that window, and the narrow rollback keeps newer manual or runtime session changes intact.
`runWithModelFallback(...)` records per-attempt details that feed logs and user-facing cooldown messaging:

- reason (`rate_limit`, `overloaded`, `billing`, `auth`, `model_not_found`, and similar failover reasons)

Structured `model_fallback_decision` logs also include flat `fallbackStep*` fields when a candidate fails, is skipped, or a later fallback succeeds. These fields make the attempted transition explicit (`fallbackStepFromModel`, `fallbackStepToModel`, `fallbackStepFromFailureReason`, `fallbackStepFromFailureDetail`, `fallbackStepFinalOutcome`) so log and diagnostic exporters can reconstruct the primary failure even when the terminal fallback also fails.
When every candidate fails, OpenClaw throws `FallbackSummaryError`. The outer reply runner can use that to build a more specific message such as "all models are temporarily rate-limited" and include the soonest cooldown expiry when one is known.
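The overall loop and its summary error can be sketched as follows. This is a synchronous simplification under stated assumptions: `runWithFallbackSketch`, `FallbackSummarySketchError`, `AttemptRecord`, and `AttemptResult` are hypothetical stand-ins, not the real `runWithModelFallback(...)` or `FallbackSummaryError`.

```typescript
interface ModelCandidate {
  provider: string;
  model: string;
}

interface AttemptRecord extends ModelCandidate {
  reason: string; // e.g. "rate_limit", "overloaded", "billing", "auth"
  cooldownUntil?: number; // epoch ms, when known
}

type AttemptResult =
  | { ok: true; reply: string }
  | { ok: false; reason: string; cooldownUntil?: number };

// Hypothetical stand-in for the summary error described above.
class FallbackSummarySketchError extends Error {
  attempts: AttemptRecord[];
  soonestCooldownExpiry: number | undefined;

  constructor(attempts: AttemptRecord[]) {
    super("all " + attempts.length + " model candidates failed");
    this.attempts = attempts;
    const expiries = attempts
      .map((attempt) => attempt.cooldownUntil)
      .filter((value): value is number => value !== undefined);
    this.soonestCooldownExpiry = expiries.length > 0 ? Math.min(...expiries) : undefined;
  }
}

// Tries each candidate in order and returns the first successful reply.
// Throws the summary error when every candidate fails.
function runWithFallbackSketch(
  candidates: ModelCandidate[],
  attempt: (candidate: ModelCandidate) => AttemptResult,
): string {
  const attempts: AttemptRecord[] = [];
  for (const candidate of candidates) {
    const result = attempt(candidate);
    if (result.ok) return result.reply;
    attempts.push({
      provider: candidate.provider,
      model: candidate.model,
      reason: result.reason,
      cooldownUntil: result.cooldownUntil,
    });
  }
  throw new FallbackSummarySketchError(attempts);
}
```

A caller can read `soonestCooldownExpiry` from the caught error to tell the user when a retry is likely to succeed.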
That cooldown summary is model-aware:
See Gateway configuration for:
- `auth.profiles` / `auth.order`
- `auth.cooldowns.billingBackoffHours` / `auth.cooldowns.billingBackoffHoursByProvider`
- `auth.cooldowns.billingMaxHours` / `auth.cooldowns.failureWindowHours`
- `auth.cooldowns.overloadedProfileRotations` / `auth.cooldowns.overloadedBackoffMs`
- `auth.cooldowns.rateLimitedProfileRotations`
- `agents.defaults.model.primary` / `agents.defaults.model.fallbacks`
- `agents.defaults.imageModel` routing

See Models for the broader model selection and fallback overview.