# Multi-Model Setup
A walkthrough of the common patterns for using multiple model providers: cost optimisation, quality tiers, local-first with hosted fallback, API key rotation, and rate-limit resilience.
Reference material for the provider system lives in:

- Model Providers → Overview — what providers are, configuration shape
- Model Providers → Fallback & routing — `reliable` and `router` meta-providers
- Model Providers → Catalog — every provider's config shape
Multi-model configuration is useful for cost optimisation, quality tiers, local-first setups with a hosted fallback, API key rotation, and resilience against 429 (rate limit) responses.

## Provider fallback

When a provider experiences a transient error (timeout, connection failure, auth issue), ZeroClaw automatically attempts fallback providers in the order specified.
Example: if your primary provider is `openai` but it's temporarily unavailable, ZeroClaw can automatically fall back to `anthropic`, then `groq`.

```toml
[reliability]
fallback_providers = ["anthropic", "groq", "openrouter"]
```

When the primary provider recovers, ZeroClaw resumes using it (no sticky failover).
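The provider-level fallback loop can be sketched as follows — a minimal illustration, not ZeroClaw's actual implementation; `call_provider` is a hypothetical stand-in for issuing the request against one provider:

```python
def complete_with_fallback(request, primary, fallbacks, call_provider):
    """Try the primary provider first, then each fallback in order.

    `call_provider` should raise on transient errors (timeout, 429, auth).
    """
    errors = []
    for provider in [primary, *fallbacks]:
        try:
            return provider, call_provider(provider, request)
        except Exception as exc:  # transient error: move to the next provider
            errors.append((provider, exc))
    raise RuntimeError(f"all providers failed: {errors}")

# Example: openai is down, so the request lands on anthropic.
def fake_call(provider, request):
    if provider == "openai":
        raise TimeoutError("upstream timeout")
    return f"{provider}: ok"

used, reply = complete_with_fallback(
    "hi", "openai", ["anthropic", "groq", "openrouter"], fake_call
)  # used == "anthropic"
```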
## Model fallbacks

Some models may not be available in all regions, or you might want to use a faster model when a heavy model is rate-limited.

```toml
[reliability]
model_fallbacks = { "claude-opus-4-7" = ["claude-sonnet-4-6", "gpt-4o"] }
```

If `claude-opus-4-7` fails or is unavailable, ZeroClaw tries the fallback models in order while staying within the same provider (unless a provider-level fallback is also configured).
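Model fallback resolution amounts to walking the configured chain in order; a minimal sketch (the map mirrors the `model_fallbacks` table above, and `try_model` is a hypothetical request function):

```python
def complete_with_model_fallback(request, model, model_fallbacks, try_model):
    """Try `model` first, then each configured fallback model, in order."""
    for candidate in [model, *model_fallbacks.get(model, [])]:
        try:
            return candidate, try_model(candidate, request)
        except Exception:
            continue  # model unavailable or rate-limited: try the next one
    raise RuntimeError(f"no model in the chain for {model!r} succeeded")

fallbacks = {"claude-opus-4-7": ["claude-sonnet-4-6", "gpt-4o"]}

def fake_try(model, request):
    if model == "claude-opus-4-7":
        raise RuntimeError("429: rate limited")
    return "ok"

chosen, _ = complete_with_model_fallback("hi", "claude-opus-4-7", fallbacks, fake_try)
# chosen == "claude-sonnet-4-6"
```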
## API key rotation

For providers that frequently encounter rate limits, you can supply additional API keys that ZeroClaw will rotate through on 429 responses.

```toml
[reliability]
api_keys = ["sk-key-2", "sk-key-3", "sk-key-4"]
```

The primary `api_key` (configured globally or per-channel) is always tried first; these extras are rotated on rate-limit errors.
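Key rotation can be pictured as moving to the next key only on a 429, rather than on every error — a sketch under that assumption, with `send` as a hypothetical request function:

```python
class RateLimited(Exception):
    """Stand-in for an HTTP 429 response."""

def complete_with_key_rotation(request, primary_key, extra_keys, send):
    """Always try the primary key first; rotate through extras on 429s."""
    last_429 = None
    for key in [primary_key, *extra_keys]:
        try:
            return key, send(request, key)
        except RateLimited as exc:
            last_429 = exc  # rate-limited: rotate to the next key
    raise last_429

def fake_send(request, key):
    if key in ("sk-primary", "sk-key-2"):
        raise RateLimited("429 Too Many Requests")
    return "ok"

used_key, _ = complete_with_key_rotation(
    "hi", "sk-primary", ["sk-key-2", "sk-key-3", "sk-key-4"], fake_send
)  # used_key == "sk-key-3"
```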
## Retries and backoff

Each provider attempt includes configurable retries with exponential backoff before moving on to the next fallback.

```toml
[reliability]
provider_retries = 2      # Retry count per provider
provider_backoff_ms = 500 # Initial backoff in milliseconds
```
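Assuming plain doubling from the initial backoff (ZeroClaw may add jitter or caps; check the Config reference), the retry delays for the values above work out as:

```python
def backoff_schedule(retries: int, initial_ms: int) -> list[int]:
    """Delay (ms) before each retry, doubling from the initial backoff."""
    return [initial_ms * 2**i for i in range(retries)]

# provider_retries = 2, provider_backoff_ms = 500
print(backoff_schedule(2, 500))  # [500, 1000]
```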
All multi-model behavior lives under the `[reliability]` section of `config.toml`. See the Config reference for the full field index and defaults.
## Example: basic provider fallback

Set up a simple fallback from your primary provider to a backup:

```toml
default_provider = "openai"
default_model = "gpt-4o"

[reliability]
fallback_providers = ["anthropic"]
```

Behavior: if OpenAI times out or returns an error, ZeroClaw retries twice with exponential backoff, then attempts the same request using Anthropic.
## Example: combined fallbacks and key rotation

Combine provider fallbacks with model fallbacks and API key rotation:

```toml
default_provider = "openai"
default_model = "gpt-4o"
api_key = "sk-openai-primary"

[reliability]
fallback_providers = ["anthropic", "groq", "openrouter"]
api_keys = ["sk-openai-backup-1", "sk-openai-backup-2"]

[reliability.model_fallbacks]
"gpt-4o" = ["gpt-4-turbo", "gpt-3.5-turbo"]
"gpt-4-turbo" = ["gpt-3.5-turbo"]
```
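One way to picture the attempt order against the primary provider under this config — an illustration that assumes keys rotate before the model steps down; the real interleaving may differ:

```python
def primary_attempts(model, keys, model_fallbacks):
    """(model, key) attempts against the primary provider, keys first."""
    return [
        (m, k)
        for m in [model, *model_fallbacks.get(model, [])]
        for k in keys
    ]

attempts = primary_attempts(
    "gpt-4o",
    ["sk-openai-primary", "sk-openai-backup-1", "sk-openai-backup-2"],
    {"gpt-4o": ["gpt-4-turbo", "gpt-3.5-turbo"]},
)
# attempts[0] == ("gpt-4o", "sk-openai-primary"); only after every
# (model, key) pair fails do the fallback providers come into play.
```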
Behavior: ZeroClaw first tries `gpt-4o` with the primary key (2 retries), rotates through the backup keys on 429s, steps down through the model fallbacks, and finally attempts the fallback providers in order.

## Example: local-first with hosted fallback

Use a local Ollama instance as primary, and fall back to a cloud provider:
```toml
default_provider = "ollama"
default_model = "llama2:70b"
api_url = "http://localhost:11434"

[reliability]
fallback_providers = ["openrouter", "groq"]
```

Behavior: if Ollama goes down or times out, ZeroClaw automatically uses OpenRouter or Groq instead, with no configuration changes.
## Example: quality tiers

Use an expensive reasoning model for complex tasks, but fall back to a faster model:

```toml
default_provider = "anthropic"
default_model = "claude-opus-4-7"

[reliability]
model_fallbacks = { "claude-opus-4-7" = ["claude-sonnet-4-6"] }
```

Behavior: when Opus is rate-limited or slow, ZeroClaw automatically uses Sonnet (typically 2–3x faster and cheaper).
## Example: multi-region failover

For organizations with multi-region deployments:

```toml
# Primary US region
default_provider = "anthropic"
default_model = "claude-sonnet-4-6"

[reliability]
# Fall back to EU region provider if US Anthropic is down
fallback_providers = ["bedrock"] # AWS Bedrock in multiple regions
provider_retries = 3
provider_backoff_ms = 1000
```

Ensure each fallback provider has credentials in your environment:

```sh
export ANTHROPIC_API_KEY="..."
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
```
## Hot reload

The `[reliability]` section is hot-reloadable. While a channel or gateway is running, updates to `config.toml` take effect on the next inbound message, without requiring a restart.
## What triggers fallback

Fallback is triggered by the transient errors described above: timeouts, connection failures, auth issues, and 429 (rate limit) responses. Other, non-transient errors do not trigger fallback.
## Debugging fallback behavior

Enable runtime traces to debug fallback behavior:

```toml
[observability]
runtime_trace_mode = "rolling"
runtime_trace_path = "state/runtime-trace.jsonl"
```

Then query the traces:

```sh
# Show all fallback events
zeroclaw doctor traces --contains "fallback"

# Show provider retry details
zeroclaw doctor traces --contains "provider"

# Show rate-limit rotation
zeroclaw doctor traces --contains "429"
```
## Credentials for fallback providers

Each fallback provider resolves credentials independently, using the standard resolution order (`ZEROCLAW_API_KEY`, then `API_KEY`).

Important: the primary provider's API key is not automatically reused by fallback providers. Set credentials for each provider separately.

Example:

```sh
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="claude-..."
export GROQ_API_KEY="gsk-..."
```
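As an illustration only — the authoritative order is in the Config reference — per-provider credential lookup might be sketched like this, assuming a provider-specific variable is consulted before the generic fallbacks named above:

```python
import os

def resolve_api_key(provider: str, env=os.environ):
    """Return the first credential found for `provider`, or None."""
    for var in (f"{provider.upper()}_API_KEY", "ZEROCLAW_API_KEY", "API_KEY"):
        if var in env:
            return env[var]
    return None  # no credential found

print(resolve_api_key("anthropic", {"ANTHROPIC_API_KEY": "claude-..."}))
# claude-...
```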