docs/user-manual/en/4-proxy/4.3-failover.md
The failover feature automatically switches to a backup provider when a request to the primary provider fails, so service continues without interruption.
Applicable scenarios:
Using the failover feature requires:
Settings > Advanced > Failover
Three tabs at the top of the page:
Select the application to configure.
Drag providers to adjust their order:
Click the "Remove" button to the right of the provider.
When both proxy and failover are enabled, provider cards display a failover toggle.
| State | Behavior |
|---|---|
| Off | Records failures only; does not switch automatically |
| On | Automatically switches to the next provider on failure |
graph TD
Start[Request arrives at proxy] --> Send[Send to current provider]
Send --> CheckSuccess{Success?}
CheckSuccess -- Yes --> Return[Return response]
CheckSuccess -- No --> LogFail[Record failure]
LogFail --> CheckCircuit{Check circuit breaker}
CheckCircuit -- Tripped --> Skip[Skip this provider]
CheckCircuit -- Not tripped --> IncFail[Increment failure count]
Skip --> Next{Next in queue?}
IncFail --> Next
Next -- Yes --> Switch[Switch provider]
Switch --> Retry[Retry request]
Retry --> Send
Next -- No --> Error[Return error]
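
The flow above can be sketched as a simple loop over the failover queue. This is a minimal illustration, not the proxy's actual implementation: `send_with_failover`, `AllProvidersFailed`, and the three callables passed in are hypothetical names chosen for this example.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("failover")


class AllProvidersFailed(Exception):
    """Raised when no provider in the queue returned a response."""


def send_with_failover(
    queue: Sequence[str],
    send: Callable[[str], str],
    is_tripped: Callable[[str], bool],
    increment_failures: Callable[[str], None],
) -> str:
    """Try each provider in queue order and return the first successful response."""
    for provider in queue:
        try:
            return send(provider)  # Success: return the response
        except Exception as exc:
            # Record the failure
            log.warning("provider %s failed: %s", provider, exc)
            # Only count the failure if the breaker is not already tripped
            if not is_tripped(provider):
                increment_failures(provider)
            # Either way, continue with the next provider in the queue
    # Queue exhausted: return an error to the caller
    raise AllProvidersFailed("every provider in the failover queue failed")
```

Callers supply the queue in priority order, so the first entry plays the role of the current provider.
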
The circuit breaker stops the proxy from repeatedly sending requests to a provider that keeps failing.
Each app has its own default configuration. The tables below list the general defaults alongside Claude's more relaxed defaults.
| Setting | Description | General Default | Claude Default | Range |
|---|---|---|---|---|
| Failure Threshold | Consecutive failures to trigger circuit breaker | 4 | 8 | 1-20 |
| Recovery Success Threshold | Successes needed in half-open state to close breaker | 2 | 3 | 1-10 |
| Recovery Wait Time | Time before attempting recovery after tripping (seconds) | 60 | 90 | 0-300 |
| Error Rate Threshold | Error rate that opens the circuit breaker | 60% | 70% | 0-100% |
| Minimum Requests | Minimum requests before calculating error rate | 10 | 15 | 5-100 |
Claude's defaults are more relaxed because its requests tend to run longer, so more failures are tolerated before the breaker trips.
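
To illustrate how these settings might interact, here is a minimal sketch using the defaults from the table. The names `BreakerSettings` and `should_trip` are hypothetical. The sketch assumes the breaker opens when either the consecutive-failure threshold or the error-rate threshold is reached, and that both comparisons are inclusive; the manual does not specify how the two conditions combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerSettings:
    failure_threshold: int = 4           # consecutive failures that trip the breaker
    recovery_success_threshold: int = 2  # half-open successes needed to close it
    recovery_wait_seconds: int = 60      # wait before attempting recovery
    error_rate_threshold: float = 0.60   # error rate that opens the breaker
    min_requests: int = 10               # requests needed before the rate is used


GENERAL = BreakerSettings()
CLAUDE = BreakerSettings(
    failure_threshold=8,
    recovery_success_threshold=3,
    recovery_wait_seconds=90,
    error_rate_threshold=0.70,
    min_requests=15,
)


def should_trip(
    settings: BreakerSettings,
    consecutive_failures: int,
    total_requests: int,
    failed_requests: int,
) -> bool:
    """Return True if either threshold is reached (assumed OR semantics)."""
    if consecutive_failures >= settings.failure_threshold:
        return True
    # The error rate is ignored until enough requests have been observed
    if total_requests > 0 and total_requests >= settings.min_requests:
        return failed_requests / total_requests >= settings.error_rate_threshold
    return False
```

With the same traffic (4 consecutive failures out of 10 requests), the general defaults trip the breaker while the Claude defaults do not.
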
| Setting | Description | General Default | Claude Default | Range |
|---|---|---|---|---|
| Stream First Byte Timeout | Max wait time for first data chunk (seconds) | 60 | 90 | 1-120 |
| Stream Idle Timeout | Max interval between data chunks (seconds) | 120 | 180 | 60-600 (0 to disable) |
| Non-stream Timeout | Total timeout for non-streaming requests (seconds) | 600 | 600 | 60-1200 |
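
The ranges in the table can be expressed as a small validation helper. This is an illustrative sketch only: the setting keys and the `validate_timeout` function are hypothetical names, and the application's real configuration format may differ.

```python
from typing import NamedTuple


class Range(NamedTuple):
    minimum: int
    maximum: int
    zero_disables: bool = False  # True if 0 is accepted to turn the timeout off


# Allowed values in seconds, taken from the table above
TIMEOUT_RANGES = {
    "stream_first_byte_timeout": Range(1, 120),
    "stream_idle_timeout": Range(60, 600, zero_disables=True),
    "non_stream_timeout": Range(60, 1200),
}


def validate_timeout(name: str, seconds: int) -> int:
    """Return the value unchanged if it is allowed, otherwise raise ValueError."""
    allowed = TIMEOUT_RANGES[name]
    if seconds == 0 and allowed.zero_disables:
        return 0
    if allowed.minimum <= seconds <= allowed.maximum:
        return seconds
    raise ValueError(
        f"{name} must be between {allowed.minimum} and {allowed.maximum} seconds, "
        f"got {seconds}"
    )
```
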
| Setting | Description | General Default | Claude Default | Range |
|---|---|---|---|---|
| Max Retries | Number of retries on request failure | 3 | 6 | 0-10 |
Gemini's default max retries is 5.
| State | Description |
|---|---|
| Closed | Normal state, requests allowed |
| Open | Circuit broken, this provider is skipped |
| Half-Open | Attempting recovery, sending probe requests |
stateDiagram-v2
[*] --> Closed: Initialize
Closed --> Open: Failures >= threshold
Open --> HalfOpen: Recovery wait time expires
HalfOpen --> Closed: Probe successes >= recovery threshold
HalfOpen --> Open: Probe failed
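
The state transitions above can be modeled with a small class. This is a minimal sketch, not the application's implementation: the class and method names are hypothetical, it only tracks consecutive failures (the error-rate threshold is left out), and the injectable `clock` parameter exists purely to make the example testable.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 4,
        recovery_success_threshold: int = 2,
        recovery_wait_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_success_threshold = recovery_success_threshold
        self.recovery_wait_seconds = recovery_wait_seconds
        self._clock = clock
        self.state = State.CLOSED  # Initialize in the Closed state
        self._failures = 0
        self._probe_successes = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        """Open -> Half-Open once the recovery wait time has expired."""
        waited = self._clock() - self._opened_at
        if self.state is State.OPEN and waited >= self.recovery_wait_seconds:
            self.state = State.HALF_OPEN
            self._probe_successes = 0
        return self.state is not State.OPEN

    def record_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self._probe_successes += 1
            # Half-Open -> Closed after enough successful probes
            if self._probe_successes >= self.recovery_success_threshold:
                self.state = State.CLOSED
                self._failures = 0
        else:
            self._failures = 0  # a success resets the consecutive count

    def record_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._trip()  # Half-Open -> Open when a probe fails
            return
        self._failures += 1
        # Closed -> Open once failures reach the threshold
        if self.state is State.CLOSED and self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = State.OPEN
        self._opened_at = self._clock()
```

A caller would check `allow_request()` before sending to a provider, then report the outcome with `record_success()` or `record_failure()`.
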
Cards display health status badges:
| Badge | Status | Description |
|---|---|---|
| Green | Healthy | 0 consecutive failures |
| Yellow | Warning | Has failures but circuit not tripped |
| Red | Circuit Broken | Circuit breaker tripped, temporarily skipped |
The failover queue also displays each provider's health status.
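
The badge rules in the table reduce to a short decision function. This is an illustrative sketch; `health_badge` and its parameters are hypothetical names, not part of the application.

```python
def health_badge(consecutive_failures: int, circuit_tripped: bool) -> str:
    """Map a provider's health to the badge color described in the table."""
    if circuit_tripped:
        return "red"     # Circuit Broken: temporarily skipped
    if consecutive_failures > 0:
        return "yellow"  # Warning: has failures but the circuit is not tripped
    return "green"       # Healthy: 0 consecutive failures
```
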
Each failover event records:
| Information | Description |
|---|---|
| Time | When it occurred |
| Original Provider | The provider that failed |
| New Provider | The provider switched to |
| Failure Reason | Error message |
These records can be viewed in the request logs under usage statistics.
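
For readers who want to picture what such a record contains, the four fields above could be modeled as follows. This is a sketch under assumptions: the `FailoverEvent` class, its field names, and the JSON layout are hypothetical and do not describe the application's actual log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FailoverEvent:
    time: datetime           # when the failover occurred
    original_provider: str   # the provider that failed
    new_provider: str        # the provider switched to
    failure_reason: str      # error message

    def to_json(self) -> str:
        """Serialize the event, writing the timestamp in ISO 8601 form."""
        record = asdict(self)
        record["time"] = self.time.isoformat()
        return json.dumps(record)


# Example values are placeholders, not real log output
event = FailoverEvent(
    time=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    original_provider="provider-a",
    new_provider="provider-b",
    failure_reason="HTTP 503",
)
```
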
| Scenario | Failure Threshold | Recovery Wait |
|---|---|---|
| High availability requirement | 2 | 30 seconds |
| General scenario | 3 | 60 seconds |
| Tolerant of occasional failures | 5 | 120 seconds |
Periodically check:
Check:
Possible causes:
Solutions:
Wait for the recovery wait time to expire so that the provider recovers automatically, or: