4.3 Failover

Overview

The failover feature automatically switches to a backup provider when a request to the primary provider fails, so service continues without interruption.

Applicable scenarios:

  • Unstable provider services
  • High availability requirements
  • Long-running tasks

Prerequisites

Using the failover feature requires:

  1. Proxy service started
  2. App takeover enabled
  3. Failover queue configured
  4. Auto failover enabled

Configure the Failover Queue

Open Configuration Page

Settings > Advanced > Failover

Select Application

Three tabs at the top of the page:

  • Claude
  • Codex
  • Gemini

Select the application to configure.

Add Backup Providers

  1. In the "Failover Queue" area, click "Add Provider"
  2. Select a provider from the dropdown list
  3. The provider is added to the end of the queue

Adjust Priority

Drag providers to adjust their order:

  • Providers with lower numbers (nearer the top of the queue) have higher priority
  • After the primary provider fails, backup providers are tried in order
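
The ordering rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the application's code; `QueueEntry`, `try_order`, and the provider names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    name: str      # provider name (hypothetical field)
    position: int  # number shown in the queue; lower means higher priority


def try_order(primary: str, queue: list[QueueEntry]) -> list[str]:
    """Return the primary first, then backups sorted by queue position."""
    backups = sorted(queue, key=lambda entry: entry.position)
    return [primary] + [entry.name for entry in backups]


# Made-up example: the backups are stored out of order on purpose.
queue = [QueueEntry("backup-b", 2), QueueEntry("backup-a", 1)]
order = try_order("primary", queue)
```

Here `order` lists the primary first, followed by the backups in ascending queue position.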

Remove Provider

Click the "Remove" button to the right of the provider.

Main Interface Quick Actions

When both proxy and failover are enabled, provider cards display a failover toggle.

Add to Queue

  1. Find the provider card
  2. Enable the failover toggle
  3. The provider is automatically added to the queue

Remove from Queue

  1. Disable the failover toggle on the provider card
  2. The provider is removed from the queue

Enable Auto Failover

Steps

  1. Open the failover configuration page
  2. Enable the "Auto Failover" toggle

Toggle Description

| State | Behavior |
|-------|----------|
| Off | Only records failures, no automatic switching |
| On | Automatically switches to the next provider on failure |

Failover Flow

```mermaid
graph TD
    Start[Request arrives at proxy] --> Send[Send to current provider]
    Send --> CheckSuccess{Success?}
    CheckSuccess -- Yes --> Return[Return response]
    CheckSuccess -- No --> LogFail[Record failure]
    LogFail --> CheckCircuit{Check circuit breaker}
    CheckCircuit -- Tripped --> Skip[Skip this provider]
    CheckCircuit -- Not tripped --> IncFail[Increment failure count]
    Skip --> Next{Next in queue?}
    IncFail --> Next
    Next -- Yes --> Switch[Switch provider]
    Switch --> Retry[Retry request]
    Retry --> Send
    Next -- No --> Error[Return error]
```
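
The flow can also be read as a loop over the queue. The Python sketch below follows the diagram step by step under simplifying assumptions; it is not the proxy's implementation. `Provider`, `handle_request`, `AllProvidersFailed`, and the `send` callback are hypothetical names, and the `auto_failover` flag models the toggle described earlier (when off, failures are recorded but no switch happens).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Provider:
    name: str
    failures: int = 0           # consecutive failure count
    breaker_open: bool = False  # True once the circuit breaker has tripped


class AllProvidersFailed(Exception):
    """Raised when no provider in the queue returned a response."""


def handle_request(
    providers: list[Provider],
    send: Callable[[Provider], str],
    auto_failover: bool = True,
    log: Optional[list[tuple[str, str]]] = None,
) -> str:
    events = log if log is not None else []
    for provider in providers:                 # queue order = priority order
        try:
            return send(provider)              # success: return the response
        except Exception as exc:
            events.append((provider.name, str(exc)))  # record the failure
            if not provider.breaker_open:      # tripped breakers are skipped
                provider.failures += 1         # otherwise count the failure
            if not auto_failover:              # toggle off: record only
                break
    raise AllProvidersFailed("no provider returned a response")


def flaky_send(provider: Provider) -> str:
    # Stand-in for a real upstream call: the primary always fails.
    if provider.name == "primary":
        raise ConnectionError("upstream timeout")
    return f"response from {provider.name}"


providers = [Provider("primary"), Provider("backup-a")]
events: list[tuple[str, str]] = []
result = handle_request(providers, flaky_send, log=events)
```

In this run the primary fails once, the failure is recorded in `events`, and the response comes from `backup-a`.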

Circuit Breaker Configuration

The circuit breaker stops the proxy from repeatedly retrying a provider that keeps failing.

Configuration Items

Each app has its own default configuration. The table below lists the general defaults alongside Claude's, which are more relaxed.

| Setting | Description | General Default | Claude Default | Range |
|---------|-------------|-----------------|----------------|-------|
| Failure Threshold | Consecutive failures to trigger circuit breaker | 4 | 8 | 1-20 |
| Recovery Success Threshold | Successes needed in half-open state to close breaker | 2 | 3 | 1-10 |
| Recovery Wait Time | Time before attempting recovery after tripping (seconds) | 60 | 90 | 0-300 |
| Error Rate Threshold | Error rate that opens the circuit breaker | 60% | 70% | 0-100% |
| Minimum Requests | Minimum requests before calculating error rate | 10 | 15 | 5-100 |

Claude's defaults are more relaxed because its requests tend to run longer, so its breaker tolerates more failures before tripping.
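
To make the settings concrete, the sketch below encodes the defaults and ranges from the table as Python data, with a range check and an error-rate check. The field names, `validate`, and `error_rate_trips` are hypothetical, and comparing the error rate with "greater than or equal" is an assumption; the manual only states that the threshold opens the breaker and that no rate is calculated below the minimum request count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerConfig:
    failure_threshold: int = 4
    recovery_success_threshold: int = 2
    recovery_wait_secs: int = 60
    error_rate_threshold_pct: int = 60
    min_requests: int = 10


# Allowed ranges, copied from the table (inclusive bounds).
RANGES = {
    "failure_threshold": (1, 20),
    "recovery_success_threshold": (1, 10),
    "recovery_wait_secs": (0, 300),
    "error_rate_threshold_pct": (0, 100),
    "min_requests": (5, 100),
}

GENERAL_DEFAULTS = BreakerConfig()
CLAUDE_DEFAULTS = BreakerConfig(8, 3, 90, 70, 15)


def validate(cfg: BreakerConfig) -> list[str]:
    """Return the names of settings that fall outside their range."""
    return [
        name
        for name, (low, high) in RANGES.items()
        if not low <= getattr(cfg, name) <= high
    ]


def error_rate_trips(cfg: BreakerConfig, failed: int, total: int) -> bool:
    """Assumed rule: ignore the rate until min_requests, then compare with >=."""
    if total < cfg.min_requests:
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return failed * 100 >= cfg.error_rate_threshold_pct * total
```

With the general defaults, 6 failures out of 10 requests (60%) reaches the threshold, while 5 failures out of 8 requests does not count yet because fewer than 10 requests have been seen.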

Timeout Configuration

| Setting | Description | General Default | Claude Default | Range |
|---------|-------------|-----------------|----------------|-------|
| Stream First Byte Timeout | Max wait time for first data chunk (seconds) | 60 | 90 | 1-120 |
| Stream Idle Timeout | Max interval between data chunks (seconds) | 120 | 180 | 60-600 (0 to disable) |
| Non-stream Timeout | Total timeout for non-streaming requests (seconds) | 600 | 600 | 60-1200 |
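
The two streaming timeouts measure different things: the first-byte timeout covers the wait for the first chunk, and the idle timeout covers the gap between consecutive chunks. The Python sketch below checks a list of chunk arrival times against both. `stream_timeout` is a hypothetical helper written for illustration; it inspects a finished trace instead of running timers, and it does not consider the gap after the last chunk.

```python
from typing import Optional


def stream_timeout(
    arrivals: list[float],
    first_byte: float = 60,  # general default, seconds
    idle: float = 120,       # general default, seconds; 0 disables the check
) -> Optional[str]:
    """arrivals: seconds after the request was sent at which each chunk arrived.

    Returns the name of the timeout that would fire, or None if neither does.
    """
    if not arrivals or arrivals[0] > first_byte:
        return "first_byte"
    if idle > 0:
        for previous, current in zip(arrivals, arrivals[1:]):
            if current - previous > idle:
                return "idle"
    return None


slow_start = stream_timeout([75, 80])                       # general defaults
slow_start_claude = stream_timeout([75, 80], first_byte=90, idle=180)
long_gap = stream_timeout([5, 200])
long_gap_idle_disabled = stream_timeout([5, 200], idle=0)
```

A first chunk after 75 seconds exceeds the general first-byte timeout but not Claude's 90-second default, and setting the idle timeout to 0 suppresses the idle check entirely.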

Retry Configuration

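| Setting | Description | General Default | Claude Default | Range |
|---------|-------------|-----------------|----------------|-------|
| Max Retries | Number of retries on request failure | 3 | 6 | 0-10 |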

Gemini's default max retries is 5.
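
The per-app retry defaults can be summarized as a lookup with a fallback. This is a sketch only: the names are hypothetical, treating Codex as using the general default is an assumption (the manual names only Claude and Gemini explicitly), and counting the initial request separately from the retries is also an assumption.

```python
GENERAL_MAX_RETRIES = 3
APP_MAX_RETRIES = {"claude": 6, "gemini": 5}  # apps with their own default


def default_max_retries(app: str) -> int:
    """Return the app's default max retries, falling back to the general one."""
    return APP_MAX_RETRIES.get(app.lower(), GENERAL_MAX_RETRIES)


def total_attempts(max_retries: int) -> int:
    """Assumed: one initial request plus the configured number of retries."""
    return 1 + max_retries
```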

Circuit Breaker States

| State | Description |
|-------|-------------|
| Closed | Normal state, requests allowed |
| Open | Circuit broken, this provider is skipped |
| Half-Open | Attempting recovery, sending probe requests |

State Transitions

```mermaid
stateDiagram-v2
    [*] --> Closed: Initialize
    Closed --> Open: Failures >= threshold
    Open --> HalfOpen: Recovery wait time expires
    HalfOpen --> Closed: Probe successes >= recovery threshold
    HalfOpen --> Open: Probe failed
```
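
The transitions above map onto a small state machine. The following Python sketch is one way to express them, using the general defaults; it is not the application's implementation, and `CircuitBreaker`, its method names, and the explicit `now` argument (used here so the example does not depend on the wall clock) are assumptions. It models only the consecutive-failure trigger, not the error-rate threshold.

```python
import enum
from typing import Optional


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 4,
                 recovery_success_threshold: int = 2,
                 recovery_wait_secs: float = 60) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_success_threshold = recovery_success_threshold
        self.recovery_wait_secs = recovery_wait_secs
        self.state = State.CLOSED
        self.failures = 0            # consecutive failures while closed
        self.probe_successes = 0     # successes while half-open
        self.opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Open -> Half-Open once the recovery wait has elapsed."""
        if (self.state is State.OPEN and self.opened_at is not None
                and now - self.opened_at >= self.recovery_wait_secs):
            self.state = State.HALF_OPEN
            self.probe_successes = 0
        return self.state is not State.OPEN

    def record_success(self) -> None:
        if self.state is State.HALF_OPEN:
            self.probe_successes += 1
            if self.probe_successes >= self.recovery_success_threshold:
                self.state = State.CLOSED   # enough probes succeeded
                self.failures = 0
        else:
            self.failures = 0               # a success ends the failure streak

    def record_failure(self, now: float) -> None:
        if self.state is State.HALF_OPEN:
            self._open(now)                 # a failed probe reopens the breaker
        elif self.state is State.CLOSED:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self._open(now)

    def _open(self, now: float) -> None:
        self.state = State.OPEN
        self.opened_at = now


breaker = CircuitBreaker()
for _ in range(4):
    breaker.record_failure(now=0.0)         # four failures trip the breaker
skipped = not breaker.allow(now=30.0)       # still within the 60 s wait
probing = breaker.allow(now=60.0)           # wait elapsed: half-open
```

After the wait expires, two successful probes close the breaker again, while a single failed probe sends it back to Open.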

Health Status Indicators

Provider Cards

Cards display health status badges:

| Badge | Status | Description |
|-------|--------|-------------|
| Green | Healthy | 0 consecutive failures |
| Yellow | Warning | Has failures but circuit not tripped |
| Red | Circuit Broken | Circuit breaker tripped, temporarily skipped |
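
The badge rules reduce to a two-step check. The function below is a hypothetical illustration of that mapping, not code from the application.

```python
def health_badge(consecutive_failures: int, breaker_tripped: bool) -> str:
    """Map a provider's failure state to the badge color from the table."""
    if breaker_tripped:
        return "red"       # Circuit Broken: provider is temporarily skipped
    if consecutive_failures > 0:
        return "yellow"    # Warning: failures recorded, breaker not tripped
    return "green"         # Healthy: no consecutive failures
```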

Queue List

The failover queue also displays each provider's health status.

Failover Logs

Each failover event records:

| Information | Description |
|-------------|-------------|
| Time | When it occurred |
| Original Provider | The provider that failed |
| New Provider | The provider switched to |
| Failure Reason | Error message |

Failover events can be viewed in the request logs within usage statistics.
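
For readers who want to reason about these records, for example to see which provider fails over most often, the four fields can be modeled as a small record type. This Python sketch is illustrative only: `FailoverEvent`, `failovers_by_provider`, and the sample data are made up, and the manual does not describe an export format for the logs.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FailoverEvent:
    time: datetime           # when the failover occurred
    original_provider: str   # the provider that failed
    new_provider: str        # the provider switched to
    failure_reason: str      # error message


def failovers_by_provider(events: list[FailoverEvent]) -> Counter:
    """Count failover events per failing provider."""
    return Counter(event.original_provider for event in events)


# Made-up sample data for illustration.
sample_events = [
    FailoverEvent(datetime(2025, 1, 1, 12, 0), "primary", "backup-a", "timeout"),
    FailoverEvent(datetime(2025, 1, 1, 12, 5), "primary", "backup-a", "HTTP 503"),
    FailoverEvent(datetime(2025, 1, 1, 12, 6), "backup-a", "backup-b", "timeout"),
]
counts = failovers_by_provider(sample_events)
```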

Best Practices

Queue Configuration Recommendations

  1. Primary provider: The most stable and fastest provider
  2. First backup: Second-best choice
  3. Second backup: Last resort

Circuit Breaker Configuration Recommendations

| Scenario | Failure Threshold | Recovery Wait |
|----------|-------------------|---------------|
| High availability requirement | 2 | 30 seconds |
| General scenario | 3 | 60 seconds |
| Tolerant of occasional failures | 5 | 120 seconds |

Monitoring Recommendations

Periodically check:

  • Health status of each provider
  • Failover frequency
  • Circuit breaker trigger frequency

FAQ

Failover Not Triggering

Check:

  1. Is the proxy service running?
  2. Is app takeover enabled?
  3. Is auto failover enabled?
  4. Are there backup providers in the queue?

Failover Triggering Too Frequently

Possible causes:

  • Unstable primary provider
  • Network issues
  • Configuration errors

Solutions:

  • Check primary provider status
  • Adjust circuit breaker parameters
  • Consider changing the primary provider

All Providers Circuit-Broken

Wait for the recovery wait time to expire, after which providers recover automatically. Alternatively:

  1. Manually restart the proxy service
  2. Reset circuit breaker states