A production-ready, multi-tenant LiteLLM proxy that enables Claude Code to seamlessly route requests across multiple LLM providers (OpenAI, Azure, OpenRouter, Bedrock, Ollama, etc.) with enterprise features.
```bash
# Clone the repository
git clone https://github.com/ruvnet/claude-flow.git
cd claude-flow/examples/litellm

# Copy environment template
cp .env.example .env

# Edit .env with your API keys
nano .env

# Deploy the stack
./scripts/deploy.sh start
```

```bash
# Set environment variables
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_AUTH_TOKEN=<your-litellm-master-key>

# Use Claude Code with different models
claude --model codex-mini "Write a Python function"
claude --model o3-pro "Explain quantum computing"
claude --model deepseek-coder "Refactor this code"
```
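Before running `claude`, it can help to fail fast when either variable is missing. A minimal sketch (the function name `check_proxy_env` is illustrative, not part of the project):

```shell
# Fail fast if the proxy environment is incomplete.
check_proxy_env() {
  if [ -z "$ANTHROPIC_BASE_URL" ] || [ -z "$ANTHROPIC_AUTH_TOKEN" ]; then
    echo "set ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN first" >&2
    return 1
  fi
  echo "proxy env ok: $ANTHROPIC_BASE_URL"
}

# Usage: check_proxy_env && claude --model codex-mini "Write a Python function"
```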
```
┌─────────────┐      ┌───────────────┐      ┌─────────────┐
│ Claude Code │─────▶│ Load Balancer │─────▶│   LiteLLM   │
└─────────────┘      └───────────────┘      │   Proxies   │
                                            └──────┬──────┘
                                                   │
                          ┌────────────────────────┼────────────────────────┐
                          │                        │                        │
                    ┌─────▼─────┐         ┌────────▼────────┐      ┌────────▼────────┐
                    │  OpenAI   │         │   OpenRouter    │      │      Azure      │
                    └───────────┘         └─────────────────┘      └─────────────────┘
```
Edit `config/config.yaml` to customize model routing:

```yaml
model_list:
  - model_name: "fast-code"
    litellm_params:
      model: "openrouter/qwen/qwen-3-coder"
      max_tokens: 8192
      temperature: 0.2

  - model_name: "reasoning"
    litellm_params:
      model: "openai/o3-pro"
      max_tokens: 4096
      temperature: 0.7
```
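To see which aliases a config file defines without starting the proxy, a quick grep over the quoted `model_name:` entries works. A sketch, assuming the file layout shown above (the helper name `list_aliases` is illustrative):

```shell
# Print every model alias declared in a LiteLLM config file.
list_aliases() {
  grep 'model_name:' "$1" | sed 's/.*"\(.*\)".*/\1/'
}

# Usage: list_aliases config/config.yaml
```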
Create and manage tenants:

```bash
# Create a new tenant
./scripts/manage-tenants.sh create engineering

# List all tenants
./scripts/manage-tenants.sh list

# Update tenant budget
./scripts/manage-tenants.sh update engineering budget 200

# View usage statistics
./scripts/manage-tenants.sh usage engineering
```
Configure automatic fallbacks in `config/config.yaml`:

```yaml
fallback_models:
  code_chain:
    - codex-mini       # Primary (fast, cheap)
    - deepseek-coder   # Secondary
    - local-codellama  # Tertiary (free, local)
```
```bash
# Start services
./scripts/deploy.sh start

# Stop services
./scripts/deploy.sh stop

# View logs
./scripts/deploy.sh logs

# Check status
./scripts/deploy.sh status

# Clean everything
./scripts/deploy.sh clean
```
Adjust the replica count in `docker-compose.yml`:

```yaml
services:
  litellm:
    deploy:
      replicas: 5  # Increase for higher load
```
```bash
# Backup database
docker-compose exec postgres pg_dump -U litellm > backup.sql

# Restore database
docker-compose exec -T postgres psql -U litellm < backup.sql
```
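For scheduled backups, a timestamped filename avoids overwriting the previous dump. A sketch, assuming the `litellm` database user from the compose setup above (the `backup_path` helper and `backups/` directory are illustrative):

```shell
# Build a timestamped backup path, e.g. backups/litellm-20250101-120000.sql
backup_path() {
  echo "backups/litellm-$(date +%Y%m%d-%H%M%S).sql"
}

# Usage:
# mkdir -p backups
# docker-compose exec -T postgres pg_dump -U litellm > "$(backup_path)"
```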
```bash
# Generate a new master key
export NEW_KEY=$(openssl rand -hex 32)

# Write it to .env (note: `>>` appends, so remove any previous
# LITELLM_MASTER_KEY line first to avoid duplicate entries)
echo "LITELLM_MASTER_KEY=sk-litellm-$NEW_KEY" >> .env

# Restart services
docker-compose restart
```
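The generation step can be wrapped in a helper so rotation scripts always produce the same `sk-litellm-<64 hex chars>` shape (the function name `new_master_key` is illustrative):

```shell
# Generate a fresh master key in the sk-litellm-<64 hex chars> format.
new_master_key() {
  echo "sk-litellm-$(openssl rand -hex 32)"
}
```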
```bash
# Fast code generation
claude --model codex-mini "Write a REST API endpoint"

# Complex reasoning
claude --model o3-pro "Design a distributed system"

# Local processing (no cloud)
claude --model local-codellama "Refactor this function"

# Use the cheapest model for simple tasks
export ANTHROPIC_MODEL=claude-3-haiku
claude "Format this JSON"

# Use a powerful model for complex tasks
export ANTHROPIC_MODEL=o3-pro
claude "Solve this algorithm problem"
```
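The choose-by-task pattern above can be captured in a small dispatcher so scripts stay consistent; a sketch using the aliases from the examples (tier names and the `pick_model` helper are illustrative):

```shell
# Map a task tier to a model alias; adjust to your own config.
pick_model() {
  case "$1" in
    simple)  echo "claude-3-haiku" ;;
    code)    echo "codex-mini" ;;
    complex) echo "o3-pro" ;;
    *)       echo "codex-mini" ;;  # sensible default
  esac
}

# Usage: claude --model "$(pick_model complex)" "Solve this algorithm problem"
```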
```bash
# Engineering team
export ANTHROPIC_AUTH_TOKEN=$TENANT_ENGINEERING_KEY
claude --model codex-mini "Build feature"

# Research team
export ANTHROPIC_AUTH_TOKEN=$TENANT_RESEARCH_KEY
claude --model o3-pro "Analyze dataset"
```
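If per-tenant keys follow a `TENANT_<NAME>_KEY` naming scheme as above, switching tenants can be a one-liner (the `use_tenant` helper and variable scheme are illustrative):

```shell
# Point ANTHROPIC_AUTH_TOKEN at the key stored in TENANT_<NAME>_KEY.
use_tenant() {
  name=$(echo "$1" | tr '[:lower:]' '[:upper:]')
  eval "export ANTHROPIC_AUTH_TOKEN=\"\$TENANT_${name}_KEY\""
  echo "active tenant: $1"
}

# Usage: use_tenant engineering && claude --model codex-mini "Build feature"
```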
**Connection Refused**

```bash
# Check if services are running
docker-compose ps

# Check LiteLLM logs
docker-compose logs litellm-1
```

**Authentication Failed**

```bash
# Verify master key in .env
grep LITELLM_MASTER_KEY .env

# Test connection
curl http://localhost:4000/health
```

**Model Not Found**

```bash
# Check available models
curl -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
     http://localhost:4000/models
```
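With `jq` installed, the model IDs can be pulled out of the response, assuming it follows the OpenAI-style list shape (`{"data": [{"id": ...}]}`); the helper name below is illustrative:

```shell
# Extract model IDs from an OpenAI-style model list on stdin.
extract_model_ids() {
  jq -r '.data[].id'
}

# Usage:
# curl -s -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
#      http://localhost:4000/models | extract_model_ids
```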
Enable debug logging:

```yaml
# In docker-compose.yml
environment:
  - LITELLM_LOG_LEVEL=DEBUG
  - DEBUG=true
```
Add custom providers in `config/config.yaml`:

```yaml
model_list:
  - model_name: "custom-model"
    litellm_params:
      model: "custom_provider/model_name"
      api_base: "https://api.custom.com/v1"
      api_key: ${CUSTOM_API_KEY}
      custom_llm_provider: "custom"
```
```yaml
# Optimize for high throughput
router_settings:
  max_parallel_requests: 100
  request_timeout: 30
  enable_caching: true
  cache_ttl: 3600
```

```yaml
# Implement cost controls
cost_tracking:
  alert_thresholds:
    - threshold: 80
      action: notify
    - threshold: 100
      action: block
```
See `CONTRIBUTING.md` for guidelines.
MIT License - see `LICENSE` for details.
Built with ❤️ by the Claude Flow team