# Adaptive router demo

A 5-minute demo of LiteLLM's adaptive router learning, in real time, that the smart model wins for code while the fast model is fine for facts. Everything lives in `scripts/adaptive_router_demo/`.
```
┌─ traffic.py ──┐   ┌─ litellm proxy ──────────┐   ┌─ dashboard.html ─┐
│ synthetic     │──▶│ adaptive_router strategy │──▶│ bandit bars +    │
│ chat sessions │   │ /adaptive_router/state   │   │ cost meter +     │
└───────────────┘   └──────────┬───────────────┘   │ activity log     │
                               │                   └──────────────────┘
                    ┌──────────▼──────────┐
                    │      chat.html      │
                    │  interactive chat   │
                    │   with preset       │
                    │   scenarios         │
                    └─────────────────────┘
```
| File | What it does |
|---|---|
| `dashboard.html` | Live bandit dashboard — polls `/adaptive_router/state` every 500ms |
| `chat.html` | Interactive chat with preset scenarios — sends real requests through the router |
| `traffic.py` | Synthetic traffic generator — drives labeled sessions for the automated demo |
Each bar on the dashboard tracks one `(request_type, model)` cell; a bar fills up as that cell's α grows from positive feedback signals.
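If it helps to picture the bookkeeping, here is a minimal sketch of a Beta-bandit cell in the spirit of what the bars visualize. It assumes a Thompson-sampling style strategy; the real implementation lives in `litellm/router_strategy/adaptive_router/`, and its names, priors, and update rules may differ.

```python
import random
from collections import defaultdict

COLD_START_MASS = 10.0  # pseudo-observations protecting the prior (illustrative value)

class Cell:
    """Beta posterior over 'this model succeeds for this request type'."""
    def __init__(self, prior_mean: float = 0.5):
        self.alpha = prior_mean * COLD_START_MASS          # prior successes
        self.beta = (1.0 - prior_mean) * COLD_START_MASS   # prior failures

    def record(self, success: bool) -> None:
        if success:
            self.alpha += 1.0   # positive feedback: the bar grows
        else:
            self.beta += 1.0

    def sample(self) -> float:
        return random.betavariate(self.alpha, self.beta)

cells = defaultdict(Cell)   # keyed by (request_type, model)

def pick(request_type: str, models: list[str]) -> str:
    # Thompson sampling: draw once from each cell's posterior, take the argmax.
    return max(models, key=lambda m: cells[(request_type, m)].sample())
```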
## Start the proxy

The repo ships with a working example config:

```bash
export OPENAI_API_KEY=sk-...   # underlying models hit OpenAI
uv run litellm \
  --config litellm/proxy/example_config_yaml/adaptive_router_example.yaml \
  --port 4000
```
`DATABASE_URL` is optional — the proxy falls back to a bundled Neon dev DB. Wait ~15s until you see `Application startup complete`.
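If you'd rather script the wait than watch the logs, a loop like this works. It assumes the proxy's OpenAI-compatible `/v1/models` endpoint and the example master key:

```python
import time
import requests

PROXY = "http://localhost:4000"
KEY = "sk-1234"  # master_key from the example config

# Retry until the proxy answers; /v1/models lists the configured deployments.
for _ in range(60):
    try:
        r = requests.get(f"{PROXY}/v1/models",
                         headers={"Authorization": f"Bearer {KEY}"}, timeout=2)
        if r.ok:
            print("proxy is up:", [m["id"] for m in r.json()["data"]])
            break
    except requests.ConnectionError:
        pass
    time.sleep(1)
else:
    raise SystemExit("proxy never came up")
```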
## Try the chat UI

Open `chat.html` in a browser, using the same `file://` or `python3 -m http.server` approach as the dashboard (next section). Pick a preset scenario and send it; the info panel shows which model the router picked and the detected request type (read from the `x-litellm-adaptive-router-model` and `x-litellm-request-type` response headers).

Note on headers: the model/type headers are only readable in the browser if the proxy sets `Access-Control-Expose-Headers`. LiteLLM defaults to exposing them. If the info panel shows "check dashboard", the router still works — you can verify picks in `dashboard.html`.
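You can also read the same headers outside a browser. A minimal sketch, assuming the example master key and the `smart-cheap-router` name used by `traffic.py` below:

```python
import requests

PROXY = "http://localhost:4000"
KEY = "sk-1234"

resp = requests.post(
    f"{PROXY}/chat/completions",
    headers={"Authorization": f"Bearer {KEY}"},
    json={
        "model": "smart-cheap-router",  # the adaptive router deployment name
        "messages": [{"role": "user", "content": "Write a Python quicksort."}],
    },
)
resp.raise_for_status()
# The router reports its decision in response headers:
print("picked model :", resp.headers.get("x-litellm-adaptive-router-model"))
print("request type :", resp.headers.get("x-litellm-request-type"))
```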
## Open the dashboard

The dashboard is a single static HTML file. Either:

- Easy: double-click `dashboard.html`. Most browsers will load it from `file://`, and the LiteLLM proxy's CORS defaults (`*`) will accept it.
- If your browser blocks `file://` fetches:

  ```bash
  cd scripts/adaptive_router_demo
  python3 -m http.server 8080
  ```

  Then open http://localhost:8080/dashboard.html.
In the connect bar, fill in:

- the proxy URL: `http://localhost:4000`
- the `master_key` from your config (`sk-1234` in the example)

Click Connect. The dashboard polls `GET /adaptive_router/state` every 500ms (an admin-only endpoint that returns one snapshot per configured router).
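The same snapshot the dashboard renders can be polled by hand. The exact JSON shape isn't documented here, so this sketch just pretty-prints whatever comes back:

```python
import json
import time
import requests

PROXY = "http://localhost:4000"
KEY = "sk-1234"  # the state endpoint is admin-only, so use the master key

while True:
    r = requests.get(f"{PROXY}/adaptive_router/state",
                     headers={"Authorization": f"Bearer {KEY}"})
    r.raise_for_status()
    print(json.dumps(r.json(), indent=2)[:800])  # inspect the snapshot
    time.sleep(0.5)  # same 500ms cadence as the dashboard
```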
## Drive synthetic traffic

In a second terminal:

```bash
uv run python scripts/adaptive_router_demo/traffic.py \
  --proxy-url http://localhost:4000 \
  --api-key sk-1234 \
  --router smart-cheap-router \
  --rounds 100 \
  --rate 0.5
```
What it does, each round:

- samples a `(request_type, prompt)` pair from a small labeled corpus;
- sends a 5-message conversation (clearing the `SIGNAL_GATE_MIN_MESSAGES=4` gate in one round-trip) so the post-call hook runs and updates the bandit;
- reads the `x-litellm-adaptive-router-model` response header to see what the router picked;
- scores the pick against the built-in oracle of success rates:

```
code_generation : smart=0.92  fast=0.35
factual_lookup  : smart=0.90  fast=0.85
writing         : smart=0.85  fast=0.55
```
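Roughly, each pick is scored with a Bernoulli draw against those numbers. A sketch of the idea; how the real script turns the draw into a feedback signal for the post-call hook is not shown here:

```python
import random

# Success probabilities from the table above (traffic.py's ORACLE dict).
ORACLE = {
    "code_generation": {"smart": 0.92, "fast": 0.35},
    "factual_lookup":  {"smart": 0.90, "fast": 0.85},
    "writing":         {"smart": 0.85, "fast": 0.55},
}

def simulate_outcome(request_type: str, picked_model: str) -> bool:
    # Bernoulli draw: the session "goes well" with the oracle's probability
    # for this (type, model) pairing.
    return random.random() < ORACLE[request_type][picked_model]
```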
After 50–80 rounds you'll see `code_generation` decisively favor `smart` while `factual_lookup` stays near a coin flip — the router learned the asymmetry from the oracle (a 0.92 vs 0.35 success gap for code, but only 0.90 vs 0.85 for facts).
## Tuning knobs

| Knob | Where | What changes |
|---|---|---|
| Quality vs. cost weight | `adaptive_router_config.weights` in the proxy YAML | Bias toward quality or savings |
| Per-cell cold-start mass | `COLD_START_MASS` in `litellm/router_strategy/adaptive_router/config.py` | How long until the prior is overwritten |
| Avg tokens per request | dashboard input box | How the cost meter estimates spend |
| Oracle | `ORACLE` dict in `traffic.py` | Which model "should" win for which type |
| Sessions to drive | `--rounds` | Total learning budget |
| Throttle | `--rate` | Seconds between sessions |
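To get a feel for the cold-start knob: with a conjugate update like the sketch earlier, the posterior mean blends prior and data, so the prior's mass directly sets how many observations it takes to overwrite it. Illustrative arithmetic only; this is not the literal code in `config.py`:

```python
# Posterior mean with prior mass m at prior mean p0, after n observations:
#   mean = (m * p0 + successes) / (m + n)
def posterior_mean(m: float, p0: float, successes: int, n: int) -> float:
    return (m * p0 + successes) / (m + n)

for m in (5, 20, 100):
    # 40 observations of a model that truly succeeds 80% of the time
    print(m, round(posterior_mean(m, p0=0.5, successes=32, n=40), 3))
# 5 -> 0.767, 20 -> 0.7, 100 -> 0.586: larger mass, slower to overwrite.
```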
## Multiple routers

If your proxy has more than one `auto_router`/`adaptive_router` deployment, the dashboard shows a router dropdown above the bars. Each router is independent; the cost meter is per-router (and resets when you switch).
## Troubleshooting

- **Auth errors on `/adaptive_router/state`** — your key is not `proxy_admin`. The state endpoint is admin-only; use the master key.
- **No routers come back from `/adaptive_router/state`** — the proxy started, but no `auto_router`/`adaptive_router` deployment is in the model list.
- **Bars never move (no `record_turn` activity)** — common cause: requests are not including 4+ messages, so the signal gate skips them. `traffic.py` already builds 5-message conversations, so this only happens if you've changed the script.
- **Cost meter stays empty** — the deployments need `input_cost_per_token` set in `litellm_params`. Add it.
- **Dashboard can't fetch from the browser** — keep `LITELLM_CORS_ORIGINS=*` on the proxy (the default), or serve `dashboard.html` from `python3 -m http.server` instead of `file://`.
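For the CORS items, a quick way to see what the proxy actually sends back to a browser origin, using the standard CORS response headers:

```python
import requests

PROXY = "http://localhost:4000"
KEY = "sk-1234"

# Browsers can only read the x-litellm-* headers if they are listed in
# Access-Control-Expose-Headers on the actual (non-preflight) response.
r = requests.get(
    f"{PROXY}/adaptive_router/state",
    headers={"Authorization": f"Bearer {KEY}", "Origin": "http://localhost:8080"},
)
print("allow-origin  :", r.headers.get("Access-Control-Allow-Origin"))
print("expose-headers:", r.headers.get("Access-Control-Expose-Headers"))
```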