experimental/sgl-router/monitoring/README.md
Grafana dashboard for the experimental router's Prometheus metrics, exposed
on /metrics (text/plain, version 0.0.4) on the router's serving port
(default 30000).
grafana-dashboard.json — importable Grafana dashboard, SGLang Router
(experimental) (uid sgl-router-experimental).The dashboard graphs every family the router emits:
| Metric | Type | What it shows |
|---|---|---|
sgl_router_requests_total | Counter | Dispatches by worker_url, model_id, mode, outcome |
sgl_router_request_duration_seconds | Histogram | End-to-end request latency by model_id |
sgl_router_ttft_seconds | Histogram | Time to first token (streaming) by model_id |
sgl_router_responses_total | Counter | Client-visible HTTP status_code |
sgl_router_overlap_blocks | Histogram | Cache-aware-zmq overlap blocks by model_id |
sgl_router_active_load | Gauge | Per-worker prefill-token / decode-block load |
sgl_router_workers | Gauge | Registered worker count by mode |
sgl_router_worker_health | Gauge | Per-worker health (1=breaker admits, 0=open) |
sgl_router_worker_cb_state | Gauge | Per-worker circuit breaker state (0=closed, 1=open, 2=half_open) |
sgl_router_worker_inflight_requests | Gauge | In-flight requests per worker |
sgl_router_stale_requests_total | Counter | Stale-request cancellations |
sgl_router_decode_affinity_total | Counter | PD decode-affinity outcomes |
sgl_router_sticky_total | Counter | Sticky-session selection outcomes |
The sgl_router_workers / sgl_router_worker_* gauges are sampled from the
live worker registry on every scrape, so a removed worker stops emitting
series immediately rather than leaving a stale value.
Point Prometheus at the router's /metrics endpoint:
scrape_configs:
- job_name: sgl-router
metrics_path: /metrics
static_configs:
- targets:
- '127.0.0.1:30000' # router host:port
grafana-dashboard.json (or paste its contents).Datasource
variable. The dashboard uses a templated data source, so it imports into
any Grafana without editing the JSON.The top bar exposes model_id and worker_url template variables (both
default to All) to scope the panels.
The JSON is generated programmatically to keep the ~20 panels consistent. If the metric surface changes, update the generator and overwrite the JSON rather than hand-editing — hand-edits drift from the panel conventions.