docs/designs/2025-12-10-unified-health-status.md
Date: 2025-12-10
Status: Ready for implementation
Current issues:
Root cause: Each interface calculates status independently from raw fields, leading to drift. For example:
oauth_status and shows "Token Expired"
Goals:
Non-goals:
New HealthStatus struct (in internal/contracts/types.go):
type HealthStatus struct {
    Level      string `json:"level"`       // "healthy", "degraded", "unhealthy"
    AdminState string `json:"admin_state"` // "enabled", "disabled", "quarantined"
    Summary    string `json:"summary"`     // "Connected (5 tools)", "Token expiring in 2h"
    Detail     string `json:"detail"`      // Optional longer explanation
    Action     string `json:"action"`      // "login", "restart", "enable", "approve", "view_logs", ""
}
Added to existing Server struct:
type Server struct {
    // ... existing fields ...
    Health HealthStatus `json:"health"` // New unified health status
}
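To make the wire format concrete, here is a minimal sketch of how a server with the new field would serialize. The `Server` type is trimmed to two fields, and `Name` with its `name` JSON tag is an assumption made for illustration; only the `health` object follows from the struct tags above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HealthStatus mirrors the struct proposed in this design.
type HealthStatus struct {
	Level      string `json:"level"`
	AdminState string `json:"admin_state"`
	Summary    string `json:"summary"`
	Detail     string `json:"detail"`
	Action     string `json:"action"`
}

// Server is trimmed to two fields for illustration.
// Name and its "name" tag are assumptions, not taken from the real struct.
type Server struct {
	Name   string       `json:"name"`
	Health HealthStatus `json:"health"`
}

// encodeServer returns the JSON a client would receive for one server.
func encodeServer(s Server) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeServer(Server{
		Name: "github",
		Health: HealthStatus{
			Level:      "degraded",
			AdminState: "enabled",
			Summary:    "Token expiring in 45m",
			Action:     "login",
		},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
	// {"name":"github","health":{"level":"degraded","admin_state":"enabled","summary":"Token expiring in 45m","detail":"","action":"login"}}
}
```

Because none of the tags use `omitempty`, `detail` and `action` are always present, so clients can test for an empty string rather than a missing key.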
Level values:
| Level | Meaning | View convention |
|---|---|---|
| healthy | Ready to use, no issues | green |
| degraded | Works but needs attention soon | yellow |
| unhealthy | Broken, can't use until fixed | red |
Action types:
| Action | Meaning |
|---|---|
| "" | No action needed (healthy state) |
| login | OAuth authentication required |
| restart | Server needs restart |
| enable | Server is disabled |
| approve | Server is quarantined |
| view_logs | Check logs for details |
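Since `Level` and `Action` are plain strings, one way to keep the backend and its tests from drifting on spelling is a set of named constants. This is only a sketch: the constant names and the `isKnownAction` helper are hypothetical and do not appear in the design.

```go
package main

import "fmt"

// Level values from the table above.
const (
	LevelHealthy   = "healthy"
	LevelDegraded  = "degraded"
	LevelUnhealthy = "unhealthy"
)

// Action values from the table above.
const (
	ActionNone     = ""
	ActionLogin    = "login"
	ActionRestart  = "restart"
	ActionEnable   = "enable"
	ActionApprove  = "approve"
	ActionViewLogs = "view_logs"
)

// knownActions lists every action an interface is expected to handle.
var knownActions = map[string]bool{
	ActionNone:     true,
	ActionLogin:    true,
	ActionRestart:  true,
	ActionEnable:   true,
	ActionApprove:  true,
	ActionViewLogs: true,
}

// isKnownAction reports whether a is one of the documented action values.
func isKnownAction(a string) bool {
	return knownActions[a]
}

func main() {
	fmt.Println(isKnownAction(ActionLogin)) // true
	fmt.Println(isKnownAction("reboot"))    // false
}
```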
Location: internal/runtime/runtime.go in GetAllServers() (or extracted to internal/health/calculator.go)
Priority order (first match wins):
1. Admin state checks (shown instead of health when not enabled)
- quarantined → AdminState: "quarantined", Action: "approve"
- disabled → AdminState: "disabled", Action: "enable"
2. Unhealthy (red) conditions
- connection refused/failed → "unhealthy", Action: "restart"
- auth failed (bad credentials) → "unhealthy", Action: "login"
- server crashed → "unhealthy", Action: "restart"
- config error → "unhealthy", Action: "view_logs"
- token expired → "unhealthy", Action: "login"
- refresh failed (after retries) → "unhealthy", Action: "login"
- user logged out → "unhealthy", Action: "login"
3. Degraded (yellow) conditions
- token expiring soon, no refresh token → "degraded", Action: "login"
- connecting (in progress) → "degraded", Action: ""
4. Healthy (green)
- connected + authenticated (OAuth servers)
- connected (non-OAuth servers)
- token valid OR auto-refresh working
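The priority order above can be sketched as an ordered rule list where the first match wins. Several things here are assumptions rather than part of the design: the `serverState` input type and its field names, the summary strings other than "Connection refused" and "Connected (N tools)", the `Level` reported for quarantined or disabled servers, and the fallback for a server that is neither connected nor matched by any rule.

```go
package main

import "fmt"

// HealthStatus mirrors the struct proposed in this design.
type HealthStatus struct {
	Level, AdminState, Summary, Detail, Action string
}

// serverState is a hypothetical snapshot of the raw fields the runtime tracks.
type serverState struct {
	Enabled, Quarantined        bool
	Connected, Connecting       bool
	ConnectFailed, Crashed      bool
	AuthFailed, ConfigError     bool
	TokenExpired, RefreshFailed bool
	LoggedOut                   bool
	TokenExpiringSoon           bool
	HasRefreshToken             bool
	ToolCount                   int
}

// rule pairs a condition with the status it produces.
type rule struct {
	match                  func(serverState) bool
	level, summary, action string
}

// rules are listed in priority order: unhealthy conditions, then degraded ones.
var rules = []rule{
	{func(s serverState) bool { return s.ConnectFailed }, "unhealthy", "Connection refused", "restart"},
	{func(s serverState) bool { return s.AuthFailed }, "unhealthy", "Authentication failed", "login"},
	{func(s serverState) bool { return s.Crashed }, "unhealthy", "Server crashed", "restart"},
	{func(s serverState) bool { return s.ConfigError }, "unhealthy", "Configuration error", "view_logs"},
	{func(s serverState) bool { return s.TokenExpired }, "unhealthy", "Token expired", "login"},
	{func(s serverState) bool { return s.RefreshFailed }, "unhealthy", "Token refresh failed", "login"},
	{func(s serverState) bool { return s.LoggedOut }, "unhealthy", "Logged out", "login"},
	{func(s serverState) bool { return s.TokenExpiringSoon && !s.HasRefreshToken }, "degraded", "Token expiring soon", "login"},
	{func(s serverState) bool { return s.Connecting }, "degraded", "Connecting", ""},
}

// calculateHealth applies the admin-state checks first, then the rules in order.
func calculateHealth(s serverState) HealthStatus {
	// Assumption: Level stays "healthy" for admin states so they do not count as failures.
	if s.Quarantined {
		return HealthStatus{Level: "healthy", AdminState: "quarantined", Summary: "Quarantined", Action: "approve"}
	}
	if !s.Enabled {
		return HealthStatus{Level: "healthy", AdminState: "disabled", Summary: "Disabled", Action: "enable"}
	}
	for _, r := range rules {
		if r.match(s) {
			return HealthStatus{Level: r.level, AdminState: "enabled", Summary: r.summary, Action: r.action}
		}
	}
	if s.Connected {
		return HealthStatus{Level: "healthy", AdminState: "enabled", Summary: fmt.Sprintf("Connected (%d tools)", s.ToolCount)}
	}
	// Assumption: an unexplained disconnected state is reported as unhealthy.
	return HealthStatus{Level: "unhealthy", AdminState: "enabled", Summary: "Not connected", Action: "view_logs"}
}

func main() {
	h := calculateHealth(serverState{Enabled: true, Connected: true, ToolCount: 5})
	fmt.Println(h.Level, h.Summary) // healthy Connected (5 tools)
}
```

Keeping the rules in a slice makes the priority order visible in one place and easy to cover with table-driven tests.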
OAuth-specific logic:
| Condition | Level | Action |
|---|---|---|
| Token valid OR auto-refresh working | healthy | - |
| Token expiring soon, no refresh token | degraded | login |
| Token expired | unhealthy | login |
| Refresh failed (after retries) | unhealthy | login |
| User logged out | unhealthy | login |
Key distinction:
Each interface renders HealthStatus consistently but adapted to its medium.
mcpproxy upstream list and mcpproxy auth status:
Server Health Action
───────────────────────────────────────────────────────────────────
slack 🟢 Connected (5 tools)
github 🟡 Token expiring in 45m → auth login --server=github
filesystem 🔴 Connection refused → upstream restart filesystem
new-server ⏸️ Quarantined → Approve in Web UI
old-server ⏹️ Disabled → upstream enable old-server
Tray menu:
🟢 slack
🟡 github - Token expiring
🔴 filesystem - Error
⏸️ new-server (Quarantined)
⏹️ old-server (Disabled)
Clicking yellow/red servers opens Web UI to the relevant fix page.
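Both the CLI table and the tray menu use the same five icons, so the choice can be derived from `HealthStatus` alone. A minimal sketch follows; the `statusIcon` name is hypothetical, and falling back to red for an unrecognized level is an assumption.

```go
package main

import "fmt"

// HealthStatus mirrors the struct proposed in this design.
type HealthStatus struct {
	Level, AdminState, Summary, Detail, Action string
}

// statusIcon picks the icon shown in the CLI and tray examples.
// Admin state is checked first because it is shown instead of health.
func statusIcon(h HealthStatus) string {
	switch h.AdminState {
	case "quarantined":
		return "⏸️"
	case "disabled":
		return "⏹️"
	}
	switch h.Level {
	case "healthy":
		return "🟢"
	case "degraded":
		return "🟡"
	default:
		// Assumption: anything unrecognized is treated like "unhealthy".
		return "🔴"
	}
}

func main() {
	servers := []struct {
		name string
		h    HealthStatus
	}{
		{"slack", HealthStatus{Level: "healthy", AdminState: "enabled"}},
		{"github", HealthStatus{Level: "degraded", AdminState: "enabled"}},
		{"filesystem", HealthStatus{Level: "unhealthy", AdminState: "enabled"}},
		{"new-server", HealthStatus{AdminState: "quarantined"}},
		{"old-server", HealthStatus{AdminState: "disabled"}},
	}
	for _, s := range servers {
		fmt.Printf("%s %s\n", statusIcon(s.h), s.name)
	}
}
```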
Web UI:
| Location | Shows | Actions |
|---|---|---|
| Dashboard | "X servers need attention" banner | Quick-fix buttons per server |
| ServerCard | Colored status badge + summary | Login/Restart/Reconnect based on action field |
| ServerDetail | Full health details | Same actions + logs |
Each interface maps the Action field to its own UX:
CLI:
"login" → "auth login --server=%s"
"restart" → "upstream restart %s"
"enable" → "upstream enable %s"
"approve" → "Approve in Web UI or config"
"view_logs" → "upstream logs %s"
Tray:
"login" → opens http://localhost:8080/ui/servers/{name}?action=login
"restart" → triggers API call directly
"enable" → triggers API call directly
"approve" → opens http://localhost:8080/ui/servers/{name}?action=approve
Web UI:
"login" → Login button
"restart" → Restart button
"enable" → Enable toggle
"approve" → Approve button
Files to modify:
| File | Change |
|---|---|
| internal/contracts/types.go | Add HealthStatus struct |
| internal/runtime/runtime.go | Calculate Health in GetAllServers() |
| internal/httpapi/server.go | Ensure health field is included in API response |
| cmd/mcpproxy/upstream_cmd.go | Update upstream list to use Health field |
| cmd/mcpproxy/auth_cmd.go | Update auth status to use Health field |
| internal/tray/managers.go | Update getServerStatusDisplay() to use Health field |
| frontend/src/components/ServerCard.vue | Use health for badge color + show action |
| frontend/src/views/Dashboard.vue | Use health.level to filter servers needing attention |
No backward compatibility needed - all clients (CLI, tray, web) ship together in mcpproxy releases.
┌─────────────────────────────────────────────────────────────┐
│ Backend (Runtime) │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ CalculateHealth() → HealthStatus ││
│ │ - Level: healthy/degraded/unhealthy ││
│ │ - AdminState: enabled/disabled/quarantined ││
│ │ - Summary: "Connected (5 tools)" ││
│ │ - Action: login/restart/enable/approve/"" ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
│
GET /api/v1/servers
│
┌───────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ CLI │ │ Tray │ │ Web UI │
│ │ │ │ │ │
│ 🟢/🟡/🔴 │ │ 🟢/🟡/🔴 │ │ badges │
│ + hint │ │ + click │ │ + btns │
└─────────┘ └─────────┘ └─────────┘
Key principle: Backend owns health calculation. Interfaces only render.
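To illustrate what "interfaces only render" could look like in practice, the CLI and tray action mappings listed earlier can be written as pure functions of the `Action` field, with no access to raw server state. The function names `cliHint` and `trayURL` are hypothetical, and escaping the server name with `url.PathEscape` is an addition not mentioned in the design.

```go
package main

import (
	"fmt"
	"net/url"
)

// cliHint turns an action into the command hint shown in the CLI output.
// It returns "" when no action is needed or the action is unknown.
func cliHint(action, server string) string {
	switch action {
	case "login":
		return fmt.Sprintf("auth login --server=%s", server)
	case "restart":
		return fmt.Sprintf("upstream restart %s", server)
	case "enable":
		return fmt.Sprintf("upstream enable %s", server)
	case "approve":
		return "Approve in Web UI or config"
	case "view_logs":
		return fmt.Sprintf("upstream logs %s", server)
	default:
		return ""
	}
}

// trayURL returns the Web UI page the tray opens for an action.
// ok is false for actions the tray handles with a direct API call
// (restart, enable) or does not handle at all.
func trayURL(action, server string) (target string, ok bool) {
	switch action {
	case "login", "approve":
		// PathEscape guards against unusual server names (an assumption).
		return fmt.Sprintf("http://localhost:8080/ui/servers/%s?action=%s",
			url.PathEscape(server), action), true
	default:
		return "", false
	}
}

func main() {
	fmt.Println(cliHint("login", "github")) // auth login --server=github
	if u, ok := trayURL("approve", "new-server"); ok {
		fmt.Println(u) // http://localhost:8080/ui/servers/new-server?action=approve
	}
}
```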