docs/docker-recovery-phase3.md
This PR implements the remaining enhancements from the Docker recovery improvement plan. These are non-critical features that improve observability, user experience, and operational excellence.
- Previous PR: #120 (Critical fixes - MERGED)
- This PR: Phase 3 enhancements
Priority: Medium | Effort: 4h
Problem: When mcpproxy restarts, Docker recovery state is lost. If Docker is still down, the system doesn't remember it was in recovery mode.
Solution:
Files to modify:
- `internal/storage/manager.go` - Add recovery state schema
- `cmd/mcpproxy-tray/internal/state/machine.go` - Load/save state
- `cmd/mcpproxy-tray/main.go` - Resume recovery on startup

Implementation:
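
    // Storage schema: recovery state persisted across mcpproxy restarts.
    type DockerRecoveryState struct {
        LastAttempt     time.Time // when recovery was last attempted
        FailureCount    int       // number of failed attempts so far
        DockerAvailable bool      // last observed Docker engine status
        RecoveryMode    bool      // true while recovery is in progress
    }

    // On startup: resume recovery if the previous run was still recovering.
    if state := loadDockerRecoveryState(); state.RecoveryMode {
        resumeDockerRecovery(state)
    }
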
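One way the load/save step could look, as a minimal sketch. It assumes the state is stored as a JSON file, which may differ from how `internal/storage/manager.go` actually persists data. The file name, the JSON field names, and the `path` parameter on both helpers are illustrative assumptions and not part of the plan.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// DockerRecoveryState mirrors the schema in the plan; the JSON tags are assumed.
type DockerRecoveryState struct {
	LastAttempt     time.Time `json:"last_attempt"`
	FailureCount    int       `json:"failure_count"`
	DockerAvailable bool      `json:"docker_available"`
	RecoveryMode    bool      `json:"recovery_mode"`
}

// saveDockerRecoveryState writes the state as JSON (hypothetical helper).
// It writes to a temporary file and renames it, so a crash mid-write
// does not leave a truncated state file behind.
func saveDockerRecoveryState(path string, s DockerRecoveryState) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// loadDockerRecoveryState returns the zero state when no file exists,
// which means "not in recovery mode" on a first run.
func loadDockerRecoveryState(path string) (DockerRecoveryState, error) {
	var s DockerRecoveryState
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return s, nil
	}
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

func main() {
	path := filepath.Join(os.TempDir(), "docker_recovery_state.json")
	defer os.Remove(path)

	before := DockerRecoveryState{LastAttempt: time.Now().UTC(), FailureCount: 3, RecoveryMode: true}
	if err := saveDockerRecoveryState(path, before); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	after, err := loadDockerRecoveryState(path)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	if after.RecoveryMode {
		fmt.Printf("resuming recovery after %d failures\n", after.FailureCount)
	}
}
```

Treating a missing file as the zero state keeps first-run startup on the normal path, while any other read or decode error is surfaced to the caller.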
Testing:
Priority: Medium | Effort: 2h
Problem: Users aren't clearly notified about Docker recovery progress and outcomes.
Solution:
Files to modify:
- `cmd/mcpproxy-tray/main.go` - Add notification calls
- `internal/tray/notifications.go` - Add recovery notifications

Implementation:

    // When Docker recovery starts
    showNotification("Docker Recovery", "Docker engine detected offline. Reconnecting servers...")

    // When recovery succeeds
    showNotification("Recovery Complete", "Successfully reconnected X servers")

    // When recovery fails
    showNotification("Recovery Failed", "Unable to reconnect servers. Check Docker status.")

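The `X` in the success message stands for the number of reconnected servers. A small sketch of how the three messages might be built in one place follows. The `recoveryOutcome` type and the `recoveryNotification` helper are hypothetical names introduced for illustration; only the titles and message texts come from the plan.

```go
package main

import "fmt"

// recoveryOutcome is a hypothetical enum for the three notification points.
type recoveryOutcome int

const (
	recoveryStarted recoveryOutcome = iota
	recoverySucceeded
	recoveryFailed
)

// recoveryNotification returns the title and body for a recovery event.
// reconnected is only used for the success message.
func recoveryNotification(o recoveryOutcome, reconnected int) (title, body string) {
	switch o {
	case recoveryStarted:
		return "Docker Recovery", "Docker engine detected offline. Reconnecting servers..."
	case recoverySucceeded:
		noun := "servers"
		if reconnected == 1 {
			noun = "server"
		}
		return "Recovery Complete", fmt.Sprintf("Successfully reconnected %d %s", reconnected, noun)
	default:
		// recoveryFailed and any unknown outcome are reported as a failure.
		return "Recovery Failed", "Unable to reconnect servers. Check Docker status."
	}
}

func main() {
	title, body := recoveryNotification(recoverySucceeded, 3)
	fmt.Printf("%s: %s\n", title, body)
}
```

Keeping the strings in one function makes them easy to unit test without a running tray.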
Testing:
Priority: Low | Effort: 3h
Problem: Docker health check intervals and timeouts are hardcoded. Power users may want to customize these.
Solution:
Files to modify:
- `internal/config/config.go` - Add docker_recovery section
- `cmd/mcpproxy-tray/main.go` - Use config values
- `docs/configuration.md` - Document options

Configuration schema:

    {
      "docker_recovery": {
        "enabled": true,
        "health_check": {
          "initial_delay": "2s",
          "intervals": ["2s", "5s", "10s", "30s", "60s"],
          "max_retries": 10,
          "timeout": "5s"
        }
      }
    }

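A sketch of how this schema could be decoded in Go, assuming durations stay as strings such as `"5s"` in the JSON. The type names, the `Duration` wrapper, and the `intervalFor` helper are assumptions for illustration. In particular, reusing the last interval once the list is exhausted is one plausible reading of the schema, not something the plan specifies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration lets JSON strings such as "5s" decode into a time.Duration.
type Duration time.Duration

func (d *Duration) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	parsed, err := time.ParseDuration(s)
	if err != nil {
		return err
	}
	*d = Duration(parsed)
	return nil
}

// Hypothetical Go types matching the docker_recovery schema.
type HealthCheckConfig struct {
	InitialDelay Duration   `json:"initial_delay"`
	Intervals    []Duration `json:"intervals"`
	MaxRetries   int        `json:"max_retries"`
	Timeout      Duration   `json:"timeout"`
}

type DockerRecoveryConfig struct {
	Enabled     bool              `json:"enabled"`
	HealthCheck HealthCheckConfig `json:"health_check"`
}

// intervalFor returns the wait before the given zero-based attempt,
// reusing the last configured interval once the list is exhausted.
func (h HealthCheckConfig) intervalFor(attempt int) time.Duration {
	if len(h.Intervals) == 0 {
		return 0
	}
	if attempt < 0 {
		attempt = 0
	}
	if attempt >= len(h.Intervals) {
		attempt = len(h.Intervals) - 1
	}
	return time.Duration(h.Intervals[attempt])
}

// parseDockerRecovery extracts the docker_recovery section from a config document.
func parseDockerRecovery(data []byte) (DockerRecoveryConfig, error) {
	var wrapper struct {
		DockerRecovery DockerRecoveryConfig `json:"docker_recovery"`
	}
	err := json.Unmarshal(data, &wrapper)
	return wrapper.DockerRecovery, err
}

const sampleConfig = `{
  "docker_recovery": {
    "enabled": true,
    "health_check": {
      "initial_delay": "2s",
      "intervals": ["2s", "5s", "10s", "30s", "60s"],
      "max_retries": 10,
      "timeout": "5s"
    }
  }
}`

func main() {
	cfg, err := parseDockerRecovery([]byte(sampleConfig))
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Println(attempt, cfg.HealthCheck.intervalFor(attempt))
	}
}
```

Parsing durations at load time means an invalid value such as `"5 seconds"` is rejected when the config is read, before any health check runs.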
Testing:
Priority: Low | Effort: 6h
Problem: No visibility into Docker recovery statistics and success rates over time.
Solution:
- `/api/v1/metrics` endpoint

Metrics to track:
Files to modify:
- `internal/storage/manager.go` - Add metrics schema
- `internal/httpapi/server.go` - Add /api/v1/metrics endpoint
- `cmd/mcpproxy-tray/main.go` - Record metrics during recovery

API response:

    {
      "docker_recovery": {
        "total_attempts": 42,
        "successful": 38,
        "failed": 4,
        "success_rate": 0.905,
        "avg_recovery_time_seconds": 12.5,
        "last_recovery": "2025-11-02T10:30:00Z",
        "per_server_stats": {
          "everything-server": {
            "attempts": 5,
            "successes": 5,
            "avg_time_seconds": 8.2
          }
        }
      }
    }

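The derived fields in this response follow from the raw counters: `success_rate` is `successful / total_attempts`, and the average recovery time can be kept as a running sum. A minimal in-memory sketch is shown here. The `RecoveryMetrics` and `RecoverySnapshot` types are hypothetical, `last_recovery` and `per_server_stats` are omitted for brevity, and averaging over successful recoveries only is an assumption the plan does not settle.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"time"
)

// RecoveryMetrics is a hypothetical accumulator for recovery outcomes.
type RecoveryMetrics struct {
	mu         sync.Mutex
	successful int
	failed     int
	totalTime  time.Duration // summed over successful recoveries only
}

// Record registers one recovery attempt and how long it took.
func (m *RecoveryMetrics) Record(success bool, took time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if success {
		m.successful++
		m.totalTime += took
	} else {
		m.failed++
	}
}

// RecoverySnapshot holds the aggregate fields of the API response.
type RecoverySnapshot struct {
	TotalAttempts          int     `json:"total_attempts"`
	Successful             int     `json:"successful"`
	Failed                 int     `json:"failed"`
	SuccessRate            float64 `json:"success_rate"`
	AvgRecoveryTimeSeconds float64 `json:"avg_recovery_time_seconds"`
}

// Snapshot computes the derived values, guarding against division by zero.
func (m *RecoveryMetrics) Snapshot() RecoverySnapshot {
	m.mu.Lock()
	defer m.mu.Unlock()
	s := RecoverySnapshot{
		TotalAttempts: m.successful + m.failed,
		Successful:    m.successful,
		Failed:        m.failed,
	}
	if s.TotalAttempts > 0 {
		s.SuccessRate = float64(m.successful) / float64(s.TotalAttempts)
	}
	if m.successful > 0 {
		s.AvgRecoveryTimeSeconds = m.totalTime.Seconds() / float64(m.successful)
	}
	return s
}

func main() {
	var m RecoveryMetrics
	for i := 0; i < 38; i++ {
		m.Record(true, 12*time.Second)
	}
	for i := 0; i < 4; i++ {
		m.Record(false, 0)
	}
	out, err := json.MarshalIndent(map[string]RecoverySnapshot{"docker_recovery": m.Snapshot()}, "", "  ")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

With 38 successes out of 42 attempts the rate is 38/42 ≈ 0.905; an endpoint would typically round the value before returning it.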
Testing:
Priority: Medium | Effort: 2h
Problem: Users don't know about Docker recovery features and how to configure them.
Solution:
Files to modify:
- `README.md` - Add Docker recovery section
- `docs/troubleshooting.md` - Add Docker recovery guide
- `docs/configuration.md` - Document recovery settings

Content to add:
## Docker Recovery
MCPProxy detects when the Docker engine becomes unavailable and
recovers automatically:
- **Automatic Detection**: Monitors Docker health every 2-60 seconds
- **Exponential Backoff**: Reduces polling frequency for efficiency
- **Graceful Reconnection**: Reconnects all Docker-based servers
- **Container Cleanup**: Removes orphaned containers on shutdown
### Troubleshooting
If servers don't reconnect after Docker recovery:
1. Check Docker is running: `docker ps`
2. Check mcpproxy logs: `~/.mcpproxy/logs/main.log`
3. Verify container labels: `docker ps -a --filter label=com.mcpproxy.managed`
4. Force reconnect via tray: System Tray → Force Reconnect
### Configuration
Customize recovery behavior in `mcp_config.json`:
...
Testing:
Recommended order (total: 17 hours):
Manual Testing:
Automated Testing:
CI/CD: