docs-internal/engine/PEGBOARD_TUNNEL_RETRIES.md
TODO: Clean up this AI slop explanation
This document explains how retries are coordinated between Guard and Pegboard-based handlers when transient tunnel (UPS) issues occur, for both HTTP and WebSocket traffic.
Signal: A retryable transient tunnel failure is signaled by returning an HTTP 503 with the X-RIVET-ERROR header set.
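As a minimal sketch of how the signal above could be checked, the following Rust snippet tests a response for the 503 status plus the X-RIVET-ERROR header. The `Response` struct, the function names, and the lowercase header storage are assumptions made for illustration; they are not Guard's real types.

```rust
use std::collections::HashMap;

/// Header that marks a retryable tunnel failure (name taken from this document).
/// This sketch stores header names in lowercase.
const X_RIVET_ERROR: &str = "x-rivet-error";

/// Hypothetical, minimal stand-in for an HTTP response; not Guard's real type.
pub struct Response {
    pub status: u16,
    pub headers: HashMap<String, String>,
}

/// Returns true when the response carries the retry signal:
/// status 503 and the X-RIVET-ERROR header present (any value).
pub fn is_retryable_tunnel_failure(resp: &Response) -> bool {
    resp.status == 503 && resp.headers.contains_key(X_RIVET_ERROR)
}

/// Builds the kind of response a handler could return to request a retry.
/// The body is omitted here; only the status and header matter for the check.
pub fn tunnel_closed_response() -> Response {
    let mut headers = HashMap::new();
    headers.insert(
        X_RIVET_ERROR.to_string(),
        "pegboard_gateway.tunnel_closed".to_string(),
    );
    Response { status: 503, headers }
}
```

In this sketch the check needs both conditions, so a bare 503 without the header is not treated as retryable.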
request_timeout), the gateway replies with 503 and X-RIVET-ERROR: pegboard_gateway.tunnel_closed.
Guard behavior
status == 503 and the X-RIVET-ERROR header is present.
max_attempts, initial_interval), re-resolves the route with ignore_cache = true, and retries the request.
Notes for implementers
X-RIVET-ERROR to trigger Guard retries. Use an empty body or minimal payload as appropriate.
This section explains how WebSocket retries are coordinated between Guard and Pegboard-based handlers.
guard.websocket_service_unavailable (WebSocketServiceUnavailable).
Opening (before accept)
HyperWebsocket (e.g., failing to ups.request(...) to open, or failing to ups.subscribe(...)).
HyperWebsocket in the error tuple so Guard still owns it: Err((client_ws, err)).
ups.request_timeout) to WebSocketServiceUnavailable.
WebSocketServiceUnavailable as retryable.
CustomServe: reuse the same client_ws and retry with the new handler.
Response: accept client, send a Close with the response message as the reason.
Target (non-CustomServe) or mismatch: accept client, send a Close with a generic message (cannot retry).
Open (after accept)
Closing
ups.request(...) and to the client via Close frames.
Closed
Keep the client socket intact for retries:
WebSocketServiceUnavailable) before awaiting the client websocket.
Err((client_ws, err)).
Map tunnel-closed errors at the wrapper:
handle_websocket wrapper, detect tunnel-closed (e.g., ups.request_timeout) and map to WebSocketServiceUnavailable.
handle_websocket_inner should return raw errors; do not construct WebSocketServiceUnavailable inside the inner function.
Use ups.request for all tunnel operations (open, messages, close):
client_ws so Guard can retry.
Backoff and attempts:
max_attempts and initial_interval to perform exponential backoff between retries.
HyperWebsocket in errors preserves the ability for Guard to re-route and retry without disconnecting the client.
WebSocketServiceUnavailable) provides a consistent, guard-specific signal for retryability.
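The WebSocket guidelines above can be sketched roughly as follows: the inner handler returns the raw error together with the unaccepted client socket, the wrapper maps the tunnel-closed error to WebSocketServiceUnavailable, and a Guard-like loop reuses the same socket across retries with exponential backoff. Everything here is a simplified assumption for illustration: `ClientWs` stands in for HyperWebsocket, `WsError` and `guard_retry` are hypothetical, the doubling factor is assumed, and the `tunnel_up` flag simulates whether opening the tunnel succeeds.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the client's unaccepted HyperWebsocket.
#[derive(Debug, PartialEq)]
pub struct ClientWs {
    pub id: u32,
}

/// Hypothetical error type with only the variants this sketch needs.
#[derive(Debug, PartialEq)]
pub enum WsError {
    /// Raw tunnel error, standing in for ups.request_timeout.
    RequestTimeout,
    /// Guard-level retryable error.
    WebSocketServiceUnavailable,
    /// Any other failure; not retried.
    Other(String),
}

/// Inner handler: returns raw errors and hands the socket back on failure.
fn handle_websocket_inner(client_ws: ClientWs, tunnel_up: bool) -> Result<(), (ClientWs, WsError)> {
    if !tunnel_up {
        // Fail before accepting, so the caller still owns the socket.
        return Err((client_ws, WsError::RequestTimeout));
    }
    // A real handler would accept the socket here; the sketch just drops it.
    drop(client_ws);
    Ok(())
}

/// Wrapper: the only place where tunnel-closed becomes WebSocketServiceUnavailable.
pub fn handle_websocket(client_ws: ClientWs, tunnel_up: bool) -> Result<(), (ClientWs, WsError)> {
    handle_websocket_inner(client_ws, tunnel_up).map_err(|(ws, err)| {
        let mapped = match err {
            WsError::RequestTimeout => WsError::WebSocketServiceUnavailable,
            other => other,
        };
        (ws, mapped)
    })
}

/// Delay before retry `attempt` (0-based): initial_interval * 2^attempt (assumed factor).
pub fn backoff_delay(initial_interval: Duration, attempt: u32) -> Duration {
    initial_interval.saturating_mul(2u32.saturating_pow(attempt))
}

/// Guard-like loop: always tries once, then retries only on
/// WebSocketServiceUnavailable, up to `max_attempts` attempts in total.
/// Returns the number of attempts used on success.
pub fn guard_retry(
    mut client_ws: ClientWs,
    max_attempts: u32,
    initial_interval: Duration,
    mut tunnel_up_on_attempt: impl FnMut(u32) -> bool,
) -> Result<u32, (ClientWs, WsError)> {
    let mut attempt = 0;
    loop {
        match handle_websocket(client_ws, tunnel_up_on_attempt(attempt)) {
            Ok(()) => return Ok(attempt + 1),
            Err((ws, WsError::WebSocketServiceUnavailable)) if attempt + 1 < max_attempts => {
                std::thread::sleep(backoff_delay(initial_interval, attempt));
                client_ws = ws; // reuse the same client socket for the retry
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

Because the socket travels inside the error tuple, the loop never has to reconnect the client; when retries are exhausted, the caller gets the socket back and can still accept it and send a Close frame.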