workhorse/internal/healthcheck/README.md
This document describes the health check system in GitLab Workhorse that supports readiness checks.
The health check system provides a server that handles readiness checks:
- Readiness checks (/readiness): Determine if the service is ready to accept traffic

The health check system uses a health_check_listener configuration:
[health_check_listener]
network = "tcp"
addr = "localhost:8182"
readiness_probe_url = "http://localhost:8080/-/readiness"
puma_control_url = "http://localhost:9293"
check_interval = "5s"
timeout = "1s"
graceful_shutdown_delay = "10s"
max_consecutive_failures = 3
min_successful_probes = 1
- network: Network type (tcp, tcp4, tcp6, unix)
- addr: Address to listen on (e.g., localhost:8182)
- check_interval: How often to perform health checks (default: 2s)
- timeout: Timeout for individual health check requests (default: 1s)
- graceful_shutdown_delay: How long to remain unhealthy after shutdown signal (default: 10s)
- max_consecutive_failures: Number of consecutive failures before marking readiness as unhealthy (default: 3)
- min_successful_probes: Number of successful probes required for readiness to become healthy (default: 2)
- readiness_probe_url: URL of Puma's readiness endpoint (default: authBackend + "/-/readiness")
- puma_control_url: URL of Puma's control server (optional, for readiness checks only)

Readiness checks (/readiness)

Readiness checks determine if the service is ready to accept traffic. They check:
- Puma's readiness endpoint (readiness_probe_url)
- Puma's control server (puma_control_url), when configured
Behavior:
- Requires min_successful_probes consecutive successes to become ready
- Tolerates failures up to the max_consecutive_failures threshold before marking as not ready

The health check listener exposes HTTP endpoints that return JSON responses:
Readiness response (/readiness) when ready:
{
  "checks": {
    "puma_readiness": {
      "control_duration_s": 0.00176875,
      "control_server": true,
      "control_server_last_scrape_time": "2025-10-02T21:55:43Z",
      "healthy": true,
      "readiness_duration_s": 0.028158458,
      "readiness_endpoint": true,
      "readiness_last_scrape_time": "2025-10-02T21:55:43Z"
    }
  },
  "health_thresholds": {
    "max_consecutive_failures": 3,
    "min_successful_probes": 1
  },
  "metrics": {
    "consecutive_failures": 0,
    "consecutive_successes": 1
  },
  "ready": true
}
Readiness response (/readiness) when not ready:
{
  "checks": {
    "puma_readiness": {
      "control_duration_s": 0,
      "control_server": false,
      "control_server_last_scrape_time": "2025-10-02T21:56:13Z",
      "healthy": false,
      "readiness_duration_s": 0,
      "readiness_endpoint": false
    }
  },
  "health_thresholds": {
    "max_consecutive_failures": 3,
    "min_successful_probes": 1
  },
  "last_error": "puma control server check failed: Get \"http://localhost:9293/stats\": dial tcp [::1]:9293: connect: connection refused",
  "metrics": {
    "consecutive_failures": 3,
    "consecutive_successes": 0
  },
  "ready": false
}
The health check system exposes several Prometheus metrics:
- workhorse_readiness_status: Overall readiness status (1 = ready, 0 = not ready)
- workhorse_readiness_errors_total: Total number of readiness check errors
- workhorse_health_check_duration_seconds: Duration of health checks
- workhorse_readiness_puma_readiness_check: Status of Puma readiness endpoint

When Workhorse receives a SIGTERM signal:
- It reports not ready and waits for graceful_shutdown_delay to allow load balancers to drain traffic

This approach helps avoid 502 errors during deployments and gives load balancers time to shift traffic smoothly.
Implement the HealthChecker interface:
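// Assumes this code lives in the healthcheck package and imports "context".

// MyChecker is an example custom health checker.
type MyChecker struct {
	name string
}

// NewMyChecker returns a checker that reports under the given name.
func NewMyChecker(name string) *MyChecker {
	return &MyChecker{name: name}
}

// Name returns the name of this checker.
func (c *MyChecker) Name() string {
	return c.name
}

// Check runs the health check and reports the result.
func (c *MyChecker) Check(ctx context.Context) CheckResult {
	// Perform your health check logic
	return CheckResult{
		Name:    c.name,
		Healthy: true,
		Details: map[string]interface{}{
			"your-field": "ok",
		},
	}
}

// Add to readiness checks (call this where the health check server is set up)
server.AddReadinessChecker(NewMyChecker("my_readiness_check"))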
Health check endpoints not responding
- Verify that health_check_listener is configured in config.toml

Readiness always unhealthy
- Check that readiness_probe_url is accessible

Flapping health status
- Adjust max_consecutive_failures and min_successful_probes
- Increase timeout values if checks are timing out
- Review the check_interval frequency

Enable debug logging to see detailed health check information:
gitlab-workhorse -logLevel debug
Test health check endpoints manually:
# Test readiness
curl -v http://localhost:8182/readiness