docs/plans/ha-cluster-improvements.md
Scope: `pro_impl/services/ha/*`, `pro_impl/services/tasks/redis_task_state_store.go`,
and their integration points (`cli/cmd/root.go`, `services/tasks/TaskPool.go`,
`services/schedules/SchedulePool.go`, `api/sockets/pool.go`).

Branch: develop

Active-active HA is enabled by `SEMAPHORE_HA_ENABLED`. When it is on, every Semaphore
node shares one Redis instance and coordinates through five components, all
wired in `runService()` (`cli/cmd/root.go:80-185`):

| Component | File | Responsibility |
|---|---|---|
| `RedisTaskStateStore` | `pro_impl/services/tasks/redis_task_state_store.go` | Shared task queue / running / active / alias state + Pub/Sub sync |
| `RedisNodeRegistry` | `pro_impl/services/ha/node_registry.go` | Heartbeat-based cluster membership |
| `RedisOrphanCleaner` | `pro_impl/services/ha/orphan_cleaner.go` | Fails tasks whose owning node died |
| `RedisScheduleDeduplicator` | `pro_impl/services/ha/schedule_dedup.go` | Ensures one node fires each cron occurrence |
| `RedisWSBroadcaster` | `pro_impl/services/ha/ws_broadcaster.go` | Relays WebSocket events across nodes |

Below is the full set of Redis keys the implementation touches, grouped by owning component.

`RedisTaskStateStore` (prefix `tasks:`)

| Key | Type | TTL | Written by | Cleaned by |
|---|---|---|---|---|
| `tasks:queue` | LIST | none | `Enqueue` / `DequeueAt` | `DequeueAt` only |
| `tasks:running` | SET | none | `SetRunning` | `DeleteRunning`, orphan cleaner |
| `tasks:active:<projectID>` | SET | none | `AddActive` | `RemoveActive`, orphan cleaner |
| `tasks:owners` | HASH taskID→nodeID | `HEXPIRE` 7d | `SetRunning` | `DeleteRunning`, orphan cleaner |
| `tasks:task_project` | HASH taskID→projectID | `HEXPIRE` 7d | `Enqueue` / `AddActive` | `DeleteRunning`, orphan cleaner |
| `tasks:aliases` | HASH alias→taskID | `HEXPIRE` 7d | `SetAlias` | `DeleteAlias`, orphan cleaner |
| `tasks:runtime:<taskID>` | HASH | `EXPIRE` 7d | `UpdateRuntimeFields` | `DeleteRunning`, orphan cleaner |
| `tasks:claim:<taskID>` | STRING nodeID | `EXPIRE` 5m | `TryClaim` | `DeleteClaim`, orphan cleaner |
| `tasks:events` | Pub/Sub channel | n/a | every mutating op | n/a |


`RedisNodeRegistry` (prefix `ha:`)

| Key | Type | TTL | Notes |
|---|---|---|---|
| `ha:node:<nodeID>` | STRING "1" | `EXPIRE` 30s | Per-node heartbeat |
| `ha:nodes` | SET of node IDs | none | Membership; pruned by scan |

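
The two registry keys above imply a prune that checks one heartbeat key per member of `ha:nodes`. The sketch below models that scan without a Redis connection. The names `nodeKey` and `staleNodes` and the `exists` callback (standing in for one `EXISTS` call per member) are hypothetical and are not taken from `node_registry.go`.

```go
package main

import "fmt"

// nodeKey returns the per-node heartbeat key shape listed in the table.
func nodeKey(nodeID string) string { return "ha:node:" + nodeID }

// staleNodes returns the members of ha:nodes whose heartbeat key has expired.
// exists is a hypothetical hook standing in for one Redis EXISTS call,
// so the scan costs one round-trip per member.
func staleNodes(members []string, exists func(key string) bool) []string {
	var stale []string
	for _, id := range members {
		if !exists(nodeKey(id)) {
			stale = append(stale, id)
		}
	}
	return stale
}

func main() {
	// Only node "a" still has a live heartbeat key in this toy state.
	live := map[string]bool{"ha:node:a": true}
	exists := func(key string) bool { return live[key] }

	fmt.Println(staleNodes([]string{"a", "b", "c"}, exists))
}
```

The per-member callback makes the cost visible: membership of N nodes means N lookups on every prune.
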

`RedisScheduleDeduplicator`

| Key | Type | TTL | Notes |
|---|---|---|---|
| `ha:sched:lock:<scheduleID>:<minuteEpoch>` | STRING | `EXPIRE` 55s | One per schedule per minute |


`RedisWSBroadcaster`

| Key | Type | TTL | Notes |
|---|---|---|---|
| `ha:ws:broadcast` | Pub/Sub channel | n/a | All cross-node WS traffic |


The following keys can persist beyond their useful life, i.e. they are real leaks:

- `tasks:queue` has no TTL and is never reconciled. A queued task that is
  later deleted from the DB, or whose ID becomes invalid, stays in the list
  forever. The orphan cleaner only scans `tasks:running`, never the queue.
- `tasks:active:<projectID>` and `tasks:running` have no key-level TTL.
  They are emptied member by member; if a `delete_running` / `active_remove`
  path is missed (Pub/Sub loss, crash between DB write and Redis op), the IDs
  stay. The orphan cleaner only removes IDs whose owner node is dead: a
  stale ID with no `tasks:owners` entry is skipped (`orphan_cleaner.go:80-83`,
  `continue` on missing owner).
- Hash-field TTLs depend on `HEXPIRE` (Redis ≥ 7.4). On Redis < 7.4
  `HExpire` fails and the error is ignored, so `tasks:owners`,
  `tasks:task_project`, and `tasks:aliases` grow without bound. There
  is no version check and no documented minimum Redis version.
- The `ha:nodes` set has no TTL and is pruned only by an O(N) scan inside
  heartbeat / `NodeCount`. If every node crashes hard, stale node IDs
  linger until a live node next runs that scan.
- `tasks:claim:<taskID>` is orphaned by a node that dies between `TryClaim`
  and `SetRunning`. Such a task is not in `tasks:running`, so the orphan
  cleaner never sees it; it is unrunnable until the 5-minute claim TTL lapses.
- The per-node `byID` / `byAlias` caches leak. They are pruned only by
  `dequeue` / `delete_running` / `active_remove` Pub/Sub events. Pub/Sub is
  fire-and-forget: a missed event leaves a `TaskRunner` cached forever. No
periodic reconciliation against Redis exists.

Other problems, numbered P1 to P17 for reference:

- P1. `pro_impl/services/tasks/task_state_store_factory.go:9` reads
  `util.Config.HA.Enabled` directly. `util.Config.HA` is a `*HAConfig` and is
  nil unless configured. The rest of the codebase uses the nil-safe
  `util.HAEnabled()`. This factory will panic on any install that has no `ha`
  block in its config.
- P2. Missed `delete_*` events leak cache entries; missed `enqueue` events
  mean a node never learns about a queued task until a restore. There is no
  sequence number, no replay, no reconciliation tick.
- P3. `orphanCheckInterval` is 60s and node TTL is 30s, so a dead node's
  running task can stay "running" for up to ~90s before being failed. The
  cleaner also ignores orphaned queued and claimed-not-running tasks entirely.
- P4. The schedule lock's minute bucket comes from each node's local
  `time.Now().UTC().Truncate(time.Minute)`. Two nodes whose clocks differ by a
  few seconds across a minute boundary truncate to different minutes and both
  fire the same schedule.
- P5. Redis calls use `context.Background()`. A hung or failing-over Redis
  blocks task-pool and WS goroutines indefinitely; there is no `MaxRetries`,
  no dial/read/write timeout, no circuit breaking.
- P6. `IsNodeAlive` returns true on Redis error, so during a Redis outage the
  orphan cleaner believes every node is alive (fail-open). Combined with P5
  this means a Redis blip can wedge the cluster.
- P7. `handleQueue` does `TryClaim` → `DequeueAt` → `runTask` as separate
  Redis ops (`TaskPool.go:253-259`). There is no Lua/`MULTI` wrapping; a crash
  mid-sequence leaves a claimed task still in the queue.
- P8. `getOrHydrate` calls `LoadRuntimeFields` (an `HGETALL`) on every hit
  (`redis_task_state_store.go:209-212`). `QueueRange`, `RunningRange`, and
  `GetActive` loop over all tasks, costing N Redis round-trips per list call,
  and these lists are read on every API request and every 5s pool tick.
- P9. Every `enqueue` / `set_running` / `active_add` / `alias_set` event
  triggers `updateTaskRunner`, which calls the hydrator (a DB query) on every
  node. An N-node cluster multiplies task lifecycle DB reads by N.
- P10. `handleQueue` is O(N) Redis calls per event. `QueueLen` plus
  `QueueGet(i)` per index are individual round-trips; the queue is re-walked
  on every Pub/Sub event and every 5s tick.
- P11. There are two Redis client constructors (`NewRedisClient` plus a
  hand-rolled one inside `redis_task_state_store.go`). Client-construction
  code is duplicated and diverges (the store does not call `NewRedisClient`).
- P12. `tasks:events` and `ha:ws:broadcast` are global; every node decodes
  and filters every event for every project/user. No sharding.
- P13. `NodeCount()` / `pruneStaleNodes` are O(N) `EXISTS` calls per
  invocation instead of one ranged query.
- P14. … `log.Error`.
- P15. The key prefixes `tasks:` / `ha:` are hard-coded; two Semaphore
  clusters cannot share one Redis except by DB number. No configurable
  namespace.
- P16. `TLSSkipVerify` is exposed and there is no minimum-Redis-version
  validation at startup, so misconfigurations surface only as silent leaks.
- P17. Clients are created with `redis.NewClient` only; a single Redis is
  itself a SPOF, undermining the "HA" promise.

The improvements are organised into three phases by risk and value.

- Make the task-state-store factory use the nil-safe `util.HAEnabled()`.
- Have `pro_impl/services/ha/redis.go` expose a single process-wide
  `*redis.Client` (lazy singleton) with proper `Options`: `PoolSize`,
  `MinIdleConns`, `MaxRetries`, `DialTimeout`, `ReadTimeout` / `WriteTimeout`.
  Delete the duplicated constructor in `redis_task_state_store.go`. Inject
  this client into all five components.
- Give every Redis call a `context.WithTimeout` deadline. Add a small
  `redisCall` helper to keep call sites tidy.
- In `Start()`, `PING` Redis and read `INFO server`; if `redis_version` < 7.4,
  log a loud warning (or refuse, by config) because hash-field TTLs are
  silently not applied below 7.4.
- Add a periodic reconcile tick that reads `tasks:running` / `tasks:queue` /
  `tasks:active:*` from Redis and drops `byID` / `byAlias` entries no longer
  present. This makes Pub/Sub a fast-path optimisation rather than a
  correctness dependency.
- … `time.Now()`.
- Replace `ha:node:<id>` + the `ha:nodes` SET with a single `ha:nodes` ZSET
  scored by last-heartbeat epoch. Heartbeat = one `ZADD`; liveness =
  score > now - TTL; `NodeCount` = `ZCOUNT`; a periodic `ZREMRANGEBYSCORE`
  prunes the dead. No per-node keys, no N+1 `EXISTS`.
- Have the orphan cleaner scan `tasks:queue` and `tasks:claim:*` in addition
  to `tasks:running`; for an entry whose owner is dead, or whose owner record
  is missing and the task no longer exists in the DB, remove it everywhere.
  Drive cleanup off taskID consistently so empty `active:<projectID>` sets
  disappear.
- Keep the `HEXPIRE`-dependent mappings (`tasks:owners`, `tasks:task_project`,
  `tasks:aliases`) in `EXPIRE`-able keys instead of HASH fields, so nothing
  can grow unbounded regardless of Redis version.
- Fold `TryClaim` + `LREM` into a single Lua script so a node either fully
  owns a task or leaves the queue untouched.
- Lower `orphanCheckInterval` to ~20–30s and make node TTL / heartbeat
  interval configurable.
- Remove the per-hit `HGETALL` (P8): stop calling `LoadRuntimeFields` on every
  `getOrHydrate` hit; rely on `runtime_update` Pub/Sub events plus the Phase-1
  reconcile tick. Add a short TTL'd local freshness marker if a refresh is
  still needed.
- `QueueRange` / `RunningRange` / `GetActive` should fetch all IDs, then
  hydrate misses in one pipelined batch; `handleQueue` should snapshot the
  queue once per pass instead of calling `QueueGet(i)` per index.
- Put the needed state in the Pub/Sub event payload (`taskRef` already has
  status-relevant fields) so other nodes update caches without a hydrator/DB
  call; hydrate lazily only when a node actually needs to serve that task.
- Support `redis.NewFailoverClient` / `NewClusterClient` via config so Redis
  itself is not a SPOF.
- Add metrics: `ha_nodes_total`, `ha_redis_call_duration`,
  `ha_redis_errors_total`, `ha_orphan_tasks_cleaned_total`,
  `ha_pubsub_events_total`, `ha_task_cache_size`. Surface node list / health
  in the admin API.
- Add a configurable `SEMAPHORE_HA_KEY_PREFIX` so multiple clusters can share
  one Redis safely.
- Add tests using `miniredis` for each store/registry method (key shapes,
  TTLs, expiry fallback).