foundations/net/docs/HA_STATELESS_CONTAINERS.md
This feature enables Huly Network to support stateless containers with automatic failover capabilities, allowing you to build highly available services that must ensure only one instance is active at any given time.
The Network Server (central coordinator) does NOT support HA:
However, agents and containers DO support HA through stateless container registration:
This document covers HA for agents and containers only.
The stateless container feature allows multiple agents to register pre-existing containers with the same UUID. The network will automatically:
This provides a simple yet effective HA mechanism for services that require leader election or single-instance guarantees.
Unlike regular containers that are created on-demand by agents, stateless containers are:
Agent 1 (Primary)            Network               Agent 2 (Standby)
     |                          |                          |
     |-- Register UUID-001 ---->|                          |
     |<-- Accepted -------------|                          |
     |                          |<-- Register UUID-001 ----|
     |                          |--- Rejected (duplicate)->|
     |                          |                          |
     | (container terminates)   |                          |
     |-- Unregister UUID-001 -->|                          |
     |                          |-- Event: Removed ------->|
     |                          |                          |
     |                          |<-- Re-register UUID-001 -|
     |                          |--- Accepted ------------>|
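The sequence above can be modeled in a few lines. The following is a minimal in-memory sketch of the accept, reject, and re-register steps; `RegistryModel` and all of its methods are hypothetical names introduced for illustration and are not part of the Huly Network API.

```typescript
// Hypothetical in-memory model of the registration flow shown in the diagram.
// It is NOT the Huly Network implementation; it only mirrors the message order.
type Uuid = string
type AgentId = string
type RemovedListener = (uuid: Uuid) => void

class RegistryModel {
  private readonly owners = new Map<Uuid, AgentId>()
  private readonly listeners: RemovedListener[] = []

  // The first agent to register a UUID wins; other agents are rejected.
  register (agent: AgentId, uuid: Uuid): 'accepted' | 'rejected' {
    const owner = this.owners.get(uuid)
    if (owner !== undefined && owner !== agent) return 'rejected'
    this.owners.set(uuid, agent)
    return 'accepted'
  }

  // Only the current owner can unregister; listeners are notified afterwards.
  unregister (agent: AgentId, uuid: Uuid): boolean {
    if (this.owners.get(uuid) !== agent) return false
    this.owners.delete(uuid)
    for (const listener of this.listeners) listener(uuid)
    return true
  }

  onRemoved (listener: RemovedListener): void {
    this.listeners.push(listener)
  }

  ownerOf (uuid: Uuid): AgentId | undefined {
    return this.owners.get(uuid)
  }
}

const registry = new RegistryModel()
const first = registry.register('agent-1', 'UUID-001')
const second = registry.register('agent-2', 'UUID-001')

// The standby reacts to the "Removed" event by registering again.
registry.onRemoved((uuid) => { registry.register('agent-2', uuid) })
registry.unregister('agent-1', 'UUID-001')
const ownerAfterFailover = registry.ownerOf('UUID-001')
```

In this model the standby takes over synchronously inside the removal callback; the real client re-registers after a short delay, as described later in this document.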
addStatelessContainer(uuid, kind, endpoint, container)

Add a stateless container to the agent for registration.
Parameters:
- uuid: ContainerUuid - The container UUID (must be consistent across HA agents)
- kind: ContainerKind - The container type/kind
- endpoint: ContainerEndpointRef - The endpoint reference for this container
- container: Container - The actual container instance

Example:
// Note: For production code, prefer using serveAgent() on the client
// This example uses AgentImpl directly for educational purposes
const agent = new AgentImpl('my-agent-id', containerFactories)
// Add a stateless container
agent.addStatelessContainer(
'service-001' as ContainerUuid,
'my-service' as ContainerKind,
'service://host/service-001' as ContainerEndpointRef,
myServiceContainer
)
// Register the agent - network will accept or reject the stateless container
await client.register(agent)
removeStatelessContainer(uuid)

Remove a stateless container from tracking (called when rejected by network).
Parameters:
- uuid: ContainerUuid - The container UUID to remove

The network's register() method now:
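// ContainerUuid and NetworkEventKind are used below; they are assumed here to be exported by network-core
import { AgentImpl, containerUuid, NetworkEventKind, TickManagerImpl, type ContainerUuid } from '@hcengineering/network-core'
import { createNetworkClient, NetworkAgentServer } from '@hcengineering/network-client'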
// Shared service UUID across all HA instances
const SHARED_SERVICE_UUID = 'my-ha-service-001' as ContainerUuid
// Note: For production code, prefer using serveAgent() on the client
// This example uses AgentImpl directly for educational purposes
// Create Agent 1 (Primary)
const agent1 = new AgentImpl('agent-1', containerFactories)
agent1.addStatelessContainer(
SHARED_SERVICE_UUID,
'ha-service',
'service://agent-1/service-001',
primaryServiceContainer
)
// Create Agent 2 (Standby)
const agent2 = new AgentImpl('agent-2', containerFactories)
agent2.addStatelessContainer(
SHARED_SERVICE_UUID,
'ha-service',
'service://agent-2/service-001',
standbyServiceContainer
)
// Connect to network
const client = createNetworkClient('localhost:3737')
await client.waitConnection()
// Register both agents
await client.register(agent1) // Will be accepted
await client.register(agent2) // Will be rejected for SHARED_SERVICE_UUID
// When agent1's container is terminated, agent2 will automatically take over
// Listen for container events to monitor failover
client.onUpdate(async (event) => {
for (const containerEvent of event.containers) {
if (containerEvent.container.uuid === SHARED_SERVICE_UUID) {
switch (containerEvent.event) {
case NetworkEventKind.added:
console.log('Container registered:', containerEvent.container.agentId)
break
case NetworkEventKind.removed:
console.log('Container removed - failover in progress')
break
case NetworkEventKind.updated:
console.log('Container updated')
break
}
}
}
})
// Simulate failover by terminating primary container
await agent1.terminate(SHARED_SERVICE_UUID)
// Agent2 will automatically re-register and take over (after ~100ms delay)
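To make the failover visible in monitoring, the events from the listener above can be folded into a small state tracker. This is a sketch only: `FailoverTracker`, `ContainerEventLike`, and the string event kinds are local stand-ins declared here so the block is self-contained, and they only approximate the event shape used in the example (`event`, `container.uuid`, `container.agentId`).

```typescript
// Local stand-ins for the library's event types (assumptions for illustration).
type EventKindLike = 'added' | 'updated' | 'removed'

interface ContainerEventLike {
  event: EventKindLike
  container: { uuid: string, agentId: string }
}

// Tracks which agent currently hosts one shared UUID and counts owner changes.
class FailoverTracker {
  private readonly uuid: string
  private lastOwner: string | undefined
  active: string | undefined
  failovers = 0

  constructor (uuid: string) {
    this.uuid = uuid
  }

  apply (events: ContainerEventLike[]): void {
    for (const e of events) {
      if (e.container.uuid !== this.uuid) continue
      if (e.event === 'removed') {
        // No active instance until a standby re-registers.
        this.active = undefined
        continue
      }
      // 'added' or 'updated': the container is live on this agent.
      this.active = e.container.agentId
      if (this.lastOwner !== undefined && this.lastOwner !== this.active) {
        this.failovers++
      }
      this.lastOwner = this.active
    }
  }
}

const tracker = new FailoverTracker('my-ha-service-001')
tracker.apply([{ event: 'added', container: { uuid: 'my-ha-service-001', agentId: 'agent-1' } }])
tracker.apply([{ event: 'removed', container: { uuid: 'my-ha-service-001', agentId: 'agent-1' } }])
const activeDuringGap = tracker.active
tracker.apply([{ event: 'added', container: { uuid: 'my-ha-service-001', agentId: 'agent-2' } }])
```

With the real client, `tracker.apply(event.containers)` could be called from inside the `onUpdate` callback after mapping the library's event kinds to the strings used here.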
AgentImpl.statelessContainers map
list() is called

Implement leader election without external coordination services:
class LeaderService implements Container {
constructor(readonly clusterId: string) {
// Initialize leader-specific resources
}
async request(operation: string, data?: any): Promise<any> {
// Handle leader operations
return { isLeader: true, clusterId: this.clusterId }
}
}
// All nodes register with same UUID, first one becomes leader
const leaderUuid = `cluster-${clusterId}-leader` as ContainerUuid
agent.addStatelessContainer(leaderUuid, 'leader', endpoint, new LeaderService(clusterId))
Ensure only one instance of a service runs across the cluster:
class DatabaseMigrationService implements Container {
async request(operation: string): Promise<any> {
if (operation === 'migrate') {
// Only one instance will run migrations
await this.runMigrations()
}
}
}
const migrationUuid = 'db-migration-singleton' as ContainerUuid
agent.addStatelessContainer(migrationUuid, 'migration', endpoint, migrationService)
Implement active-standby database pattern:
class DatabaseReplica implements Container {
private isActive = false
async onActivation(): Promise<void> {
// Promote standby to active
this.isActive = true
await this.promoteToMaster()
}
async request(operation: string): Promise<any> {
if (!this.isActive) {
throw new Error('Standby replica - read-only')
}
return await this.executeQuery(operation)
}
}
The default failover delay is 100ms. You can adjust this by modifying the timeout in NetworkClientImpl.onEvent():
setTimeout(() => {
this.doRegister(agent).catch(...)
}, 100) // Adjust this value
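Rather than editing the literal, the delay could be passed in as a parameter. The sketch below shows one way to do that; `scheduleReRegister`, `TimerFn`, and the injectable timer are hypothetical names introduced for illustration, not existing members of `NetworkClientImpl`.

```typescript
// Hypothetical helper: the same setTimeout pattern with the delay as a parameter.
// The timer function is injectable so the behaviour can be checked without waiting.
type TimerFn = (fn: () => void, ms: number) => void

function scheduleReRegister (
  doRegister: () => Promise<void>,
  delayMs: number = 100,
  setTimer: TimerFn = (fn, ms) => { setTimeout(fn, ms) }
): void {
  setTimer(() => {
    // Log failures instead of leaving the rejection unhandled.
    doRegister().catch((err: unknown) => {
      console.error('re-registration failed:', err)
    })
  }, delayMs)
}

// Usage with a fake timer that records the callback instead of waiting.
const scheduled: Array<{ fn: () => void, ms: number }> = []
const fakeTimer: TimerFn = (fn, ms) => { scheduled.push({ fn, ms }) }

let registerCalls = 0
scheduleReRegister(async () => { registerCalls++ }, 250, fakeTimer)
const callsBeforeFire = registerCalls

// Fire the recorded timers manually.
for (const entry of scheduled) entry.fn()
```

A shorter delay speeds up takeover, while a longer one gives the network more time to finish processing the removal before the standby registers again.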
Orphaned containers are automatically cleaned up based on the timeout settings. Configure via:
const tickManager = new TickManagerImpl(1) // 1 second tick
const network = new NetworkImpl(tickManager)
See the complete example in examples/ha-stateless-container-example.ts:
# Start the network server
cd pods/network-pod
rushx dev
# In another terminal, run the example
npx ts-node examples/ha-stateless-container-example.ts
ping() methods

Solution: Check that:
Solution: This indicates a split-brain scenario. Ensure:
Solution:
NetworkClientImpl.onEvent()