v3/docs/adr/ADR-111-federation-wg-mesh.md
Federation today (post-alpha.13) assumes peers reach each other through some pre-existing network — Tailscale, a private LAN, or the open internet with wss:// + cert pinning (ADR-107). The federation plugin owns the application protocol (signed envelopes, breaker, audit trail) but treats network connectivity as the integrator's problem.
This works but creates two real frictions:
1. Operational coupling to Tailscale Inc. Most operators in our session validation used Tailscale as the connectivity layer. Tailscale is excellent but introduces an external trust/billing/availability dependency that's outside ruflo's control. Headscale (self-hosted Tailscale-compatible coord server) reduces the trust dependency but still requires a dedicated control-plane service.
2. Trust + connectivity managed in two places. Federation's trust ladder (UNTRUSTED → PRIVILEGED) governs MCP-tool access. Tailscale ACLs govern packet-layer reachability. They can drift: a peer that gets EVICTED in the federation trust ladder remains in the tailnet until an admin manually removes it. Compromised peer detection in federation does not propagate to the network layer.
Add an optional in-tree WireGuard mesh layer to the federation plugin that:
- builds wg-quick configuration from the federation peer registry
- applies trust levels as AllowedIPs slices (see "Trust-graded access" below)

This is an OPT-IN feature (config.wgMesh: true). The existing flat-tailscale-ws path remains the default and unchanged.
Both remain valid choices and the plugin will continue to work over them. ADR-111 adds an additional path — useful when:
ADR-111 is NOT a Tailscale clone. It deliberately omits NAT traversal, DERP relays, MagicDNS, and SSO — those are Tailscale's value-adds and the right answer when you need them is "use Tailscale." ADR-111 provides the minimum control plane to coordinate a WG mesh among federation peers that can already reach each other on UDP.
```
┌───────────────────────────────────────────────────────────────┐
│  Federation Plugin (extended for ADR-111)                     │
│                                                               │
│  Existing:                                                    │
│  ┌──────────────────┐    ┌──────────────────┐                 │
│  │ Discovery service│    │ Breaker service  │                 │
│  │ (manifests, sig) │    │ (suspend/evict)  │                 │
│  └────────┬─────────┘    └────────┬─────────┘                 │
│           │                       │                           │
│           v                       v                           │
│  ┌────────────────────────────────────────┐                   │
│  │ NEW: WG Mesh Service                   │                   │
│  │  - generateLocalWgKey()                │                   │
│  │  - publishWgPubkeyInManifest()         │                   │
│  │  - buildPeerConfigFromRegistry()       │                   │
│  │  - applyTrustLevelToAllowedIPs(peer)   │                   │
│  │  - removePeerOnBreakerSuspend(peer)    │                   │
│  │  - witnessSignChange(change)           │                   │
│  └────────────────────┬───────────────────┘                   │
└───────────────────────┼───────────────────────────────────────┘
                        v
             ┌──────────────────────┐
             │ wg-quick / wg setup  │  ← OS-level WireGuard
             │ /etc/wireguard/wg0   │    (kernel module on linux,
             │                      │     wireguard-go on macOS)
             └──────────────────────┘
                        │
                        v UDP/51820
   ┌──────────────────────────────────────────┐
   │ Federation peer mesh (10.50.0.0/16)      │
   │ • peer A: 10.50.0.1                      │
   │ • peer B: 10.50.0.2                      │
   │ • peer C: 10.50.0.3 (SUSPENDED → drop)   │
   └──────────────────────────────────────────┘
```
```typescript
// ALREADY in v1 manifest:
{
  nodeId: 'ruvultra',
  publicKey: '<ed25519 hex>',
  endpoint: 'ws://ruvultra:9100',
  capabilities: { agentTypes: [...], ... },
  signature: '<ed25519 sig>',
}
// NEW optional ADR-111 section:
+  wg: {
+    publicKey: '<curve25519 base64>',    // WG public key
+    endpoint: 'ruvultra.example:51820',  // host:port reachable on UDP
+    meshIP: '10.50.0.2/32',              // assigned mesh IP
+  }
```
The Ed25519 manifest signature covers the new wg block — peers verifying the manifest also verify the WG key binding.
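Because the signature covers the wg block, a receiving peer can treat its fields as authenticated and then check that they are well-formed before using them. The following is a minimal sketch of such a check, not the plugin's actual code: the `WgManifestBlock` interface and `validateWgBlock` function are hypothetical names, and the rule that mesh IPs must be a /32 inside 10.50.0.0/16 is inferred from the architecture diagram.

```typescript
// Hypothetical shape of the optional ADR-111 block; field names follow the manifest above.
interface WgManifestBlock {
  publicKey: string; // WireGuard public key, base64
  endpoint: string;  // host:port reachable on UDP
  meshIP: string;    // assigned mesh IP, e.g. "10.50.0.2/32"
}

// Returns a list of problems; an empty list means the block looks well-formed.
function validateWgBlock(wg: WgManifestBlock): string[] {
  const problems: string[] = [];

  // A Curve25519 key is 32 bytes, which base64-encodes to 43 characters plus one "=".
  if (!/^[A-Za-z0-9+/]{43}=$/.test(wg.publicKey)) {
    problems.push('publicKey is not a 32-byte base64 value');
  }

  // Endpoint must end in ":<port>" with a port in 1..65535.
  const portMatch = /^(.+):(\d{1,5})$/.exec(wg.endpoint);
  const port = portMatch ? Number(portMatch[2]) : NaN;
  if (!portMatch || port < 1 || port > 65535) {
    problems.push('endpoint is not host:port');
  }

  // Mesh IP must be a single host (/32) inside 10.50.0.0/16 (assumed mesh range).
  const ipMatch = /^10\.50\.(\d{1,3})\.(\d{1,3})\/32$/.exec(wg.meshIP);
  const octetsOk =
    ipMatch !== null && Number(ipMatch[1]) <= 255 && Number(ipMatch[2]) <= 255;
  if (!octetsOk) {
    problems.push('meshIP is not a /32 inside 10.50.0.0/16');
  }

  return problems;
}
```

A caller would run this only after the Ed25519 signature check has passed, and skip mesh setup for any peer whose block reports problems.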
Federation already has TrustLevel (5 levels) + CAPABILITY_GATES (per-level allowed ops). ADR-111 extends this with WG_NETWORK_GATES:
```typescript
export const WG_NETWORK_GATES: Record<TrustLevel, WgNetworkRule[]> = {
  [TrustLevel.UNTRUSTED]: [
    // Drop everything: peer is in registry but not in mesh
  ],
  [TrustLevel.VERIFIED]: [
    { proto: 'tcp', port: 9100 },               // discovery only
  ],
  [TrustLevel.ATTESTED]: [
    { proto: 'tcp', port: 9100 },
    { proto: 'tcp', portRange: [9101, 9199] },  // federation messaging
  ],
  [TrustLevel.TRUSTED]: [
    { proto: 'tcp', port: 9100 },
    { proto: 'tcp', portRange: [9101, 9199] },
    { proto: 'tcp', port: 22 },                 // ssh (operator)
    { proto: 'tcp', portRange: [80, 443] },     // services
  ],
  [TrustLevel.PRIVILEGED]: [
    { proto: 'all' },                           // full network
  ],
};
```
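To make the table's semantics concrete, here is a small sketch of how a gate lookup could be evaluated. The `TrustLevel` enum values and the `WgNetworkRule` union are assumptions (the ADR names the levels and rule fields but does not show their definitions), and `isAllowed` is a hypothetical helper rather than part of the plugin.

```typescript
// Assumed definitions: the ADR uses these names but does not show them.
enum TrustLevel { UNTRUSTED, VERIFIED, ATTESTED, TRUSTED, PRIVILEGED }

type WgNetworkRule =
  | { proto: 'all' }
  | { proto: 'tcp' | 'udp'; port: number }
  | { proto: 'tcp' | 'udp'; portRange: [number, number] };

// Same contents as the WG_NETWORK_GATES table above.
const GATES: Record<TrustLevel, WgNetworkRule[]> = {
  [TrustLevel.UNTRUSTED]: [],
  [TrustLevel.VERIFIED]: [{ proto: 'tcp', port: 9100 }],
  [TrustLevel.ATTESTED]: [
    { proto: 'tcp', port: 9100 },
    { proto: 'tcp', portRange: [9101, 9199] },
  ],
  [TrustLevel.TRUSTED]: [
    { proto: 'tcp', port: 9100 },
    { proto: 'tcp', portRange: [9101, 9199] },
    { proto: 'tcp', port: 22 },
    { proto: 'tcp', portRange: [80, 443] },
  ],
  [TrustLevel.PRIVILEGED]: [{ proto: 'all' }],
};

// Hypothetical helper: true if any rule for the level admits proto/port.
function isAllowed(level: TrustLevel, proto: 'tcp' | 'udp', port: number): boolean {
  return GATES[level].some((rule) => {
    if (rule.proto === 'all') return true;        // full network access
    if (rule.proto !== proto) return false;       // protocol mismatch
    if ('port' in rule) return rule.port === port; // single-port rule
    const [lo, hi] = rule.portRange;              // inclusive range rule
    return port >= lo && port <= hi;
  });
}
```

Under this reading, a VERIFIED peer reaches only tcp/9100, while an UNTRUSTED peer matches no rule at all and is therefore denied everything.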
Implementation note: WG itself doesn't natively port-filter beyond AllowedIPs (which is L3 routing, not L4 ACL). To enforce port-level rules we either:
- (a) firewall rules (nftables on linux, pf on macOS) keyed off the WG interface; most flexible

ADR-111 v1 ships (b) for portability; (a) is a Phase 4 add for high-security deployments (see Implementation plan below).
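As an illustration of the firewall-based option on linux, the sketch below turns one peer's gate rules into nftables rule bodies matched on the WG interface. It is only a sketch under assumptions: `toNftRules` and `PortRule` are hypothetical names, matching on the peer's mesh IP as source address is one possible design, and the strings are rule bodies that would still need to be added to a table and chain of the operator's choosing.

```typescript
// Assumed rule shape, mirroring the entries in WG_NETWORK_GATES.
type PortRule =
  | { proto: 'all' }
  | { proto: 'tcp' | 'udp'; port: number }
  | { proto: 'tcp' | 'udp'; portRange: [number, number] };

// Renders one nftables rule body per gate entry, then a final drop for the peer.
// iface is the WG interface name; peerMeshIP is the peer's address without a prefix.
function toNftRules(iface: string, peerMeshIP: string, rules: PortRule[]): string[] {
  const match = `iifname "${iface}" ip saddr ${peerMeshIP}`;
  const accepts = rules.map((rule) => {
    if (rule.proto === 'all') return `${match} accept`;
    const ports =
      'port' in rule ? `${rule.port}` : `${rule.portRange[0]}-${rule.portRange[1]}`;
    return `${match} ${rule.proto} dport ${ports} accept`;
  });
  // Anything from this peer that was not accepted above is dropped.
  return accepts.concat([`${match} drop`]);
}

// Example: an ATTESTED peer at 10.50.0.2 on the ruflo-fed interface.
const attestedRules = toNftRules('ruflo-fed', '10.50.0.2', [
  { proto: 'tcp', port: 9100 },
  { proto: 'tcp', portRange: [9101, 9199] },
]);
console.log(attestedRules.join('\n'));
```

Keeping rule generation as a pure function of the gate table means the firewall state can be regenerated from the same data that drives federation trust.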
The existing ADR-097 Phase 2.b breaker fires node.suspend() / node.evict(). ADR-111 hooks the state-machine transitions:
```typescript
// In federation-coordinator.ts:
// Assumes the stateChange event also passes the previous state.
peer.on('stateChange', (newState, previousState) => {
  if (newState === SUSPENDED || newState === EVICTED) {
    wgMesh.removeAllowedIPs(peer); // peer immediately can't reach anyone
  }
  if (newState === ACTIVE && previousState === SUSPENDED) {
    wgMesh.restoreAllowedIPs(peer); // breaker reactivation restores mesh access
  }
});
```
SUSPENDED peers stay in the WG configuration (key remains) but with AllowedIPs empty, which is equivalent to a soft-block. EVICTED peers get the entire [Peer] section removed and the key revoked.
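The difference between the soft-block and full removal can be seen in how the [Peer] sections of the generated config would look. The sketch below is illustrative only: `MeshPeer` and `renderPeerSections` are hypothetical names, and a suspended peer is rendered without any AllowedIPs line, which leaves it with no allowed addresses.

```typescript
type PeerState = 'ACTIVE' | 'SUSPENDED' | 'EVICTED';

// Hypothetical registry entry; field names are assumptions for this sketch.
interface MeshPeer {
  nodeId: string;
  wgPublicKey: string; // base64 WireGuard public key
  endpoint: string;    // host:port
  meshIP: string;      // e.g. "10.50.0.2/32"
  state: PeerState;
}

// Renders the [Peer] sections of a wg-quick style config from peer state.
function renderPeerSections(peers: MeshPeer[]): string {
  const sections: string[] = [];
  for (const peer of peers) {
    // EVICTED: the whole [Peer] section is dropped.
    if (peer.state === 'EVICTED') continue;

    const lines = [
      '[Peer]',
      `# ${peer.nodeId}`,
      `PublicKey = ${peer.wgPublicKey}`,
      `Endpoint = ${peer.endpoint}`,
    ];
    // ACTIVE: route the peer's mesh IP. SUSPENDED: key stays, but with no
    // AllowedIPs line nothing is routed to or accepted from the peer.
    if (peer.state === 'ACTIVE') {
      lines.push(`AllowedIPs = ${peer.meshIP}`);
    }
    sections.push(lines.join('\n'));
  }
  return sections.join('\n\n');
}
```

Regenerating the sections from state on every transition keeps the config a pure function of the registry, so a reactivated peer gets its AllowedIPs line back without any separate bookkeeping.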
Every WG mesh change becomes a witness manifest entry, signed by the operator's Ed25519 key:
```json
{
  "id": "wg-mesh-change-2026-05-09T22:00:00Z-add-peer-ruvultra",
  "desc": "Added ruvultra to WG mesh, AllowedIPs 10.50.0.2/32, TrustLevel=ATTESTED",
  "file": ".claude-flow/federation/wg-changes.log",
  "marker": "PublicKey = <wg-pk-base64>",
  "ts": "2026-05-09T22:00:00Z",
  "operator": "<ed25519 sig of the change>"
}
```
Anyone running node plugins/ruflo-core/scripts/witness/verify.mjs --manifest .claude-flow/federation/wg-witness.md.json can prove the mesh's history end-to-end. This is something Tailscale fundamentally can't offer because their coordination is server-mediated.
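A witness entry of the shape shown above could be assembled roughly as follows. This is a sketch under assumptions: `WgMeshChange`, `WitnessEntry`, and `buildWitnessEntry` are hypothetical names, the signing step is passed in as a callback rather than implemented, and signing the JSON serialization of the unsigned fields is an assumed choice, not something the ADR specifies.

```typescript
// Hypothetical description of a single mesh change.
interface WgMeshChange {
  action: 'add-peer' | 'remove-peer';
  nodeId: string;
  meshIP: string;      // e.g. "10.50.0.2/32"
  trustLevel: string;  // e.g. "ATTESTED"
  wgPublicKey: string; // base64 WireGuard public key
  ts: string;          // ISO-8601 timestamp
}

// Mirrors the fields of the JSON entry above.
interface WitnessEntry {
  id: string;
  desc: string;
  file: string;
  marker: string;
  ts: string;
  operator: string;
}

// signChange stands in for the operator's Ed25519 signing step.
function buildWitnessEntry(
  change: WgMeshChange,
  signChange: (payload: string) => string,
): WitnessEntry {
  const isAdd = change.action === 'add-peer';
  const unsigned = {
    id: `wg-mesh-change-${change.ts}-${change.action}-${change.nodeId}`,
    desc:
      `${isAdd ? 'Added' : 'Removed'} ${change.nodeId} ${isAdd ? 'to' : 'from'} WG mesh, ` +
      `AllowedIPs ${change.meshIP}, TrustLevel=${change.trustLevel}`,
    file: '.claude-flow/federation/wg-changes.log',
    marker: `PublicKey = ${change.wgPublicKey}`,
    ts: change.ts,
  };
  // Assumed: the signature covers a JSON serialization of the unsigned fields.
  return { ...unsigned, operator: signChange(JSON.stringify(unsigned)) };
}
```

Passing the signer in as a function keeps key handling out of the entry-building logic, which also makes the formatting easy to test with a stub signer.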
Phase 1: Manifest extension + key generation
- FederationManifest type with optional wg: { publicKey, endpoint, meshIP }
- When config.wgMesh === true, generate a WG keypair and persist to .claude-flow/federation/wg-key-<nodeId>.json (mode 0600, alongside existing Ed25519 key)
- If deriveMeshIP(nodeId) resolves to an IP already published by another peer's manifest, the WgMeshService varies the hash input (nodeId + '\x00', nodeId + '\x01', …) until a free slot is found. Larger deployments should jump to 10.50.0.0/12 (~1M slots).

Phase 2: WgMeshService + config generation (3-4 days)
- domain/services/wg-mesh-service.ts
- buildPeerConfigFromRegistry() → builds wg-quick-compatible config from discovery.listPeers() filtering ATTESTED+
- /etc/wireguard/ruflo-fed.conf (linux) or equivalent path on macOS
- wg-quick up ruflo-fed invocation (with operator confirmation per CLAUDE.md "destructive actions" guidance; bringing up a network interface qualifies)
- onPeerDiscovered to regenerate config on new peers

Phase 3: Breaker integration
- wg set ruflo-fed peer <pubkey> remove-allowed-ips
- wg set ruflo-fed peer <pubkey> remove

Phase 4: Trust-graded firewall rules
- WG_NETWORK_GATES table
- nftables rules (linux): Phase 4a
- pf rules: Phase 4b

Phase 5: Witness attestation
- .claude-flow/federation/wg-changes.log
- federation_wg_status MCP tool exposes the chain

Phase 6: Operator MCP tools
- federation_wg_status: peer mesh state with trust + AllowedIPs
- federation_wg_attest: operator-signs a coordination change
- federation_wg_keyrotate: rotate the local WG key + republish manifest

Total estimated effort: ~14-18 days for one engineer for v1 (Phases 1-7), ~30 days with platform-specific firewall hardening (Phase 4 a+b done thoroughly).
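The mesh IP derivation with collision handling described in the plan can be sketched as below. Several details are assumptions made for illustration: the ADR does not name the hash, so 32-bit FNV-1a stands in for it; the `taken` set represents IPs already published in other peers' manifests; and skipping the all-zeros and all-ones host values is an added precaution, not something the ADR states.

```typescript
// 32-bit FNV-1a, used here only as a stand-in for the unspecified hash.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Derives a host address inside 10.50.0.0/16 from nodeId. On collision it
// retries with nodeId + '\x00', nodeId + '\x01', ... until a free slot is found.
function deriveMeshIP(nodeId: string, taken: Set<string>): string {
  for (let attempt = -1; attempt < 256; attempt++) {
    const suffix = attempt < 0 ? '' : String.fromCharCode(attempt);
    const slot = fnv1a32(nodeId + suffix) & 0xffff; // 16 host bits of the /16
    if (slot === 0 || slot === 0xffff) continue;     // skip network/broadcast-style values
    const ip = `10.50.${slot >> 8}.${slot & 0xff}`;
    if (!taken.has(ip)) return ip;
  }
  throw new Error(`no free mesh IP found for ${nodeId}`);
}

// Example: the second call sees the first address as taken and picks another.
const firstIP = deriveMeshIP('ruvultra', new Set<string>());
const secondIP = deriveMeshIP('ruvultra', new Set<string>([firstIP]));
console.log(firstIP, secondIP);
```

Because the derivation is deterministic, every peer that sees the same set of published manifests arrives at the same address for a given nodeId.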
| Threat | Mitigation |
|---|---|
| Compromised federation peer with valid WG key | Breaker auto-removes from mesh on SUSPEND/EVICT (vs Tailscale: stays in tailnet until manual admin action) |
| Operator compromises adding rogue peers silently | Every change witness-signed; chain verifiable by anyone (vs Tailscale: trust the admin panel logs) |
| Drift between federation trust + network access | They're the same data — no drift possible |
| Tailscale Inc. compromise / outage | Zero dependency |
| Threat | Why not / what to use instead |
|---|---|
| Peers behind NAT without UDP punching | No DERP relay. Use Tailscale OR a manually-configured WG relay (operator concern) |
| Eavesdropping on UDP/51820 | Already covered by WG's own encryption (same crypto Tailscale uses) |
| Malicious operator pushing bad witness entries | Witness chain is append-only; can't hide a bad entry. But CAN add bad entries if you control the operator key. Mitigation: multi-sig witness in Phase 8+ |
| Side-channel info leak via traffic timing | WG doesn't pad — same as Tailscale. Out of scope |
Local WG key rotation:
- nodeId as identity; mesh IPs are derived. No DNS layer needed.
- agentic-flow/transport/loader WS path. ADR-111 is OPT-IN; the existing path remains default + tested.

```
Q: Do you need NAT traversal between peers behind tricky NATs?
   YES → Tailscale or Headscale
   NO  → continue

Q: Do you have ≤50 federation peers that can reach each other on UDP?
   NO  → Tailscale (their NAT traversal handles your scale)
   YES → continue

Q: Do you want federation trust changes to immediately affect packet-layer
   reachability (no two-system drift)?
   YES → ADR-111
   NO  → existing tailnet+ADR-097 setup is simpler

Q: Do you need cryptographic provenance of all coordination changes
   (no central party trust required)?
   YES → ADR-111
   NO  → Tailscale's audit log suffices
```
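The decision tree can also be expressed as a small function, which makes the ordering of the questions explicit. This is one possible reading, not a normative rule: `MeshAnswers`, `Recommendation`, and `recommendTransport` are hypothetical names, and treating the last two questions as "either YES selects ADR-111" is an interpretation of the tree above.

```typescript
// Hypothetical answers to the four questions in the decision tree.
interface MeshAnswers {
  needsNatTraversal: boolean;
  peerCount: number;
  peersReachableOnUdp: boolean;
  wantsTrustDrivenReachability: boolean;
  needsCryptographicProvenance: boolean;
}

type Recommendation =
  | 'tailscale-or-headscale'
  | 'tailscale'
  | 'adr-111'
  | 'existing-tailnet';

function recommendTransport(a: MeshAnswers): Recommendation {
  // Q1: NAT traversal is out of scope for ADR-111.
  if (a.needsNatTraversal) return 'tailscale-or-headscale';
  // Q2: more than 50 peers, or peers that cannot reach each other on UDP.
  if (a.peerCount > 50 || !a.peersReachableOnUdp) return 'tailscale';
  // Q3/Q4 (assumed reading): either requirement is enough to choose ADR-111.
  if (a.wantsTrustDrivenReachability || a.needsCryptographicProvenance) return 'adr-111';
  // Otherwise the existing tailnet + ADR-097 setup is simpler.
  return 'existing-tailnet';
}
```

For example, a 12-peer deployment with direct UDP reachability that wants breaker decisions to cut network access immediately would land on `'adr-111'`.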
| Phase | Status |
|---|---|
| 1 — Manifest extension + key generation | Implemented (2026-05-10) |
| 2 — WgMeshService + config generation | Implemented (2026-05-10) |
| 3 — Breaker integration | Implemented (2026-05-10) |
| 4 — Trust-graded firewall rules | Proposed |
| 5 — Witness attestation | Proposed |
| 6 — Operator MCP tools | Proposed |
| 7 — Cross-OS validation | Proposed |
Re-open this ADR when: