happy-agent CLI Tool

docs/plans/happy-agent.md


Overview

A new standalone CLI tool (happy-agent) in packages/happy-agent that acts as a dedicated client for controlling Happy Coder agents remotely. Unlike happy-cli, which both runs and controls agents, happy-agent only controls them: creating sessions, sending messages, reading history, monitoring state, and stopping sessions.

This is a completely separate client from happy-cli. It has its own authentication flow (account auth via QR code, same as device linking in the mobile app), its own credential storage (~/.happy/agent.key), and is written from scratch with no code sharing.

Context

  • Existing system: Monorepo with happy-cli (agent runtime + control), happy-server (Fastify + PostgreSQL + Redis), happy-app (React Native mobile)
  • Server API: REST endpoints at https://api.cluster-fluster.com + Socket.IO at /v1/updates
  • Authentication: Uses account auth flow (/v1/auth/account/request + /v1/auth/account/response) — generates ephemeral keypair, displays QR code (happy:///account?[base64url-publicKey]), user scans with existing Happy mobile app to approve, receives encrypted account secret
  • Credential storage: ~/.happy/agent.key (separate from happy-cli's ~/.happy/access.key)
  • Encryption: AES-256-GCM (dataKey) for all new sessions. The master content keypair is derived deterministically from the account secret via deriveKey(secret, 'Happy EnCoder', ['content']) → seed → crypto_box_seed_keypair(seed). Per-session random keys are encrypted with the master public key and stored on the server.
  • Session protocol: HTTP POST to create sessions, Socket.IO for real-time messages/state updates
  • Agent state: AgentState.controlledByUser indicates if agent is actively processing; requests field tracks pending tool calls

Development Approach

  • Testing approach: Regular (code first, then tests)
  • Complete each task fully before moving to the next
  • Make small, focused changes
  • CRITICAL: every task MUST include new/updated tests for code changes in that task
  • CRITICAL: all tests must pass before starting next task
  • CRITICAL: update this plan file when scope changes during implementation
  • Run tests after each change

Testing Strategy

  • Unit tests: Required for every task — encryption, key derivation, API client logic, CLI argument parsing, auth flow
  • Integration tests: Test against mock server where feasible

Progress Tracking

  • Mark completed items with [x] immediately when done
  • Add newly discovered tasks with ➕ prefix
  • Document issues/blockers with ⚠️ prefix

Implementation Steps

Task 1: Package scaffolding and build setup

  • Create packages/happy-agent/ directory with package.json (name: happy-agent, type: module, bin: ./bin/happy-agent.mjs)
  • Create tsconfig.json with strict mode, path aliases (@/src/), ESM output
  • Create bin/happy-agent.mjs entry point wrapper (mirrors happy-cli pattern: spawns node with --no-warnings)
  • Create src/index.ts as main entry point with argument parsing shell
  • Add package to root package.json workspaces
  • Add dependencies: axios, socket.io-client, tweetnacl, zod, chalk, commander, qrcode-terminal
  • Add devDependencies: typescript, vitest, pkgroll, tsx
  • Create vitest.config.ts
  • Verify yarn install and yarn build work
  • Write smoke test that imports the package entry point
  • Run tests — must pass before task 2

Task 2: Encryption and key derivation module

  • Create src/encryption.ts with encodeBase64, decodeBase64, encodeBase64Url, getRandomBytes functions
  • Implement hmac_sha512(key, data) using Node.js createHmac('sha512', ...)
  • Implement key derivation tree:
    • deriveSecretKeyTreeRoot(seed, usage) — HMAC-SHA512 with key = usage + ' Master Seed' (UTF-8), data = seed. Split 64-byte result: key = [0:32], chainCode = [32:64]
    • deriveSecretKeyTreeChild(chainCode, index) — HMAC-SHA512 with key = chainCode, data = [0x00, ...UTF-8(index)]. Split same way.
    • deriveKey(master, usage, path) — derives root, then iterates path elements through child derivation
    • deriveContentKeyPair(secret): calls deriveKey(secret, 'Happy EnCoder', ['content']) → seed → sha512(seed)[0:32] → tweetnacl.box.keyPair.fromSecretKey() → returns { publicKey, secretKey }
  • Implement AES-256-GCM encryption:
    • encryptWithDataKey(data, dataKey) — AES-256-GCM: [1-byte version=0][12-byte nonce][ciphertext][16-byte auth tag]
    • decryptWithDataKey(bundle, dataKey) — reverse of above
  • Implement legacy encryption (needed for backward compatibility with existing sessions):
    • encryptLegacy(data, secret) — TweetNaCl secretbox: [24-byte nonce][ciphertext + MAC]
    • decryptLegacy(data, secret) — reverse of above
  • Implement encrypt(key, variant, data) / decrypt(key, variant, data) dispatcher for 'legacy' | 'dataKey' variants
  • Implement libsodiumEncryptForPublicKey(data, recipientPublicKey) — encrypts data with NaCl box using ephemeral keypair. Bundle: [32-byte ephemeral pubkey][24-byte nonce][ciphertext]
  • Implement decryptBoxBundle(bundle, recipientSecretKey) — decrypts NaCl box bundle (used for auth response decryption AND per-session key decryption)
  • Implement authChallenge(secret) — generates signing keypair from secret seed, creates random 32-byte challenge, signs with tweetnacl.sign.detached. Returns { challenge, publicKey, signature } for token refresh via /v1/auth
  • Write tests for key derivation with known test vectors:
    • seed='test seed', usage='test usage', path=['child1','child2']
    • Expected root key: E6E55652456F9FE47D6FF46CA3614E85B499F77E7B340FBBB1553307CEDC1E74
    • Expected final key: 1011C097D2105D27362B987A631496BBF68B836124D1D072E9D1613C6028CF75
  • Write tests for AES-256-GCM encrypt/decrypt round-trip
  • Write tests for legacy encrypt/decrypt round-trip
  • Write tests for base64 encode/decode (standard and URL-safe)
  • Write tests for libsodiumEncryptForPublicKey + decryptBoxBundle round-trip
  • Write tests for authChallenge signature verification with tweetnacl.sign.detached.verify
  • Run tests — must pass before task 3

Task 3: Configuration and credential storage

  • Create src/config.ts — reads HAPPY_SERVER_URL (default: https://api.cluster-fluster.com), HAPPY_HOME_DIR (default: ~/.happy), derives credential file path as ${happyHomeDir}/agent.key
  • Create src/credentials.ts:
    • Credentials type: { token: string, secret: Uint8Array, contentKeyPair: { publicKey: Uint8Array, secretKey: Uint8Array } }
    • readCredentials(config) — parses ~/.happy/agent.key JSON { token, secret }, decodes secret from base64, derives contentKeyPair via deriveContentKeyPair(secret). Returns Credentials or null if file missing.
    • writeCredentials(config, token, secret) — writes { token, secret: base64(secret) } to ~/.happy/agent.key
    • clearCredentials(config) — deletes ~/.happy/agent.key
    • requireCredentials(config) — calls readCredentials, throws with "Run happy-agent auth login first" if null
  • Write tests for credential read/write round-trip (use temp directory)
  • Write tests for contentKeyPair derivation from secret
  • Write tests for missing file returns null
  • Write tests for config defaults and env var overrides
  • Run tests — must pass before task 4
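The configuration and credential-storage steps of Task 3 could look roughly like the sketch below. It is a minimal illustration, not the final module: the `Config` and `StoredCredentials` shapes, the `0o600` file mode, and passing the environment in as a parameter (to make defaults testable) are assumptions, and the `contentKeyPair` derivation is left out because it depends on the Task 2 encryption module.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { homedir, tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical shape; the plan only names the three values.
interface Config {
  serverUrl: string;
  happyHomeDir: string;
  credentialPath: string;
}

// Resolve settings from the environment, falling back to the defaults named in the plan.
function loadConfig(env: Record<string, string | undefined> = process.env): Config {
  const serverUrl = env.HAPPY_SERVER_URL ?? 'https://api.cluster-fluster.com';
  const happyHomeDir = env.HAPPY_HOME_DIR ?? join(homedir(), '.happy');
  return { serverUrl, happyHomeDir, credentialPath: join(happyHomeDir, 'agent.key') };
}

// Subset of the plan's Credentials type (contentKeyPair omitted in this sketch).
interface StoredCredentials {
  token: string;
  secret: Uint8Array;
}

function writeCredentials(config: Config, token: string, secret: Uint8Array): void {
  mkdirSync(config.happyHomeDir, { recursive: true });
  const body = JSON.stringify({ token, secret: Buffer.from(secret).toString('base64') }, null, 2);
  // Assumption: restrict the file to the current user, since it holds the account secret.
  writeFileSync(config.credentialPath, body, { mode: 0o600 });
}

function readCredentials(config: Config): StoredCredentials | null {
  if (!existsSync(config.credentialPath)) return null; // missing file means "not logged in"
  const parsed = JSON.parse(readFileSync(config.credentialPath, 'utf8')) as { token: string; secret: string };
  return { token: parsed.token, secret: new Uint8Array(Buffer.from(parsed.secret, 'base64')) };
}

function clearCredentials(config: Config): void {
  rmSync(config.credentialPath, { force: true }); // no error if the file is already gone
}

// Usage: round-trip through a throwaway directory, as the Task 3 tests would.
const demoConfig = loadConfig({ HAPPY_HOME_DIR: mkdtempSync(join(tmpdir(), 'happy-agent-')) });
writeCredentials(demoConfig, 'jwt-auth-token', new Uint8Array(32));
console.log(readCredentials(demoConfig)?.token);
clearCredentials(demoConfig);
```

Passing `env` explicitly keeps the default-versus-override tests free of global state; production code would simply call `loadConfig()`.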

Task 4: Authentication command (happy-agent auth)

  • Create src/auth.ts implementing the account auth flow:
    1. Generate ephemeral box keypair: tweetnacl.box.keyPair.fromSecretKey(randomBytes(32))
    2. POST /v1/auth/account/request with { publicKey: base64(keypair.publicKey) }
    3. Generate QR code data: happy:///account? + base64url(keypair.publicKey)
    4. Display QR code in terminal using qrcode-terminal
    5. Print instructions: "Scan this QR code with the Happy app (Settings → Account → Link New Device)"
    6. Poll /v1/auth/account/request every 1 second with same publicKey
    7. When state === 'authorized': decrypt response using decryptBoxBundle(decodeBase64(response), keypair.secretKey) to get the account secret (32 bytes)
    8. Save token + secret via writeCredentials(config, token, secret)
    9. Print success message
  • Add happy-agent auth login subcommand that runs the flow above
  • Add happy-agent auth logout subcommand that calls clearCredentials()
  • Add happy-agent auth status subcommand that reads credentials and prints auth status (authenticated / not authenticated)
  • Write tests for auth flow with mocked HTTP (polling, success case)
  • Write tests for auth flow error cases (server unreachable, timeout)
  • Write tests for logout (credential deletion)
  • Run tests — must pass before task 5

Task 5: HTTP API client

  • Create src/api.ts with functions:
    • listSessions(config, creds) — GET /v1/sessions, for each session: resolve encryption key (see key resolution below), decrypt metadata/agentState, return decrypted session list
    • listActiveSessions(config, creds) — GET /v2/sessions/active, same decryption logic
    • createSession(config, creds, opts: { tag, metadata }) — POST /v1/sessions:
      • Generate random 32-byte per-session AES key
      • Encrypt it with libsodiumEncryptForPublicKey(sessionKey, creds.contentKeyPair.publicKey) → prepend version byte [0x00] → base64 for dataEncryptionKey field
      • Encrypt metadata with encryptWithDataKey(metadata, sessionKey)
      • Returns decrypted session with the sessionKey attached
    • getSessionMessages(config, creds, sessionId) — GET /v1/sessions/:id/messages
    • deleteSession(config, creds, sessionId) — DELETE /v1/sessions/:id
  • Implement session encryption key resolution for existing sessions:
    • If session has dataEncryptionKey: strip version byte, decryptBoxBundle(encrypted, creds.contentKeyPair.secretKey) → per-session AES key, use 'dataKey' variant
    • If session has no dataEncryptionKey: use creds.secret as key with 'legacy' variant
  • All requests include Authorization: Bearer <token> header
  • All functions handle HTTP errors gracefully (404 → "not found", 401 → "re-authenticate", 5xx → "server error")
  • Write tests with mocked axios for listSessions (success + error)
  • Write tests for session key resolution (dataKey and legacy paths)
  • Write tests with mocked axios for createSession (new + existing tag)
  • Write tests with mocked axios for getSessionMessages
  • Write tests with mocked axios for deleteSession
  • Run tests — must pass before task 6
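The error-handling rule in Task 5 (404, 401, 5xx) can be sketched as one small mapping function that every API call shares. The `ApiError` class, the `describeHttpError` name, and the exact message wording are hypothetical; only the three status categories come from the plan.

```typescript
// Hypothetical error type carrying the HTTP status alongside a user-facing message.
class ApiError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
  }
}

// Map an HTTP status code to the messages listed in the plan.
function describeHttpError(status: number, subject = 'Session'): ApiError {
  if (status === 404) {
    return new ApiError(status, `${subject} not found`);
  }
  if (status === 401) {
    return new ApiError(status, 'Authentication failed: run `happy-agent auth login` to re-authenticate');
  }
  if (status >= 500 && status <= 599) {
    return new ApiError(status, `Server error (HTTP ${status})`);
  }
  // Assumption: anything else is surfaced verbatim rather than silently swallowed.
  return new ApiError(status, `Unexpected HTTP status ${status}`);
}

// Usage: an axios catch block could rethrow describeHttpError(err.response.status).
console.log(describeHttpError(404).message);
```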

Task 6: Socket.IO session client

  • Create src/session.ts with a SessionClient class that:
    • Takes session ID, encryption key, encryption variant, token, server URL
    • Connects to Socket.IO at serverUrl/v1/updates with { token, clientType: 'session-scoped', sessionId }
    • Listens for update events, decrypts messages using session encryption key (AES-256-GCM or legacy depending on variant), emits typed events (message, state-change)
    • Provides sendMessage(text, meta?) — encrypts user message with session key and emits message event with { sid, message }
    • Provides getMetadata() / getAgentState() — returns current cached decrypted state
    • Provides waitForIdle(timeoutMs?) — watches agentState.controlledByUser and agentState.requests, resolves when agent has no pending requests and controlledByUser !== true
    • Provides sendStop() — emits session-end event
    • Provides close() — disconnects socket
  • Write tests for SessionClient message encryption/sending (mock socket.io-client)
  • Write tests for waitForIdle logic (various agentState combinations)
  • Write tests for update event handling and decryption
  • Run tests — must pass before task 7

Task 7: CLI commands — list and status

  • Build out src/index.ts (the entry point created in Task 1) using commander with program name happy-agent
  • happy-agent list — calls listSessions, displays table: ID (truncated), name/summary, path, status (active/inactive), last active time. With --json outputs raw JSON. With --active filters to active only.
  • happy-agent status <session-id> — fetches session via list + filter by ID prefix, connects Socket.IO to get live state, displays: session ID, metadata (path, host, lifecycle state), agent state (idle/busy, pending requests count), last message preview. With --json outputs raw JSON. Disconnects after displaying.
  • Create src/output.ts — helper for human-readable vs JSON formatting based on --json flag
  • Write tests for output formatting (human-readable table, JSON mode)
  • Write tests for CLI argument parsing (list, list --active, list --json, status <id>)
  • Run tests — must pass before task 8
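A possible shape for the src/output.ts helper is sketched below: one function that returns either raw JSON or an aligned text table for `happy-agent list`. The `SessionRow` fields, the 8-character ID truncation, and the ISO timestamp format are assumptions made for illustration.

```typescript
// Hypothetical, already-decrypted view of a session as the list command would show it.
interface SessionRow {
  id: string;
  name: string;
  path: string;
  active: boolean;
  lastActiveAt: number; // epoch milliseconds
}

function formatSessions(rows: SessionRow[], opts: { json: boolean }): string {
  if (opts.json) {
    return JSON.stringify(rows, null, 2);
  }
  const header = ['ID', 'NAME', 'PATH', 'STATUS', 'LAST ACTIVE'];
  const body = rows.map((row) => [
    row.id.slice(0, 8), // assumption: truncate IDs to 8 characters
    row.name,
    row.path,
    row.active ? 'active' : 'inactive',
    new Date(row.lastActiveAt).toISOString(),
  ]);
  const table = [header, ...body];
  // Pad every column to the width of its longest cell.
  const widths = header.map((_, col) => Math.max(...table.map((line) => line[col].length)));
  return table
    .map((line) => line.map((cell, col) => cell.padEnd(widths[col])).join('  ').trimEnd())
    .join('\n');
}

// Usage
console.log(
  formatSessions(
    [{ id: 'abcdef1234567890', name: 'demo', path: '/tmp/project', active: true, lastActiveAt: 0 }],
    { json: false },
  ),
);
```

Returning a string instead of printing directly keeps the formatter easy to unit test in both modes.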

Task 8: CLI commands — create and send

  • happy-agent create --tag <tag> [--path <path>] — creates new session with given tag and metadata (path defaults to cwd, host to hostname). Prints session ID. With --json outputs full session JSON.
  • happy-agent send <session-id> <message> — resolves session key, connects Socket.IO, sends user message (encrypted with AES-256-GCM), optionally waits for idle with --wait. Disconnects after. Prints confirmation. With --json outputs message details.
  • Write tests for create command (argument parsing, metadata construction)
  • Write tests for send command (message encryption, --wait flag)
  • Run tests — must pass before task 9

Task 9: CLI commands — history, stop, and wait

  • happy-agent history <session-id> — fetches messages via HTTP, resolves session encryption key (dataKey or legacy), decrypts each message, displays in chronological order with role/timestamp. With --json outputs raw JSON. With --limit <n> limits output.
  • happy-agent stop <session-id> — connects Socket.IO, sends session-end event, disconnects. Prints confirmation.
  • happy-agent wait <session-id> [--timeout <seconds>] — connects Socket.IO, waits for agent idle state (no pending requests, not controlled by user), prints when idle or times out (default 300s). Exit code 0 on idle, 1 on timeout.
  • Write tests for history command (message decryption, chronological ordering, --limit)
  • Write tests for stop command
  • Write tests for wait command (idle detection, timeout handling)
  • Run tests — must pass before task 10
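The wait command's behaviour (resolve on idle, give up after a timeout, map the outcome to an exit code) might be sketched as follows. This assumes the session client exposes a Node `EventEmitter` that fires `state-change`, as described in Task 6; the `waitForIdle` signature with an injected `isIdleNow` predicate and the `exitCodeFor` helper are hypothetical.

```typescript
import { EventEmitter } from 'node:events';

// Resolve true once the predicate reports idle, or false when the timeout elapses.
function waitForIdle(source: EventEmitter, isIdleNow: () => boolean, timeoutMs = 300_000): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    if (isIdleNow()) {
      resolve(true); // already idle, nothing to wait for
      return;
    }
    const onChange = (): void => {
      if (isIdleNow()) finish(true);
    };
    const timer = setTimeout(() => finish(false), timeoutMs);
    function finish(idle: boolean): void {
      clearTimeout(timer);
      source.off('state-change', onChange); // avoid leaking the listener
      resolve(idle);
    }
    source.on('state-change', onChange);
  });
}

// Exit code 0 on idle, 1 on timeout, as specified for `happy-agent wait`.
const exitCodeFor = (idle: boolean): number => (idle ? 0 : 1);

// Usage: a session that becomes idle shortly after we start waiting.
const demoEvents = new EventEmitter();
let demoBusy = true;
waitForIdle(demoEvents, () => !demoBusy, 1_000).then((idle) => console.log('exit code', exitCodeFor(idle)));
demoBusy = false;
demoEvents.emit('state-change');
```

The CLI would convert `--timeout <seconds>` to milliseconds before calling this and pass the result of `exitCodeFor` to `process.exit`.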

Task 10: Verify acceptance criteria

  • Verify all 8 operations work: auth, create, send, stop, history, wait, status, list
  • Verify --json flag works on all applicable commands
  • Verify error handling: no credentials, server unreachable, invalid session ID
  • Verify interop: session created by happy-agent is visible and controllable from mobile app
  • Verify interop: session created by happy-cli can be listed and history read by happy-agent
  • Run full test suite (unit tests)
  • Run linter — all issues must be fixed

Task 11: [Final] Update documentation

  • Add README.md to packages/happy-agent/ with usage examples for all commands
  • Update root README if it references packages

Technical Details

CLI Commands Summary

happy-agent auth login                          # Authenticate via QR code (scanned by Happy mobile app)
happy-agent auth logout                         # Clear stored credentials
happy-agent auth status                         # Show authentication status

happy-agent list [--active] [--json]            # List all sessions
happy-agent status <session-id> [--json]        # Get live session state
happy-agent create --tag <tag> [--path <path>] [--json]  # Create new session
happy-agent send <session-id> <message> [--wait] [--json]  # Send message
happy-agent history <session-id> [--limit <n>] [--json]    # Read message history
happy-agent stop <session-id>                   # Stop a session
happy-agent wait <session-id> [--timeout <s>]   # Wait for agent to become idle

Authentication Flow (Account Auth)

happy-agent                          Happy Server                    Happy Mobile App
     |                                    |                               |
     +-- Generate ephemeral keypair       |                               |
     +-- POST /v1/auth/account/request -> |                               |
     |   { publicKey }                    |                               |
     |                                    |                               |
     +-- Display QR code in terminal      |                               |
     |   happy:///account?[base64url-key] |                               |
     |                                    |                               |
     |                                    |  <-- User scans QR code ------+
     |                                    |                               |
     |                                    |  <-- POST /v1/auth/account/response
     |                                    |      { publicKey,             |
     |                                    |        response: box.encrypt( |
     |                                    |          accountSecret,       |
     |                                    |          ephemeralPubKey) }   |
     |                                    |                               |
     +-- Poll /v1/auth/account/request -> |                               |
     |   state: 'authorized'              |                               |
     |   token: JWT                       |                               |
     |   response: encrypted secret       |                               |
     |                                    |                               |
     +-- box.open(response, ephemeralSK)  |                               |
     |   -> accountSecret (32 bytes)      |                               |
     +-- Save { token, secret }           |                               |
     |   to ~/.happy/agent.key            |                               |
     |                                    |                               |
     +-- Derive content keypair:          |                               |
     |   deriveKey(secret,                |                               |
     |     'Happy EnCoder', ['content'])  |                               |
     |   -> seed -> box keypair           |                               |
     |   (publicKey for encrypting        |                               |
     |    per-session keys,               |                               |
     |    secretKey for decrypting them)  |                               |
     v Authenticated                      |                               |
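The client side of the flow above reduces to two pieces: building the QR payload and polling until the server reports `authorized`. The sketch below shows both under stated assumptions: the HTTP call is injected as `requestOnce` (it would wrap the POST to /v1/auth/account/request), the two-minute default timeout is invented for illustration, and any state other than `authorized` is treated as "keep polling". Decrypting `response` with the ephemeral secret key is left to the encryption module.

```typescript
// Response shape taken from the diagram; fields other than `state` appear once authorized.
interface AuthPollResult {
  state: string;
  token?: string;
  response?: string;
}

// QR payload: happy:///account? followed by the base64url-encoded ephemeral public key.
function buildAccountLinkUrl(publicKey: Uint8Array): string {
  return 'happy:///account?' + Buffer.from(publicKey).toString('base64url');
}

// Poll until the mobile app approves the request, or give up after timeoutMs.
async function pollUntilAuthorized(
  requestOnce: () => Promise<AuthPollResult>,
  opts: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<{ token: string; response: string }> {
  const intervalMs = opts.intervalMs ?? 1_000; // plan: poll every 1 second
  const timeoutMs = opts.timeoutMs ?? 120_000; // assumption: not specified in the plan
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await requestOnce();
    if (result.state === 'authorized' && result.token && result.response) {
      return { token: result.token, response: result.response };
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error('Timed out waiting for approval in the Happy app');
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: print the link that would be rendered as a QR code.
console.log(buildAccountLinkUrl(new Uint8Array(32)));
```

Injecting `requestOnce` lets the Task 4 tests drive the polling loop with a fake server instead of mocking the HTTP library.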

Credential File Format (~/.happy/agent.key)

{
  "token": "jwt-auth-token",
  "secret": "base64-encoded-32-byte-account-secret"
}

At load time, the content keypair is derived from the secret:

secret (32 bytes)
  -> deriveKey(secret, 'Happy EnCoder', ['content'])
  -> seed (32 bytes)
  -> sha512(seed)[0:32] -> boxSecretKey
  -> tweetnacl.box.keyPair.fromSecretKey(boxSecretKey)
  -> { publicKey (32 bytes), secretKey (32 bytes) }

Key Derivation Tree

HMAC-SHA512 based key tree (matches mobile app implementation):

deriveSecretKeyTreeRoot(seed, usage):
  I = HMAC-SHA512(key = UTF8(usage + ' Master Seed'), data = seed)
  key = I[0:32], chainCode = I[32:64]

deriveSecretKeyTreeChild(chainCode, index):
  data = [0x00, ...UTF8(index)]
  I = HMAC-SHA512(key = chainCode, data = data)
  key = I[0:32], chainCode = I[32:64]

deriveKey(master, usage, path):
  state = deriveSecretKeyTreeRoot(master, usage)
  for each element in path:
    state = deriveSecretKeyTreeChild(state.chainCode, element)
  return state.key

Test vectors:
  seed = UTF8('test seed'), usage = 'test usage', path = ['child1', 'child2']
  Root key:  E6E55652456F9FE47D6FF46CA3614E85B499F77E7B340FBBB1553307CEDC1E74
  Final key: 1011C097D2105D27362B987A631496BBF68B836124D1D072E9D1613C6028CF75
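The pseudocode above translates almost line for line into TypeScript using Node's built-in HMAC. This is a sketch that follows the pseudocode as written; the `KeyTreeState` type and the helper names `hmacSha512`, `splitState`, and `toHex` are illustrative choices, and the printed values are meant to be compared against the test vectors rather than taken as verified here.

```typescript
import { createHmac } from 'node:crypto';

interface KeyTreeState {
  key: Uint8Array;
  chainCode: Uint8Array;
}

function hmacSha512(key: Uint8Array, data: Uint8Array): Uint8Array {
  return new Uint8Array(createHmac('sha512', key).update(data).digest());
}

// Split the 64-byte HMAC output: key = I[0:32], chainCode = I[32:64].
function splitState(i: Uint8Array): KeyTreeState {
  return { key: i.slice(0, 32), chainCode: i.slice(32, 64) };
}

function deriveSecretKeyTreeRoot(seed: Uint8Array, usage: string): KeyTreeState {
  return splitState(hmacSha512(new TextEncoder().encode(usage + ' Master Seed'), seed));
}

function deriveSecretKeyTreeChild(chainCode: Uint8Array, index: string): KeyTreeState {
  const indexBytes = new TextEncoder().encode(index);
  const data = new Uint8Array(1 + indexBytes.length); // first byte stays 0x00
  data.set(indexBytes, 1);
  return splitState(hmacSha512(chainCode, data));
}

function deriveKey(master: Uint8Array, usage: string, path: string[]): Uint8Array {
  let state = deriveSecretKeyTreeRoot(master, usage);
  for (const element of path) {
    state = deriveSecretKeyTreeChild(state.chainCode, element);
  }
  return state.key;
}

// Usage: print the two values to compare with the "Root key" and "Final key" vectors.
const toHex = (bytes: Uint8Array): string => Buffer.from(bytes).toString('hex').toUpperCase();
const vectorSeed = new TextEncoder().encode('test seed');
console.log('root :', toHex(deriveSecretKeyTreeRoot(vectorSeed, 'test usage').key));
console.log('final:', toHex(deriveKey(vectorSeed, 'test usage', ['child1', 'child2'])));
```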

Encryption

For new sessions (created by happy-agent):

  1. Generate random 32-byte per-session key
  2. Encrypt per-session key with master publicKey via libsodiumEncryptForPublicKey → store as dataEncryptionKey on server
  3. Encrypt/decrypt all session data (metadata, messages, agentState) with AES-256-GCM using the per-session key

For existing sessions (created by happy-cli or other clients):

  1. If session has dataEncryptionKey: strip version byte [0], decryptBoxBundle(encrypted, contentKeyPair.secretKey) → per-session AES key, use AES-256-GCM
  2. If session has no dataEncryptionKey: use secret directly as key with legacy TweetNaCl secretbox
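The two-branch rule for existing sessions can be sketched as a single resolver. To keep the example dependency-free, `decryptBoxBundle` is passed in as a function rather than imported; `resolveSessionKey`, `BoxOpener`, and the error messages are hypothetical names, and returning `null` on a failed box decryption is an assumption modelled on how TweetNaCl's `box.open` reports failure.

```typescript
type EncryptionVariant = 'legacy' | 'dataKey';

interface ResolvedKey {
  key: Uint8Array;
  variant: EncryptionVariant;
}

// Signature assumed for the decryptBoxBundle helper from the encryption module.
type BoxOpener = (bundle: Uint8Array, recipientSecretKey: Uint8Array) => Uint8Array | null;

function resolveSessionKey(
  session: { dataEncryptionKey?: string | null },
  creds: { secret: Uint8Array; contentKeyPair: { secretKey: Uint8Array } },
  decryptBoxBundle: BoxOpener,
): ResolvedKey {
  // No per-session key stored: fall back to the account secret with legacy secretbox.
  if (!session.dataEncryptionKey) {
    return { key: creds.secret, variant: 'legacy' };
  }
  const raw = new Uint8Array(Buffer.from(session.dataEncryptionKey, 'base64'));
  if (raw.length < 1 || raw[0] !== 0) {
    throw new Error('Unsupported dataEncryptionKey version');
  }
  // Strip the version byte, then open the box with the content secret key.
  const sessionKey = decryptBoxBundle(raw.subarray(1), creds.contentKeyPair.secretKey);
  if (!sessionKey) {
    throw new Error('Could not decrypt per-session key');
  }
  return { key: sessionKey, variant: 'dataKey' };
}

// Usage with a stand-in opener; real code would pass the TweetNaCl-based implementation.
const demoCreds = { secret: new Uint8Array(32), contentKeyPair: { secretKey: new Uint8Array(32) } };
console.log(resolveSessionKey({}, demoCreds, () => null).variant);
```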

Bundle formats:

  • AES-256-GCM: [1-byte version=0][12-byte nonce][ciphertext][16-byte auth tag]
  • Legacy secretbox: [24-byte nonce][ciphertext + MAC]
  • Box encryption: [32-byte ephemeral pubkey][24-byte nonce][ciphertext]
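The AES-256-GCM bundle layout can be produced and parsed with Node's built-in crypto module, as in the sketch below. Two details are assumptions rather than part of the plan: the payload is serialised as JSON before encryption, and `decryptWithDataKey` returns `null` (instead of throwing) for a malformed, tampered, or wrongly keyed bundle.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

const VERSION = 0;
const NONCE_LEN = 12;
const TAG_LEN = 16;

// Bundle: [1-byte version=0][12-byte nonce][ciphertext][16-byte auth tag]
function encryptWithDataKey(data: unknown, dataKey: Uint8Array): Uint8Array {
  const nonce = randomBytes(NONCE_LEN); // fresh random nonce per message
  const cipher = createCipheriv('aes-256-gcm', dataKey, nonce);
  const plaintext = Buffer.from(JSON.stringify(data), 'utf8'); // assumption: JSON payloads
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const tag = cipher.getAuthTag();
  return new Uint8Array(Buffer.concat([Buffer.from([VERSION]), nonce, ciphertext, tag]));
}

function decryptWithDataKey(bundle: Uint8Array, dataKey: Uint8Array): unknown {
  if (bundle.length < 1 + NONCE_LEN + TAG_LEN || bundle[0] !== VERSION) {
    return null;
  }
  const nonce = bundle.subarray(1, 1 + NONCE_LEN);
  const ciphertext = bundle.subarray(1 + NONCE_LEN, bundle.length - TAG_LEN);
  const tag = bundle.subarray(bundle.length - TAG_LEN);
  try {
    const decipher = createDecipheriv('aes-256-gcm', dataKey, nonce);
    decipher.setAuthTag(tag);
    // final() throws if the tag does not authenticate the ciphertext.
    const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
    return JSON.parse(plaintext.toString('utf8'));
  } catch {
    return null;
  }
}

// Usage: round-trip a small metadata object with a random per-session key.
const demoKey = randomBytes(32);
const demoBundle = encryptWithDataKey({ path: '/tmp/project' }, demoKey);
console.log(decryptWithDataKey(demoBundle, demoKey));
```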

Idle Detection Logic

Agent is considered idle when ALL of these are true:

  1. agentState.controlledByUser is not true
  2. agentState.requests is empty or undefined (no pending tool calls)
  3. Session metadata lifecycleState is not 'archived'
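The three conditions combine into a single predicate, sketched here. The `AgentState` and `SessionMetadata` shapes list only the fields named in this plan, modelling `requests` as a keyed record is an assumption, and so is treating a missing agent state as idle.

```typescript
// Only the fields this plan mentions; the real types likely carry more.
interface AgentState {
  controlledByUser?: boolean | null;
  requests?: Record<string, unknown> | null; // assumption: pending tool calls keyed by ID
}

interface SessionMetadata {
  lifecycleState?: string;
}

function isIdle(
  agentState: AgentState | null | undefined,
  metadata: SessionMetadata | null | undefined,
): boolean {
  const controlledByUser = agentState?.controlledByUser === true;
  const hasPendingRequests = Object.keys(agentState?.requests ?? {}).length > 0;
  const archived = metadata?.lifecycleState === 'archived';
  // Idle only when all three conditions from the list above hold.
  return !controlledByUser && !hasPendingRequests && !archived;
}

// Usage
console.log(isIdle({ controlledByUser: false, requests: {} }, { lifecycleState: 'running' }));
```

Keeping this as a pure function means the "various agentState combinations" tests need no socket mocks at all.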

Dependencies (minimal)

  • axios — HTTP client
  • socket.io-client — WebSocket communication
  • tweetnacl — Encryption (box for key exchange, secretbox for legacy, sign for auth challenge)
  • zod — Runtime validation
  • chalk — Terminal colors
  • commander — CLI argument parsing
  • qrcode-terminal — QR code display for authentication

Post-Completion

Manual verification:

  • Test full auth flow: run happy-agent auth login, scan QR with Happy app, verify credentials saved
  • Test with real server: create session, send message, verify it appears in mobile app
  • Test wait command with a running agent session
  • Test history command for sessions created by both happy-agent and happy-cli
  • Test cross-client interop: messages from happy-agent readable by mobile app and vice versa

Distribution:

  • Package can be published to npm as happy-agent
  • Alternatively, users can build it from the monorepo via yarn workspace happy-agent build