src/auth/spec_api.md
This document describes the specification for a secure binary TCP API protocol used exclusively for inter-daemon communication (e.g., between a master and its agents, or between nodes in a cluster). It defines a secure message exchange without relying on TLS, leveraging AES-256-GCM encryption for all communication.
End-user clients communicate with the daemon via the separate and distinct MySQL and HTTP protocols. This specification does not apply to those interfaces.
Instead of JSON, the protocol uses a packed binary format for efficiency. A complete message consists of an unencrypted header followed by an encrypted payload.
1. Unencrypted Header: The encrypted payload is typically preceded by an unencrypted header containing the packet length and command code. This header is handled by the calling network function and is not part of the encrypted data itself.
2. Encrypted Payload Format:
The payload processed by the cryptographic functions (EncryptGCM/DecryptGCM) has the following structure. The Username field is included as Additional Authenticated Data (AAD), which means it is authenticated by the GCM tag but not encrypted. This binds the encrypted message to a specific user context.
| Field | Size | Description |
|---|---|---|
| Algorithm Version | 1 byte | A version number for the cryptographic scheme. Currently 1. |
| Username Length | 4 bytes | The length of the Username string that follows. |
| Username | Variable | The sender's username. Used as AAD to ensure the message cannot be re-assigned to another user. |
| Nonce | 12 bytes | The unique, structured nonce for this message. |
| Tag | 16 bytes | The AES-GCM authentication tag. |
| Ciphertext | Variable | The AES-256-GCM encrypted original message payload. |
This entire block is what follows the initial unencrypted length and command code on the wire.
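As an illustration of the field order in the table above, the following minimal sketch serializes the block into a byte buffer. The names EncryptedPayload, AppendU32BE, and SerializePayload are hypothetical, and the big-endian encoding of Username Length is an assumption (the specification only states network byte order for the nonce fields).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the fields listed in the table above.
struct EncryptedPayload
{
    uint8_t version = 1;               // Algorithm Version
    std::string username;              // sent in clear, authenticated as AAD
    std::array<uint8_t, 12> nonce{};   // 96-bit structured nonce
    std::array<uint8_t, 16> tag{};     // AES-GCM authentication tag
    std::vector<uint8_t> ciphertext;   // encrypted message body
};

// Appends a 32-bit value in big-endian order (assumed for the length field).
inline void AppendU32BE(std::vector<uint8_t>& out, uint32_t v)
{
    out.push_back(static_cast<uint8_t>(v >> 24));
    out.push_back(static_cast<uint8_t>(v >> 16));
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v));
}

// Serializes the fields in table order: version, username length,
// username, nonce, tag, ciphertext.
inline std::vector<uint8_t> SerializePayload(const EncryptedPayload& p)
{
    assert(p.username.size() <= 0xFFFFFFFFu); // must fit the 4-byte length field
    std::vector<uint8_t> out;
    out.reserve(1 + 4 + p.username.size() + p.nonce.size() + p.tag.size() + p.ciphertext.size());
    out.push_back(p.version);
    AppendU32BE(out, static_cast<uint32_t>(p.username.size()));
    out.insert(out.end(), p.username.begin(), p.username.end());
    out.insert(out.end(), p.nonce.begin(), p.nonce.end());
    out.insert(out.end(), p.tag.begin(), p.tag.end());
    out.insert(out.end(), p.ciphertext.begin(), p.ciphertext.end());
    return out;
}
```

The fixed overhead is 33 bytes (1 + 4 + 12 + 16) plus the username length, on top of the ciphertext itself.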
key, nonce, tag

nonce and tag are essential:
- nonce guarantees the cryptographic safety of the encryption (GCM fails catastrophically if a nonce is reused).
- tag provides built-in message authentication, ensuring the sender is who they claim to be and the message hasn't been modified.
- key, nonce, and tag.
- nonce and tag.
- EncryptGCM() / DecryptGCM() ensures integrity and trust in both directions.
- map<user_id, key_bytes>.

This section describes the design and implementation of the 96-bit (12-byte) nonce used for AES-GCM encryption in the daemon-to-daemon binary protocol. The system is designed to be secure, robust against restarts, and efficient, preventing nonce reuse and protecting against replay attacks.
The 12-byte nonce is a structured field with three distinct components, each serving a specific security purpose. The values are packed in network byte order (big-endian).
The 96 bits are divided as follows:
Bits 0-31: Direction and Sender ID (m_uDirAgentId)
- Direction bit (DIR_MASK): Identifies the sender's role: 1 for master-to-agent messages, 0 for agent-to-master messages. This ensures that request and reply nonces occupy separate logical spaces.
- Sender server_id (AGENT_ID_MASK): Identifies the daemon that created the message. The receiver uses this ID to look up the sender's security state (its boot_id and last seen counter).

Bits 32-63: Boot ID (m_uBootId)
- Built from a time-based ratchet plus a random component, so a restarted daemon gets a boot_id that is almost certainly greater than its previous one, which is critical for the validation logic.

Bits 64-95: Counter (m_uCounter)
- The combination of a boot_id and an ever-increasing counter ensures that every nonce generated by a daemon during its lifetime is unique.

Nonce generator (NonceGenerator_c)

The NonceGenerator_c is a thread-safe singleton responsible for creating nonces for all outgoing encrypted messages.
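To make the three-field nonce layout concrete, here is a minimal sketch of packing and unpacking the 12 bytes in network byte order. Only the field names and the big-endian packing come from this specification; the mask values (direction in the most significant bit of the first word) and the helpers MakeNonce, PackNonce, and UnpackNonce are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed mask values: the spec names DIR_MASK and AGENT_ID_MASK, but the
// exact bit positions used here are a guess.
constexpr uint32_t DIR_MASK = 0x80000000u;
constexpr uint32_t AGENT_ID_MASK = 0x7FFFFFFFu;

struct Nonce_t
{
    uint32_t m_uDirAgentId;
    uint32_t m_uBootId;
    uint32_t m_uCounter;
};

inline void StoreU32BE(uint8_t* p, uint32_t v)
{
    p[0] = static_cast<uint8_t>(v >> 24);
    p[1] = static_cast<uint8_t>(v >> 16);
    p[2] = static_cast<uint8_t>(v >> 8);
    p[3] = static_cast<uint8_t>(v);
}

inline uint32_t LoadU32BE(const uint8_t* p)
{
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Combines the direction bit and the sender's server_id into the first word.
inline Nonce_t MakeNonce(bool bFromMaster, uint32_t uServerId, uint32_t uBootId, uint32_t uCounter)
{
    assert((uServerId & ~AGENT_ID_MASK) == 0); // server_id must not collide with the direction bit
    Nonce_t t;
    t.m_uDirAgentId = (bFromMaster ? DIR_MASK : 0u) | (uServerId & AGENT_ID_MASK);
    t.m_uBootId = uBootId;
    t.m_uCounter = uCounter;
    return t;
}

// Packs the three 32-bit fields big-endian into the 12-byte wire nonce.
inline std::array<uint8_t, 12> PackNonce(const Nonce_t& t)
{
    std::array<uint8_t, 12> b{};
    StoreU32BE(b.data(), t.m_uDirAgentId);
    StoreU32BE(b.data() + 4, t.m_uBootId);
    StoreU32BE(b.data() + 8, t.m_uCounter);
    return b;
}

inline Nonce_t UnpackNonce(const std::array<uint8_t, 12>& b)
{
    Nonce_t t;
    t.m_uDirAgentId = LoadU32BE(b.data());
    t.m_uBootId = LoadU32BE(b.data() + 4);
    t.m_uCounter = LoadU32BE(b.data() + 8);
    return t;
}
```

Packing byte by byte keeps the wire format independent of the host's endianness.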
Initialization:
- m_uBootId is established once for the process's entire lifetime using the time-based ratchet and random component described above.
- m_uCounter is initialized to 1.

Generation (Generate method):
- For each outgoing message, Generate is called.
- It increments m_uCounter to get a new, unique counter value for this message.
- It builds a Nonce_t struct, filling it with:
  - The direction bit and the local server_id (as the sender ID).
  - The process's m_uBootId.
  - The new m_uCounter value.

Counter Overflow: The code includes a FIXME note regarding counter overflow. If the 32-bit counter is exhausted (after ~4.3 billion messages), a new boot_id should ideally be generated to reset the security context, because a wrapped counter would repeat nonces under the same boot_id. The current implementation does not handle this, but it is a very rare edge case for a single boot session.
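One possible way to address the counter overflow noted above is to detect exhaustion rather than wrap silently. The sketch below is not the current implementation (which carries a FIXME); the class name NonceCounterSketch, the 64-bit backing counter, and the choice to hand out 1 as the first value are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the counter part of NonceGenerator_c.
class NonceCounterSketch
{
public:
    explicit NonceCounterSketch(uint64_t uStart = 1) : m_uNext(uStart)
    {
        assert(uStart >= 1); // the spec initializes the counter to 1
    }

    // Writes the next counter value to uOut and returns true, or returns
    // false once all 32-bit values have been handed out.
    bool Next(uint32_t& uOut)
    {
        // A 64-bit backing value lets exhaustion be observed without wrapping.
        const uint64_t uValue = m_uNext.fetch_add(1, std::memory_order_relaxed);
        if (uValue > UINT32_MAX)
            return false; // caller should rotate boot_id instead of reusing a nonce
        uOut = static_cast<uint32_t>(uValue);
        return true;
    }

private:
    std::atomic<uint64_t> m_uNext;
};
```

A caller that receives false would need to obtain a fresh boot_id (or refuse to send) before encrypting again.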
Nonce validator (NonceValidator_c)

The NonceValidator_c is a thread-safe singleton responsible for validating all incoming nonces to protect against replay attacks and other session-related vulnerabilities.

State Management:
- The validator maintains a map (m_hStates) that tracks the security state of every peer (identified by its server_id).
- Each entry is an AgentState_t struct containing:
  - m_uLastBootId: The most recent boot_id seen from that peer.
  - m_uLastCounter: The highest counter value seen from that peer within the context of m_uLastBootId.

Validation Logic (Validate method):
When a message is received and decrypted, its nonce is passed to Validate, which performs a sequence of checks:
1. Unpack and Direction Check: The 12-byte nonce is unpacked. The direction bit is checked to ensure it matches the expected flow (e.g., a master expects replies from agents, where IsMaster() is false).
2. Peer State Lookup: The sender's server_id is extracted from the nonce and used to look up its state. If no state exists, this is the first message from this peer; its state is recorded, and the nonce is accepted as valid.
3. Boot ID Ratchet Check (Session Integrity):
   - If new_boot_id > last_seen_boot_id: This indicates the peer has restarted cleanly. The validator accepts this as a new, valid session. It updates its state for the peer with the new boot_id and resets the last_seen_counter to the counter from this message.
   - If new_boot_id < last_seen_boot_id: This is a critical failure. It indicates a delayed message from a previous, stale session or a session replay attack. The message is rejected. This check is effective because boot_ids are time-based and monotonic.
4. Counter Replay Check (Message Integrity):
   - If new_boot_id == last_seen_boot_id, the message is from the current, known session.
   - The validator checks if new_counter <= last_seen_counter. If this is true, it means this exact message (or an earlier one) has already been seen. It is a replay attack or a severe network reordering, and the message is rejected.
5. State Update: If all checks pass, the nonce is deemed valid. The validator updates the m_uLastCounter for the peer to the new, higher value.
This multi-layered validation ensures that only fresh messages from the most recent session of a peer, in a non-repeating sequence, are accepted.
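The boot ID ratchet and counter checks described above can be sketched as follows. This is a simplified illustration, not the actual NonceValidator_c: the class name NonceValidatorSketch and the use of std::unordered_map with a std::mutex are assumptions, and the direction check is omitted, so Validate takes the already-unpacked fields.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

struct AgentState_t
{
    uint32_t m_uLastBootId;   // most recent boot_id seen from the peer
    uint32_t m_uLastCounter;  // highest counter seen within that boot_id
};

// Hypothetical, simplified validator keyed by the sender's server_id.
class NonceValidatorSketch
{
public:
    bool Validate(uint32_t uServerId, uint32_t uBootId, uint32_t uCounter)
    {
        std::lock_guard<std::mutex> tLock(m_tMutex);

        auto it = m_hStates.find(uServerId);
        if (it == m_hStates.end())
        {
            // First message from this peer: record its state and accept.
            m_hStates.emplace(uServerId, AgentState_t{uBootId, uCounter});
            return true;
        }

        AgentState_t& tState = it->second;
        if (uBootId < tState.m_uLastBootId)
            return false; // stale session or session replay

        if (uBootId > tState.m_uLastBootId)
        {
            // Peer restarted: start tracking the new session.
            tState = AgentState_t{uBootId, uCounter};
            return true;
        }

        if (uCounter <= tState.m_uLastCounter)
            return false; // replayed or reordered message

        tState.m_uLastCounter = uCounter;
        return true;
    }

private:
    std::mutex m_tMutex;
    std::unordered_map<uint32_t, AgentState_t> m_hStates;
};
```

Because the counter check is strict, a message that arrives out of order within a session is rejected just like a replay, which matches the behavior described in the counter replay check.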
A hash of the token (sha256(token)) may be used as the key for AES-GCM.

Error conditions are communicated via status codes within the binary reply packets, not HTTP codes.

Decryption and Authentication Errors: If the receiver fails to decrypt a message for any cryptographic reason, including an incorrect GCM tag (indicating tampering or a wrong key), a stale boot_id, or a replayed counter, the packet must be rejected.
- The receiver logs the specific reason (e.g., GCM authentication failed (bad tag) or replay detected) for security auditing purposes.
- The receiver replies with a generic error status (e.g., STATUS_AUTH_ERROR). It does not send back the specific reason for the failure.

Malformed Packet: If a received packet has an invalid length or fails basic structural checks before decryption, the connection may be closed, or a reply with a generic error status (e.g., STATUS_ERROR) should be sent.
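The split between detailed local logging and a generic reply for authentication failures can be sketched like this. The enum RejectReason_e, the helpers DescribeReject and RejectPacket, and the numeric value of STATUS_AUTH_ERROR are hypothetical; only the status name and the two example log messages come from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical reasons a packet can fail authentication.
enum class RejectReason_e { BAD_TAG, STALE_BOOT_ID, REPLAYED_COUNTER };

// Assumed numeric value; the spec only names the status.
constexpr uint8_t STATUS_AUTH_ERROR = 2;

// Detailed text intended for the local audit log only.
inline const char* DescribeReject(RejectReason_e eReason)
{
    switch (eReason)
    {
        case RejectReason_e::BAD_TAG: return "GCM authentication failed (bad tag)";
        case RejectReason_e::STALE_BOOT_ID: return "stale boot_id";
        case RejectReason_e::REPLAYED_COUNTER: return "replay detected";
    }
    assert(false && "unhandled RejectReason_e");
    return "unknown";
}

// Fills sLogLine with the specific reason and returns the status to send.
// The returned status is the same for every reason, so the peer learns nothing
// about why the packet was rejected.
inline uint8_t RejectPacket(RejectReason_e eReason, std::string& sLogLine)
{
    sLogLine = std::string("auth reject: ") + DescribeReject(eReason);
    return STATUS_AUTH_ERROR;
}
```

Returning one status for all cryptographic failures avoids giving an attacker an oracle that distinguishes a bad tag from a replay.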
If future enhancement is needed:
- Derive keys using HMAC(shared_secret, label).

| Feature | AES-GCM Scheme |
|---|---|
| Confidentiality | ✅ Yes |
| Integrity | ✅ Yes (via tag) |
| Mutual Auth | ✅ (if using unique keys) |
| Replay Protection | ✅ Yes (via mandatory nonce validation) |
| Forward Secrecy | ❌ No (add handshake if needed) |
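
```cpp
#include <cstdint>
#include <vector>

// Encrypts plaintext with AES-256-GCM; outputs the nonce, ciphertext, and tag.
bool EncryptGCM(const std::vector<uint8_t>& plaintext,
                const std::vector<uint8_t>& key,
                std::vector<uint8_t>& nonce_out,
                std::vector<uint8_t>& ciphertext_out,
                std::vector<uint8_t>& tag_out);

// Decrypts ciphertext with AES-256-GCM using the given key, nonce, and tag.
bool DecryptGCM(const std::vector<uint8_t>& ciphertext,
                const std::vector<uint8_t>& key,
                const std::vector<uint8_t>& nonce,
                const std::vector<uint8_t>& tag,
                std::vector<uint8_t>& plaintext_out);
```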