specifications/network/onchain-discovery.md
The DiemNet On-chain Discovery Protocol is an authenticated discovery protocol for nodes to learn validator and VFN network addresses and network identity public keys. On-chain discovery leverages the Move language and Diem blockchain to serve as a central authenticated data-store for distributing advertised validator and VFN discovery information in the form of RawEncNetworkAddresses for validators and RawNetworkAddresses for VFNs.
ValidatorSet, the chain is also the source of truth for validator and VFN network addresses and network identity public keys.
There are four separate discovery problems in Diem:
On-chain discovery serves use cases (1) and (2) but not (3) or (4).
Validator and VFN discovery information is stored in the ValidatorSet in the OnChainConfig.
struct ValidatorSet {
    scheme: ConsensusScheme,
    payload: Vec<ValidatorInfo>,
}
struct ValidatorInfo {
    // The validator's account address. AccountAddresses are initially derived from the account
    // auth pubkey; however, the auth key can be rotated, so one should not rely on this
    // initial property.
    account_address: AccountAddress,
    // Voting power of this validator
    consensus_voting_power: u64,
    // Validator config
    config: ValidatorConfig,
}
struct ValidatorConfig {
    consensus_public_key: Ed25519PublicKey,
    validator_network_addresses: Vec<RawEncNetworkAddress>,
    full_node_network_addresses: Vec<RawNetworkAddress>,
}
#[repr(u8)]
enum ConsensusScheme {
    Ed25519 = 0,
}
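To illustrate how a reader of this structure might pull out discovery information, the following is a minimal sketch that collects every advertised VFN address together with the owning validator's account address. The type aliases are simplified stand-ins chosen so the example compiles on its own, and `vfn_addresses` is a hypothetical helper name, not part of the specification.

```rust
// Simplified stand-ins for the real key and address types (assumptions made
// so that this sketch compiles on its own).
type AccountAddress = [u8; 16];
type Ed25519PublicKey = [u8; 32];
type RawEncNetworkAddress = Vec<u8>;
type RawNetworkAddress = Vec<u8>;

#[allow(dead_code)]
#[repr(u8)]
enum ConsensusScheme {
    Ed25519 = 0,
}

#[allow(dead_code)]
struct ValidatorConfig {
    consensus_public_key: Ed25519PublicKey,
    validator_network_addresses: Vec<RawEncNetworkAddress>,
    full_node_network_addresses: Vec<RawNetworkAddress>,
}

#[allow(dead_code)]
struct ValidatorInfo {
    account_address: AccountAddress,
    consensus_voting_power: u64,
    config: ValidatorConfig,
}

#[allow(dead_code)]
struct ValidatorSet {
    scheme: ConsensusScheme,
    payload: Vec<ValidatorInfo>,
}

/// Hypothetical helper: lists each advertised VFN address paired with the
/// account address of the validator that advertised it, in payload order.
fn vfn_addresses(set: &ValidatorSet) -> Vec<(AccountAddress, RawNetworkAddress)> {
    set.payload
        .iter()
        .flat_map(|info| {
            info.config
                .full_node_network_addresses
                .iter()
                .map(move |addr| (info.account_address, addr.clone()))
        })
        .collect()
}
```

Validators with an empty `full_node_network_addresses` list simply contribute no entries.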
Nodes bootstrap onto the network using the latest known validator set from their latest known chain state in storage (which may be the genesis state) and, if they are too far behind, seed peers from their local configuration. As long as at least one available peer accepts the bootstrapping node's connection, the bootstrapping node will ratchet up to the latest epoch and learn the ValidatorSet for that epoch.
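One way to picture the bootstrap step is as building an ordered list of dial candidates: addresses from the stored validator set first, followed by the configured seed peers. The sketch below assumes addresses are plain strings; `bootstrap_candidates` is a hypothetical helper, not an API defined by this specification.

```rust
use std::collections::HashSet;

/// Hypothetical helper: merges addresses from the latest stored validator set
/// with seed peers from local configuration, preserving order and dropping
/// duplicates so that each address is dialed at most once.
fn bootstrap_candidates(stored: &[String], seeds: &[String]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut candidates = Vec::new();
    for addr in stored.iter().chain(seeds.iter()) {
        // `insert` returns false when the address was already present.
        if seen.insert(addr.as_str()) {
            candidates.push(addr.clone());
        }
    }
    candidates
}
```

Placing the stored addresses first reflects the idea that seed peers are mainly a fallback for nodes whose stored state is stale.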
Once nodes are up-to-date, they can receive updates to the on-chain validator set from their own state-sync module.
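As a rough sketch of how such updates might be consumed, the example below keeps the most recent set of discovered addresses and ignores updates that are not from a newer epoch. It assumes that each reconfiguration increments the epoch; `DiscoveredPeers` and `apply_update` are hypothetical names, and account addresses are represented as strings for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical local view of on-chain discovery information.
#[derive(Debug)]
struct DiscoveredPeers {
    // Epoch of the validator set that `addresses` was taken from.
    epoch: u64,
    // Validator account address (as a string here) -> advertised addresses.
    addresses: BTreeMap<String, Vec<String>>,
}

impl DiscoveredPeers {
    /// Replaces the stored addresses if `epoch` is newer than the current
    /// one. Returns true when the state changed; stale or repeated updates
    /// are ignored.
    fn apply_update(&mut self, epoch: u64, addresses: BTreeMap<String, Vec<String>>) -> bool {
        if epoch <= self.epoch {
            return false;
        }
        self.epoch = epoch;
        self.addresses = addresses;
        true
    }
}
```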
On-chain discovery supports several different key and address rotation patterns, though the general theme looks like:
ValidatorInfo to the desired state.
If a node operator is manually rotating a validator's key or address, then they should manually restart the validator with the new keypair or address configuration in step (3.) after observing the epoch change.
Ideally, routine key rotations are automated and don't require operator intervention. An optional procedure for automated key rotation is outlined below:
Imagine a validator starts with a single advertised network address containing its network identity public key <pubkey1>:
addrs = ["/ip4/1.2.3.4/tcp/6180/ln-noise-ik/<pubkey1>/ln-handshake/0"]
The validator initiates a key rotation to a new network identity public key <pubkey2> by sending a transaction to set its addresses to a new list:
tx: set_validator_network_addresses(["/ip4/1.2.3.4/tcp/6180/ln-noise-ik/<pubkey2>/ln-handshake/0"])
When the transaction commits, the validator observes a reconfiguration with its new advertised network address. It will then begin responding to noise handshakes with the new keypair. Likewise, the node will use the new keypair when dialing out to other peers.
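To act on the reconfiguration, the validator has to notice that its own advertised address now carries the new key. The following is a minimal sketch, assuming addresses are handled in the human-readable form shown above rather than as typed or serialized values; `noise_public_key` and `advertises_key` are hypothetical helper names.

```rust
/// Hypothetical helper: returns the component that follows "ln-noise-ik" in a
/// human-readable network address, or None if the address has no such
/// component.
fn noise_public_key(addr: &str) -> Option<&str> {
    let mut parts = addr.split('/');
    while let Some(part) = parts.next() {
        if part == "ln-noise-ik" {
            // The next path component is the advertised public key.
            return parts.next().filter(|key| !key.is_empty());
        }
    }
    None
}

/// Hypothetical helper: true if any advertised address carries `key`.
fn advertises_key(addrs: &[&str], key: &str) -> bool {
    addrs.iter().any(|addr| noise_public_key(addr) == Some(key))
}
```

A validator could call `advertises_key` with its own on-chain addresses after each reconfiguration and switch keypairs once the new key appears.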
There are, however, some edge cases that require careful consideration. For example, suppose that the validator submits its rotation transaction but then crashes or is partitioned from the network before observing the reconfiguration. Other nodes then observe the reconfiguration and stop accepting new connections for its old public key. This situation would be problematic, as the validator can no longer connect to any of the other validators (since its old pubkey is no longer trusted). Fortunately, the validator should be able to learn about the most recent reconfiguration by epoch sync'ing from any public-facing VFN endpoints, which do not discriminate connections by public key.
Alternatively, a safer approach to preserve validator connectivity (at the expense of more complexity) might be to rotate in 2 steps. Given the same setup as before, the validator can instead advertise both <pubkey1> and <pubkey2> simultaneously. Only once it has observed this reconfiguration does it rotate to advertising only <pubkey2>.
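The two-step approach can be sketched as a small state machine. This is only an illustration under the assumption that keys are compared as strings and that the validator is told which of its keys appear in each observed reconfiguration; `RotationState` and its methods are hypothetical names.

```rust
/// Hypothetical rotation state for a validator's network identity key.
#[derive(Debug, Clone, PartialEq)]
enum RotationState {
    // Advertising a single key.
    Stable { key: String },
    // Advertising both the old and the new key.
    Overlap { old: String, new: String },
}

impl RotationState {
    /// Step 1: start advertising the new key alongside the current one.
    fn begin(self, new: &str) -> RotationState {
        match self {
            RotationState::Stable { key } => RotationState::Overlap {
                old: key,
                new: new.to_string(),
            },
            // A rotation is already in progress; leave it unchanged.
            other => other,
        }
    }

    /// Keys that should appear in the advertised addresses right now.
    fn keys_to_advertise(&self) -> Vec<String> {
        match self {
            RotationState::Stable { key } => vec![key.clone()],
            RotationState::Overlap { old, new } => vec![old.clone(), new.clone()],
        }
    }

    /// Step 2: once a reconfiguration containing both keys has been observed,
    /// drop the old key and advertise only the new one.
    fn on_reconfig(self, observed: &[String]) -> RotationState {
        match self {
            RotationState::Overlap { old, new }
                if observed.contains(&old) && observed.contains(&new) =>
            {
                RotationState::Stable { key: new }
            }
            other => other,
        }
    }
}
```

Because the old key stays advertised until the overlap has been observed on-chain, a validator that crashes mid-rotation can still connect with whichever key it has.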
Modifications to the discovery information require a quorum. In the event of a connectivity crisis where the validator set loses quorum (e.g. 1/3+ validators crash and forget their identity pubkeys), validators can't submit transactions to modify the on-chain discovery information to regain connectivity. A sufficient fallback in such an extreme event might be for each validator to manually configure their seed peers config with all other validators' discovery information. Alternatively, the Diem Association may issue a new Genesis Transaction to manually set a new validator set, though this requires significant coordination.