crates/engine/tree/docs/root.md
The heart of Reth is the Engine, which is responsible for driving the chain forward. Each time it receives a new payload (engine_newPayloadV4 at the time of writing this document), it:
This document describes the lifecycle of a payload with a focus on state root calculation, from the moment the payload is received, to the moment we have a new state root.
We will look at the following components:
It all starts with the engine_newPayload request coming from the Consensus Client.
We extract the block from the payload, and eventually pass it to the EngineApiTreeHandler::insert_block_inner
method that executes the block and calculates the state root.
https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2359-L2362
Let's walk through the steps involved in the process.
First, we spawn the State Root Task thread, which will receive the updates from execution and calculate the state root. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2449-L2458
Then, we do two things with the block:
StateRootMessage::PrefetchProofs
to the State Root Task.StateRootMessage::StateUpdate
to the State Root Task.StateRootMessage::FinishedStateUpdates is sent
to the State Root Task.Eventually, the Engine will receive the StateRootMessage::RootCalculated message from
the State Root Task thread, and send the engine_newPayload response.
State Root Task is a component responsible for receiving the state updates from the Engine, issuing requests for generating proofs to the MultiProof Manager, updating the sparse trie using the Sparse Trie Task, and finally sending the state root back to the Engine.
At its core, it's a state machine that receives messages from other components, and handles them accordingly. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L726
When the State Root Task is spawned, it also spawns the Sparse Trie Task in a separate thread. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L542-L544
State root calculation in the Sparse Trie Task relies on:
Let's look at the first two messages on the diagram: StateRootMessage::StateUpdate
and StateRootMessage::PrefetchProofs. They are sent from the previous Engine step,
and first used to form the proofs targets.
Proof targets are a list of accounts and storage slots that we send to the MultiProof Manager to generate the MPT proofs. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/trie/common/src/proofs.rs#L20-L21
Before sending them, we first deduplicate the list of targets according to a list of proof targets that were already fetched. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1022-L1028
This deduplication step is important, because if two transactions modify the same account or storage slot, we only need to fetch the MPT proof once.
Then, the proof targets are passed to the MultiProofManager::spawn_or_queue method.
When the MultiProof Manager finishes calculating the proof, it sends a message back to the State Root Task. It can be either:
StateRootMessage::EmptyProof if the deduplication of proof targets resulted in an empty list.StateRootMessage::ProofCalculated(proof, state) with the MPT proof calculated for the targets,
along with the state update that the proof was generated for.On any message, we call the MultiProofManager::on_calculation_complete method
to signal that the proof calculation is finished.
Some proofs can arrive earlier than others, even though they were requested later. It depends on the number of proof targets, and also some non-determinism in the database caching.
The issue with this is that we need to ensure that the proofs are sent
to the Sparse Trie Task in the order that they were requested. Because of this,
we introduced a ProofSequencer that we add new proofs to.
https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L666-L672
ProofSequencer acts in the following way:
ProofSequencer with the sequence number
and state update associated with it.ProofSequencer has a consecutive sequence of proofs without gaps in sequence numbers, it returns this sequence.Once the ProofSequencer returns a sequence of proofs,
we send them along with the state updates to the Sparse Trie Task.
Once all transactions are executed, the Engine sends a StateRootMessage::FinishedStateUpdates message
to the State Root Task, marking the end of receiving state updates.
Every time we receive a new proof from the MultiProof Manager, we also check the following conditions:
StateRootMessage::FinishedStateUpdates was sent)ProofSequencer empty? (no proofs are pending for sequencing)MultiProofManager::spawn_or_queue finished
calculating and were sent to the Sparse Trie Task?When all conditions are met, we close the State Root Task receiver channel, signaling that no proofs or state updates are coming anymore, and the state root calculation should be finished.
MultiProof manager is a component responsible for generating MPT proofs and sending them back to the State Root Task.
The entrypoint is the spawn_or_queue method
https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L355-L357
It has the following responsibilities:
StateRootMessage::EmptyProof to the State Root Task.NUM_THREADS / 2 - 2.64 / 2 - 2 = 30.ParallelProof,
and send StateRootMessage::ProofCalculated to the State Root Task once it's done.To exhaust the pending queue from step 2 of the spawn_or_queue described above,
the State Root Task calls into another method on_calculation_complete every time
a proof is calculated.
https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L379-L387
Its main purpose is to spawn a new proof calculation thread and do the same as step 3 of the spawn_or_queue method
described above.
Sparse Trie component is the heart of the new state root calculation logic.
0x10010 revealed
1 is revealed, and it's an extension node placed on the path 0x1.0x1 with the extension key 0010 is revealed, and it's a leaf node placed on the path 0x10010.0x00010 revealed
0 is revealed, and it's a branch node placed on the path 0x0.0x0 under the nibble 1 is revealed, and it's a hash node placed on the path 0x01.0x0 under the nibble 0 is revealed, and it's an extension placed on the path 0x00.0x00 with the extension key 01 is revealed, and it's a branch node placed on the path 0x0001.0x0001 under the nibble 1 is revealed, and it's a hash node placed on the path 0x00011.0x0001 under the nibble 0 is revealed, and it's a leaf node placed on the path 0x00010.For the implementation details, see crates/trie/sparse/src/trie.rs.
The messages to the sparse trie are sent from the State Root Task, and consist of the proof that needs to be revealed, and a list of updates that need to be applied. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L66-L74
We do not reveal the proofs and apply the updates immediately, but instead accumulate them until the messages channel is empty, and then reveal and apply in bulk. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L991-L994
When messages are accumulated, we update the Sparse Trie:
As you can see, we do not calculate the state root hash of the accounts trie (the one that will be the result of the whole task), but instead calculate only certain hashes.
This is an optimization that comes from the fact that we will likely update the top 2-3 levels of the trie in every transaction, so doing that work every time would be wasteful.
Instead, we calculate hashes for most of the levels of the trie, and do the rest of the work only when we're finishing the calculation.
Once the messages channel is closed by the State Root Task, we exhaust it, reveal proofs and apply updates, and then calculate the full state root hash https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1014
This state root is eventually sent as StateRootMessage::RootCalculated to the Engine.