diff --git a/crates/engine/tree/docs/mermaid/engine.mmd b/crates/engine/tree/docs/mermaid/engine.mmd new file mode 100644 index 000000000..404b8dbeb --- /dev/null +++ b/crates/engine/tree/docs/mermaid/engine.mmd @@ -0,0 +1,26 @@ +flowchart TD + subgraph EngineTask[Engine] + Block + -->|Execute transactions sequentially| Execute[Execute transaction] + --> CollectStateUpdates[Collect all accounts and storage slots that were modified] + end + + subgraph TransactionThread[Prewarming thread] + Prewarm[Execute transaction on top of previous block] + --> CollectPrefetchTargets[Collect all accounts and storage slots that were modified] + end + + subgraph StateRootTask[State Root Task thread] + StateRootMessage::PrefetchProofs + StateRootMessage::StateUpdate + StateRootMessage::FinishedStateUpdates + StateRootMessage::RootCalculated + end + + newPayloadRequest[engine_newPayload request] --> Block + Block -->|Start prewarming each transaction in a separate thread| Prewarm + CollectPrefetchTargets --> StateRootMessage::PrefetchProofs + CollectStateUpdates --> StateRootMessage::StateUpdate + Execute -->|All transactions finished executing| StateRootMessage::FinishedStateUpdates + StateRootMessage::RootCalculated + --> newPayloadResponse[engine_newPayload response] diff --git a/crates/engine/tree/docs/mermaid/engine.mmd.png b/crates/engine/tree/docs/mermaid/engine.mmd.png new file mode 100644 index 000000000..bffb16362 Binary files /dev/null and b/crates/engine/tree/docs/mermaid/engine.mmd.png differ diff --git a/crates/engine/tree/docs/mermaid/multiproof-manager.mmd b/crates/engine/tree/docs/mermaid/multiproof-manager.mmd new file mode 100644 index 000000000..3a3652b29 --- /dev/null +++ b/crates/engine/tree/docs/mermaid/multiproof-manager.mmd @@ -0,0 +1,26 @@ +flowchart TD + subgraph MultiProofManager + ParallelProof@{ shape: processes, label: "Start thread with ParallelProof::spawn" } + PendingProofRequests[List of pending proof requests] + + subgraph MultiProofManagerCompletion[on_calculation_complete] + HasPendingProofs{{Has pending multiproof requests?}} + end + + subgraph MultiProofManagerSpawn[spawn_or_queue] + ProofTargetsCondition{{Proof targets not empty?}} + -->|Not empty, MultiProofTargets| MultiProofManagerLimitReached{{Max in-flight proofs limit reached?}} + end + end + + subgraph StateRootTask[StateRootTask] + StateRootMessage::EmptyProof + StateRootMessage::ProofCalculated + end + + MultiProofManagerLimitReached -->|Yes, push to pending requests| PendingProofRequests + MultiProofManagerLimitReached -->|No| ParallelProof + HasPendingProofs <--> PendingProofRequests + HasPendingProofs -->|Yes| ParallelProof + ParallelProof --> StateRootMessage::ProofCalculated + ProofTargetsCondition -->|Empty| StateRootMessage::EmptyProof diff --git a/crates/engine/tree/docs/mermaid/multiproof-manager.mmd.png b/crates/engine/tree/docs/mermaid/multiproof-manager.mmd.png new file mode 100644 index 000000000..f55e533ce Binary files /dev/null and b/crates/engine/tree/docs/mermaid/multiproof-manager.mmd.png differ diff --git a/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd b/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd new file mode 100644 index 000000000..3a1c51158 --- /dev/null +++ b/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd @@ -0,0 +1,21 @@ +flowchart TD + subgraph SparseTrieTask[run_sparse_trie] + SparseTrieUpdate([SparseTrieUpdate channel]) + SparseTrieUpdate --> SparseTrieUpdateAccumulate[Accumulate updates until the channel is empty] + SparseTrieUpdateAccumulate + --> SparseTrieReveal[Reveal multiproof in Sparse Trie] + --> SparseTrieStateUpdate[Update Sparse Trie with new state] + --> SparseTrieStorageRoots[Calculate sparse storage trie roots] + --> SparseTrieUpdateBelowLevel[Calculate sparse trie hashes below certain level] + SparseTrieUpdateBelowLevel --> SparseTrieUpdateClosed{{Is SparseTrieUpdate channel closed?}} + SparseTrieUpdateClosed -->|Yes| SparseTrieRoot[Calculate sparse trie root] + SparseTrieUpdateClosed -->|No| SparseTrieUpdate + end + + subgraph StateRootTask + Incoming[Incoming SparseTrieUpdate messages] + StateRootMessage::RootCalculated + end + + Incoming --> SparseTrieUpdate + SparseTrieRoot --> StateRootMessage::RootCalculated diff --git a/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd.png b/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd.png new file mode 100644 index 000000000..b639e6ad2 Binary files /dev/null and b/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd.png differ diff --git a/crates/engine/tree/docs/mermaid/sparse-trie.mmd b/crates/engine/tree/docs/mermaid/sparse-trie.mmd new file mode 100644 index 000000000..7962d005d --- /dev/null +++ b/crates/engine/tree/docs/mermaid/sparse-trie.mmd @@ -0,0 +1,35 @@ +flowchart TD + classDef revealed stroke:green,stroke-width:4px + + subgraph Reveal2[0x00010 revealed] + R[Root Branch Node
0x] + B1[Branch Node
0x0]:::revealed + E1[Extension Node
0x00]:::revealed + E2[Extension Node
0x1] + B2[Branch Node
0x0001]:::revealed + L1[Leaf Node
0x00010]:::revealed + L2[Leaf Node
0x10010] + H1[Hash
0x01]:::revealed + H2[Hash
0x00011]:::revealed + + R -->|0| B1 + R -->|1| E2 + B1 -->|0| E1 + B1 -->|1| H1 + E1 -->|01| B2 + B2 -->|0| L1 + B2 -->|1| H2 + E2 -->|0010| L2 + end + + subgraph Reveal1[0x10010 revealed] + R1R[Root Branch Node
0x] + R1E2[Extension Node
0x1]:::revealed + R1L2[Leaf Node
0x10010]:::revealed + R1R -->|1| R1E2 + R1E2 -->|0010| R1L2 + end + + subgraph Empty + ER[Root Branch Node
0x] + end diff --git a/crates/engine/tree/docs/mermaid/sparse-trie.mmd.png b/crates/engine/tree/docs/mermaid/sparse-trie.mmd.png new file mode 100644 index 000000000..39a26032c Binary files /dev/null and b/crates/engine/tree/docs/mermaid/sparse-trie.mmd.png differ diff --git a/crates/engine/tree/docs/mermaid/state-root-task.mmd b/crates/engine/tree/docs/mermaid/state-root-task.mmd new file mode 100644 index 000000000..011196d9e --- /dev/null +++ b/crates/engine/tree/docs/mermaid/state-root-task.mmd @@ -0,0 +1,44 @@ +flowchart TD + subgraph StateRootTaskMessages[State Root Task messages] + StateRootMessage::StateUpdate + StateRootMessage::PrefetchProofs + StateRootMessage::EmptyProof + StateRootMessage::ProofCalculated + StataRootMessage::FinishedStateUpdates + end + + subgraph StateRootTask[State Root Task thread] + DeduplicateProofTargets[Deduplicate proof targets according to the list of already fetched proofs] + GenerateProofTargets[Generate proof targets from state update] + --> DeduplicateProofTargets + + NewProof[New proof calculated] + -->|Add new proof| ProofSequencer + --> EndCondition1 + ProofSequencer --> ProofSequencerCondition{{Has sequential proofs?}} + + EndCondition1{{All updates processed?}} + --> EndCondition2{{All pending proofs requested?}} + --> EndCondition3{{All proofs finished processing?}} + end + + subgraph SparseTrieTask[Sparse Trie thread] + SparseTrieUpdate([SparseTrieUpdate channel]) + end + + subgraph MultiProofManager[MultiProofManager] + MultiProofCompletion[on_calculation_complete] + MultiProofSpawn[spawn_or_queue] + end + + StateRootMessage::PrefetchProofs --> DeduplicateProofTargets + StateRootMessage::StateUpdate --> GenerateProofTargets + + DeduplicateProofTargets -----> MultiProofSpawn + + StateRootMessage::EmptyProof --> NewProof + StateRootMessage::ProofCalculated --> NewProof + NewProof ---> MultiProofCompletion + ProofSequencerCondition -->|Yes, send multiproof and state update| SparseTrieUpdate + StataRootMessage::FinishedStateUpdates --> EndCondition1 + EndCondition3 -->|Close SparseTrieUpdate channel| SparseTrieUpdate diff --git a/crates/engine/tree/docs/mermaid/state-root-task.mmd.png b/crates/engine/tree/docs/mermaid/state-root-task.mmd.png new file mode 100644 index 000000000..b0f39608c Binary files /dev/null and b/crates/engine/tree/docs/mermaid/state-root-task.mmd.png differ diff --git a/crates/engine/tree/docs/root.md b/crates/engine/tree/docs/root.md new file mode 100644 index 000000000..d3c4e1e57 --- /dev/null +++ b/crates/engine/tree/docs/root.md @@ -0,0 +1,251 @@ +# State Root Calculation for Engine Payloads + +The heart of Reth is the Engine, which is responsible for driving the chain forward. +Each time it receives a new payload ([engine_newPayloadV4](https://github.com/ethereum/execution-apis/blob/main/src/engine/prague.md#engine_newpayloadv4) +at the time of writing this document), it: +1. Does a bunch of validations. +2. Executes the block contained in the payload. +3. Calculates the [MPT](https://ethereum.org/en/developers/docs/data-structures-and-encoding/patricia-merkle-trie/) +root of the new state. +4. Compares the root with the one received in the block header. +5. Considers the block valid. + +This document describes the lifecycle of a payload with the focus on state root calculation, +from the moment the payload is received, to the moment we have a new state root. + +We will look at the following components: +- [Engine](#engine) +- [State Root Task](#state-root-task) +- [MultiProof Manager](#multiproof-manager) +- [Sparse Trie Task](#sparse-trie-task) + +## Engine + +![Engine](./mermaid/engine.mmd.png) + +It all starts with the `engine_newPayload` request coming from the [Consensus Client](https://ethereum.org/en/developers/docs/nodes-and-clients/#consensus-clients). + +We extract the block from the payload, and eventually pass it to the `EngineApiTreeHandler::insert_block_inner` +method which executes the block and calculates the state root. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2359-L2362 + +Let's walk through the steps involved in the process. + +First, we spawn the [State Root Task](#state-root-task) thread, which will receive the updates from +execution and calculate the state root. https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2449-L2458 + +Then, we do two things with the block: +1. Start prewarming each transaction in a separate thread ("Prewarming thread" on the above diagram). +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2490-L2507 + - Each transaction is optimistically executed in parallel with each other on top of the previous block, + but the results are not committed to the database. + - All accounts and storage slots that were accessed are cached in memory, so that the actual execution + can use them instead of going to the database. + - All modified accounts and storage slots are sent as `StateRootMessage::PrefetchProofs` + to the [State Root Task](#state-root-task). + - Some transactions will fail, because they require the previous transactions to be executed first. + It doesn't matter, because we only care about optimistically prewarming the accounts and storage slots + that are accessed, and transactions will be executed in the correct order later anyway. +2. Execute transactions sequentially. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2523 + - Transactions are executed one after another. Accounts and storage slots accessed during the execution + are looked up in the cache from the previous prewarming step. + - All modified accounts and storage slots are sent as `StateRootMessage::StateUpdate` + to the [State Root Task](#state-root-task). + - When all transactions are executed, the `StateRootMessage::FinishedStateUpdates` is sent + to the [State Root Task](#state-root-task). + +Eventually, the Engine will receive the `StateRootMessage::RootCalculated` message from +the [State Root Task](#state-root-task) thread, and send the `engine_newPayload` response. + +## State Root Task + +![State Root Task](./mermaid/state-root-task.mmd.png) + +State Root Task is a component responsible for receiving the state updates from the [Engine](#engine), +issuing requests for generating proofs to the [MultiProof Manager](#multiproof-manager), +updating the sparse trie using the [Sparse Trie Task](#sparse-trie-task), +and finally sending the state root back to the [Engine](#engine). + +At its core, it's a state machine that receives messages from other components, and handles them accordingly. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L726 + +When the State Root Task is spawned, it also spawns the [Sparse Trie Task](#sparse-trie-task) in a separate thread. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L542-L544 + +### Generating proof targets + +State root calculation in the [Sparse Trie Task](#sparse-trie-task) relies on: +1. Revealing nodes in the trie according to [MPT (Merkle Patricia Trie) proofs](https://docs.chainstack.com/docs/deep-dive-into-merkle-proofs-and-eth-getproof-ethereum-rpc-method). + - Revealing means adding the nodes from the proof to the Sparse Trie structure. + See [example](#revealing-example) for a diagram. +2. Updating the trie according to the state updates received from executing the transactions. + +Let's look at the first two messages on the diagram: `StateRootMessage::StateUpdate` +and `StateRootMessage::PrefetchProofs`. They are sent from the previous [Engine](#engine) step, +and first used to form the proofs targets. + +Proof targets are a list of accounts and storage slots that we send to +the [MultiProof Manager](#multiproof-manager) to generate the MPT proofs. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/trie/common/src/proofs.rs#L20-L21 + +Before sending them, we first deduplicate the list of targets according to a list of proof targets +that were already fetched. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1022-L1028 + +This deduplication step is important, because if two transactions modify the same account or storage slot, +we only need to fetch the MPT proof once. + +Then, the proof targets are passed to the [`MultiProofManager::spawn_or_queue`](#multiproof-manager) method. + +### Sequencing calculated proofs + +When the [MultiProof Manager](#multiproof-manager) finishes calculating the proof, it sends +a message back to the State Root Task. It can be either: +1. `StateRootMessage::EmptyProof` if the deduplication of proof targets resulted in an empty list. +2. `StateRootMessage::ProofCalculated(proof, state)` with the MPT proof calculated for the targets, +along with the state update that the proof was generated for. + +On any message, we call the [`MultiProofManager::on_calculation_complete`](#multiproof-manager) method +to signal that the proof calculation is finished. + +Some proofs can arrive earlier than others, even though they were requested later. It depends on the number +of proof targets, and also some non-determinism in the database caching. + +The issue with this is that we need to ensure that the proofs are sent +to the [Sparse Trie Task](#sparse-trie-task) in the order that they were requested. Because of this, +we introduced a `ProofSequencer` that we add new proofs to. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L666-L672 + +`ProofSequencer` acts in the following way: +1. Each proof has an associated "sequence number" that determines the original order of state updates. +2. When the proof is calculated, it's added to the `ProofSequencer` with the sequence number +and state update associated with it. +3. If the `ProofSequencer` has a consecutive sequence of proofs without gaps in sequence numbers, it returns this sequence. + +Once the `ProofSequencer` returns a sequence of proofs, +we send them along with the state updates to the [Sparse Trie Task](#sparse-trie-task). + +### Finishing the calculation + +Once all transactions are executed, the [Engine](#engine) sends a `StateRootMessage::FinishStateUpdates` message +to the State Root Task, marking the end of receiving state updates. + +Every time we receive a new proof from the [MultiProof Manager](#multiproof-manager), we also check +the following conditions: +1. Are all updates received? (`StateRootMessage::FinishStateUpdates` was sent) +2. Is `ProofSequencer` empty? (no proofs are pending for sequencing) +3. Are all proofs that were sent to the [`MultiProofManager::spawn_or_queue`](#multiproof-manager) finished +calculating and were sent to the [Sparse Trie Task](#sparse-trie-task)? + +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L935-L944 + +When all conditions are met, we close the [State Root Task](#state-root-task) receiver channel, +signaling that no proofs or state updates are coming anymore, and the state root calculation should be finished. + + +## MultiProof Manager + +![MultiProof Manager](./mermaid/multiproof-manager.mmd.png) + +MultiProof manager is a component responsible for generating MPT proofs +and sending them back to the [State Root Task](#state-root-task). + +### Spawning new proof calculations + +The entrypoint is the `spawn_or_queue` method +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L355-L357 + +It has the following responsibilities: +1. On empty proof targets, immediately send `StateRootMessage::EmptyProof` to the [State Root Task](#state-root-task). +2. If the number of maximum concurrent proof calculations is reached, push the proof request to the pending queue. + - Maximum concurrency is determined as `NUM_THREADS / 2 - 2`. + - For a system with 64 threads, the maximum number of concurrent proof calculations will be `64 / 2 - 2 = 30`. +3. If we can spawn a new proof calculation thread, spawn it using [`ParallelProof`](https://github.com/paradigmxyz/reth/blob/09a6aab9f7dc283e42fd00ce8f179542f8558580/crates/trie/parallel/src/proof.rs#L85), +and send `StateRootMessage::ProofCalculated` to the [State Root Task](#state-root-task) once it's done. + +### Exhausting the pending queue + +To exhaust the pending queue from the step 2 of the `spawn_or_queue` described above, +the [State Root Task](#state-root-task) calls into another method `on_calculation_complete` every time +a proof is calculated. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L379-L387 + +Its main purpose is to spawn a new proof calculation thread and do the same as step 3 of the `spawn_or_queue` method +described above. + +## Sparse Trie Task + +Sparse Trie component is the heart of the new state root calculation logic. + +### Sparse Trie primer + +- The state trie of Ethereum is very big (150GB+), and we cannot realistically fit it into memory. +- What if instead of loading the entire trie in memory, +we only load the parts that were modified during the block execution (i.e. make the trie "sparse")? + - Such modified parts will have nodes that will be modified, + and nodes that are needed only for calculating the hashes. + - Essentially, this is the same idea as [MPT proofs](https://docs.chainstack.com/docs/deep-dive-into-merkle-proofs-and-eth-getproof-ethereum-rpc-method) + that have only partial information about the sibling nodes, if these nodes aren't part of the + requested path. +- When updating the trie, we first reveal the nodes using the MPT proofs, and then add/update/remove the leaves, +along with the other nodes that need to be modified in the process of leaf update. + +#### Revealing Example + +![Sparse Trie](./mermaid/sparse-trie.mmd.png) + +1. Empty + - Sparse Trie has no revealed nodes, and an empty root +2. `0x10010` revealed + - Child of the root branch node under the nibble `1` is revealed, and it's an extension node placed on the path `0x1`. + - Child of the extension node at path `0x1` with the extension key `0010` is revealed, and it's a leaf node placed on the path `0x10010`. +3. `0x00010` revealed + - Child of the root branch node under the nibble `0` is revealed, and it's a branch node placed on the path `0x0`. + - Child of the branch node at path `0x0` under the nibble `1` is revealed, and it's a hash node placed on the path `0x01`. + - Child of the branch node at path `0x0` under the nibble `0` is revealed, and it's an extension placed on the path `0x00`. + - Child of the extension node at path `0x00` with the extension key `01` is revealed, and it's a branch node placed on the path `0x0001`. + - Child of the branch node at path `0x0001` under the nibble `1` is revealed, and it's a hash node placed on the path `0x00011`. + - Child of the branch node at path `0x0001` under the nibble `0` is revealed, and it's a leaf node placed on the path `0x00010`. + + +For the implementation details, see [crates/trie/sparse/src/trie.rs](https://github.com/paradigmxyz/reth/blob/09a6aab9f7dc283e42fd00ce8f179542f8558580/crates/trie/sparse/src/trie.rs). + +### Sparse Trie updates + +![Sparse Trie Task](./mermaid/sparse-trie-task.mmd.png) + +The messages to the sparse trie are sent from the [State Root Task](#state-root-task), +and consist of the proof that needs to be revealed, and a list of updates that need to be applied. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L66-L74 + +We do not reveal the proofs and apply the updates immediately, +but instead accumulate them until the messages channel is empty, and then reveal and apply in bulk. +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L991-L994 + +When messages are accumulated, we update the Sparse Trie: +1. Reveal the proof +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1090-L1091 +2. For each modified storage trie, apply updates and calculate the roots in parallel +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1093 +3. Update accounts trie +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1133 +4. Calculate keccak hashes of the nodes below the certain level +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1139 + +As you can see, we do not calculate the state root hash of the accounts trie +(the one that will be the result of the whole task), but instead calculate only the certain hashes. + +This is an optimization that comes from the fact that we will likely update the top 2-3 levels of the trie +in every transaction, so doing that work every time would be wasteful. + +Instead, we calculate hashes for most of the levels of the trie, and do the rest of the work +only when we're finishing the calculation. + +### Finishing the calculation + +Once the messages channel is closed by the [State Root Task](#state-root-task), +we exhaust it, reveal proofs and apply updates, and then calculate the full state root hash +https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1014 + +This state root is eventually sent as `StateRootMessage::RootCalculated` to the [Engine](#engine).