docs(tree): state root task (#14400)

Co-authored-by: Dan Cline <6798349+Rjected@users.noreply.github.com>
2025-12-06 10:59:55 +00:00 · 2025-02-12 12:28:35 +00:00
parent 11eac03f00
commit ac7b5959fb
11 changed files with 403 additions and 0 deletions
--- a/crates/engine/tree/docs/mermaid/engine.mmd
+++ b/crates/engine/tree/docs/mermaid/engine.mmd
@@ -0,0 +1,26 @@
+flowchart TD
+    subgraph EngineTask[Engine]
+        Block
+            -->|Execute transactions sequentially| Execute[Execute transaction]
+            --> CollectStateUpdates[Collect all accounts and storage slots that were modified]
+    end
+
+    subgraph TransactionThread[Prewarming thread]
+        Prewarm[Execute transaction on top of previous block]
+            --> CollectPrefetchTargets[Collect all accounts and storage slots that were modified]
+    end
+
+    subgraph StateRootTask[State Root Task thread]
+        StateRootMessage::PrefetchProofs
+        StateRootMessage::StateUpdate
+        StateRootMessage::FinishedStateUpdates
+        StateRootMessage::RootCalculated
+    end
+
+    newPayloadRequest[engine_newPayload request] --> Block
+    Block -->|Start prewarming each transaction in a separate thread| Prewarm
+    CollectPrefetchTargets --> StateRootMessage::PrefetchProofs
+    CollectStateUpdates --> StateRootMessage::StateUpdate
+    Execute -->|All transactions finished executing| StateRootMessage::FinishedStateUpdates
+    StateRootMessage::RootCalculated
+        --> newPayloadResponse[engine_newPayload response]
--- a/crates/engine/tree/docs/mermaid/engine.mmd.png
+++ b/crates/engine/tree/docs/mermaid/engine.mmd.png
--- a/crates/engine/tree/docs/mermaid/multiproof-manager.mmd
+++ b/crates/engine/tree/docs/mermaid/multiproof-manager.mmd
@@ -0,0 +1,26 @@
+flowchart TD
+    subgraph MultiProofManager
+        ParallelProof@{ shape: processes, label: "Start thread with ParallelProof::spawn" }
+        PendingProofRequests[List of pending proof requests]
+
+        subgraph MultiProofManagerCompletion[on_calculation_complete]
+            HasPendingProofs{{Has pending multiproof requests?}}
+        end
+
+        subgraph MultiProofManagerSpawn[spawn_or_queue]
+            ProofTargetsCondition{{Proof targets not empty?}}
+                -->|Not empty, MultiProofTargets| MultiProofManagerLimitReached{{Max in-flight proofs limit reached?}}
+        end
+    end
+
+    subgraph StateRootTask[StateRootTask]
+        StateRootMessage::EmptyProof
+        StateRootMessage::ProofCalculated
+    end
+
+    MultiProofManagerLimitReached -->|Yes, push to pending requests| PendingProofRequests
+    MultiProofManagerLimitReached -->|No| ParallelProof
+    HasPendingProofs <--> PendingProofRequests
+    HasPendingProofs -->|Yes| ParallelProof
+    ParallelProof --> StateRootMessage::ProofCalculated
+    ProofTargetsCondition -->|Empty| StateRootMessage::EmptyProof
--- a/crates/engine/tree/docs/mermaid/multiproof-manager.mmd.png
+++ b/crates/engine/tree/docs/mermaid/multiproof-manager.mmd.png
--- a/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd
+++ b/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd
@@ -0,0 +1,21 @@
+flowchart TD
+    subgraph SparseTrieTask[run_sparse_trie]
+        SparseTrieUpdate([SparseTrieUpdate channel])
+        SparseTrieUpdate --> SparseTrieUpdateAccumulate[Accumulate updates until the channel is empty]
+        SparseTrieUpdateAccumulate
+            --> SparseTrieReveal[Reveal multiproof in Sparse Trie]
+            --> SparseTrieStateUpdate[Update Sparse Trie with new state]
+            --> SparseTrieStorageRoots[Calculate sparse storage trie roots]
+            --> SparseTrieUpdateBelowLevel[Calculate sparse trie hashes below certain level]
+        SparseTrieUpdateBelowLevel --> SparseTrieUpdateClosed{{Is SparseTrieUpdate channel closed?}}
+        SparseTrieUpdateClosed -->|Yes| SparseTrieRoot[Calculate sparse trie root]
+        SparseTrieUpdateClosed -->|No| SparseTrieUpdate
+    end
+
+    subgraph StateRootTask
+        Incoming[Incoming SparseTrieUpdate messages]
+        StateRootMessage::RootCalculated
+    end
+
+    Incoming --> SparseTrieUpdate
+    SparseTrieRoot --> StateRootMessage::RootCalculated
--- a/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd.png
+++ b/crates/engine/tree/docs/mermaid/sparse-trie-task.mmd.png
--- a/crates/engine/tree/docs/mermaid/sparse-trie.mmd
+++ b/crates/engine/tree/docs/mermaid/sparse-trie.mmd
@@ -0,0 +1,35 @@
+flowchart TD
+    classDef revealed stroke:green,stroke-width:4px
+
+    subgraph Reveal2[0x00010 revealed]
+        R[Root Branch Node<br/>0x]
+        B1[Branch Node<br/>0x0]:::revealed
+        E1[Extension Node<br/>0x00]:::revealed
+        E2[Extension Node<br/>0x1]
+        B2[Branch Node<br/>0x0001]:::revealed
+        L1[Leaf Node<br/>0x00010]:::revealed
+        L2[Leaf Node<br/>0x10010]
+        H1[Hash<br/>0x01]:::revealed
+        H2[Hash<br/>0x00011]:::revealed
+
+        R -->|0| B1
+        R -->|1| E2
+        B1 -->|0| E1
+        B1 -->|1| H1
+        E1 -->|01| B2
+        B2 -->|0| L1
+        B2 -->|1| H2
+        E2 -->|0010| L2
+    end
+
+    subgraph Reveal1[0x10010 revealed]
+        R1R[Root Branch Node<br/>0x]
+        R1E2[Extension Node<br/>0x1]:::revealed
+        R1L2[Leaf Node<br/>0x10010]:::revealed
+        R1R -->|1| R1E2
+        R1E2 -->|0010| R1L2
+    end
+
+    subgraph Empty
+        ER[Root Branch Node<br/>0x]
+    end
--- a/crates/engine/tree/docs/mermaid/sparse-trie.mmd.png
+++ b/crates/engine/tree/docs/mermaid/sparse-trie.mmd.png
--- a/crates/engine/tree/docs/mermaid/state-root-task.mmd
+++ b/crates/engine/tree/docs/mermaid/state-root-task.mmd
@@ -0,0 +1,44 @@
+flowchart TD
+    subgraph StateRootTaskMessages[State Root Task messages]
+        StateRootMessage::StateUpdate
+        StateRootMessage::PrefetchProofs
+        StateRootMessage::EmptyProof
+        StateRootMessage::ProofCalculated
+        StateRootMessage::FinishedStateUpdates
+    end
+
+    subgraph StateRootTask[State Root Task thread]
+        DeduplicateProofTargets[Deduplicate proof targets according to the list of already fetched proofs]
+        GenerateProofTargets[Generate proof targets from state update]
+            --> DeduplicateProofTargets
+
+        NewProof[New proof calculated]
+            -->|Add new proof| ProofSequencer
+            --> EndCondition1
+        ProofSequencer --> ProofSequencerCondition{{Has sequential proofs?}}
+
+        EndCondition1{{All updates processed?}}
+            --> EndCondition2{{All pending proofs requested?}}
+            --> EndCondition3{{All proofs finished processing?}}
+    end
+
+    subgraph SparseTrieTask[Sparse Trie thread]
+        SparseTrieUpdate([SparseTrieUpdate channel])
+    end
+
+    subgraph MultiProofManager[MultiProofManager]
+        MultiProofCompletion[on_calculation_complete]
+        MultiProofSpawn[spawn_or_queue]
+    end
+
+    StateRootMessage::PrefetchProofs --> DeduplicateProofTargets
+    StateRootMessage::StateUpdate --> GenerateProofTargets
+
+    DeduplicateProofTargets -----> MultiProofSpawn
+
+    StateRootMessage::EmptyProof --> NewProof
+    StateRootMessage::ProofCalculated --> NewProof
+    NewProof ---> MultiProofCompletion
+    ProofSequencerCondition -->|Yes, send multiproof and state update| SparseTrieUpdate
+    StateRootMessage::FinishedStateUpdates --> EndCondition1
+    EndCondition3 -->|Close SparseTrieUpdate channel| SparseTrieUpdate
--- a/crates/engine/tree/docs/mermaid/state-root-task.mmd.png
+++ b/crates/engine/tree/docs/mermaid/state-root-task.mmd.png
--- a/crates/engine/tree/docs/root.md
+++ b/crates/engine/tree/docs/root.md
@@ -0,0 +1,251 @@
+# State Root Calculation for Engine Payloads
+
+The heart of Reth is the Engine, which is responsible for driving the chain forward.
+Each time it receives a new payload ([engine_newPayloadV4](https://github.com/ethereum/execution-apis/blob/main/src/engine/prague.md#engine_newpayloadv4)
+at the time of writing this document), it:
+1. Performs a series of validations.
+2. Executes the block contained in the payload.
+3. Calculates the [MPT](https://ethereum.org/en/developers/docs/data-structures-and-encoding/patricia-merkle-trie/)
+root of the new state.
+4. Compares the root with the one received in the block header.
+5. If the roots match, considers the block valid.
+
+This document describes the lifecycle of a payload with a focus on state root calculation,
+from the moment the payload is received to the moment we have a new state root.
+
+We will look at the following components:
+- [Engine](#engine)
+- [State Root Task](#state-root-task)
+- [MultiProof Manager](#multiproof-manager)
+- [Sparse Trie Task](#sparse-trie-task)
+
+## Engine
+
+![Engine](./mermaid/engine.mmd.png)
+
+It all starts with the `engine_newPayload` request coming from the [Consensus Client](https://ethereum.org/en/developers/docs/nodes-and-clients/#consensus-clients).
+
+We extract the block from the payload and eventually pass it to the `EngineApiTreeHandler::insert_block_inner`
+method, which executes the block and calculates the state root.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2359-L2362
+
+Let's walk through the steps involved in the process.
+
+First, we spawn the [State Root Task](#state-root-task) thread, which will receive the updates from
+execution and calculate the state root.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2449-L2458
+
+Then, we do two things with the block:
+1. Start prewarming each transaction in a separate thread ("Prewarming thread" in the diagram above).
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2490-L2507
+    - All transactions are optimistically executed in parallel with each other on top of the previous block,
+    but the results are not committed to the database.
+    - All accounts and storage slots that were accessed are cached in memory, so that the actual execution
+    can use them instead of going to the database.
+    - All modified accounts and storage slots are sent as `StateRootMessage::PrefetchProofs`
+    to the [State Root Task](#state-root-task).
+    - Some transactions will fail because they depend on earlier transactions in the block being executed first.
+    This is acceptable: we only care about optimistically prewarming the accounts and storage slots
+    that are accessed, and all transactions are executed in the correct order later anyway.
+2. Execute transactions sequentially.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/mod.rs#L2523
+    - Transactions are executed one after another. Accounts and storage slots accessed during the execution
+    are looked up in the cache populated by the prewarming step.
+    - All modified accounts and storage slots are sent as `StateRootMessage::StateUpdate`
+    to the [State Root Task](#state-root-task).
+    - When all transactions are executed, a `StateRootMessage::FinishedStateUpdates` message is sent
+    to the [State Root Task](#state-root-task).
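To make the message flow above concrete, here is a minimal Rust sketch of the Engine side talking to a State Root Task thread over a standard `std::sync::mpsc` channel. It is an illustration under assumptions, not reth's implementation: the enum only borrows the variant names from this document, account keys are plain `u64` placeholders, the "root" is a toy order-sensitive fold rather than an MPT root, prefetch hints are sent before execution instead of concurrently from prewarming threads, and the value returned through `JoinHandle::join` stands in for `StateRootMessage::RootCalculated`.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical, simplified messages. Only the variant names come from the text;
/// the payloads (plain `u64` keys) are placeholders.
enum StateRootMessage {
    /// Keys touched while prewarming a transaction.
    PrefetchProofs(Vec<u64>),
    /// Keys modified by one sequentially executed transaction.
    StateUpdate(Vec<u64>),
    /// All transactions have been executed.
    FinishedStateUpdates,
}

/// Spawns a toy state root task. Its "root" is an order-sensitive fold over the
/// updates, NOT a Merkle Patricia Trie root.
fn spawn_state_root_task() -> (mpsc::Sender<StateRootMessage>, thread::JoinHandle<u64>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut root = 0u64;
        for message in rx {
            match message {
                // The real task would turn these into proof targets; the toy ignores them.
                StateRootMessage::PrefetchProofs(_) => {}
                StateRootMessage::StateUpdate(keys) => {
                    for key in keys {
                        root = root.wrapping_mul(31).wrapping_add(key);
                    }
                }
                // In the flow described here this only ends the input; the real
                // root is produced after outstanding proofs are processed.
                StateRootMessage::FinishedStateUpdates => break,
            }
        }
        root
    });
    (tx, handle)
}

/// Engine side: prefetch hints, then sequential updates, then the finish signal.
/// The join result stands in for receiving `StateRootMessage::RootCalculated`.
fn run_block(transactions: &[Vec<u64>]) -> u64 {
    let (tx, handle) = spawn_state_root_task();
    for keys in transactions {
        tx.send(StateRootMessage::PrefetchProofs(keys.clone())).expect("task alive");
    }
    for keys in transactions {
        tx.send(StateRootMessage::StateUpdate(keys.clone())).expect("task alive");
    }
    tx.send(StateRootMessage::FinishedStateUpdates).expect("task alive");
    handle.join().expect("state root task panicked")
}
```

The toy result depends on the order in which updates arrive, much like applying two writes to the same storage slot out of order would produce the wrong final state.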
+
+Eventually, the Engine will receive the `StateRootMessage::RootCalculated` message from
+the [State Root Task](#state-root-task) thread, and send the `engine_newPayload` response.
+
+## State Root Task
+
+![State Root Task](./mermaid/state-root-task.mmd.png)
+
+The State Root Task is the component responsible for receiving the state updates from the [Engine](#engine),
+issuing requests for generating proofs to the [MultiProof Manager](#multiproof-manager),
+updating the sparse trie using the [Sparse Trie Task](#sparse-trie-task),
+and finally sending the state root back to the [Engine](#engine).
+
+At its core, it's a state machine that receives messages from other components, and handles them accordingly.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L726
+
+When the State Root Task is spawned, it also spawns the [Sparse Trie Task](#sparse-trie-task) in a separate thread.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L542-L544
+
+### Generating proof targets
+
+State root calculation in the [Sparse Trie Task](#sparse-trie-task) relies on:
+1. Revealing nodes in the trie according to [MPT (Merkle Patricia Trie) proofs](https://docs.chainstack.com/docs/deep-dive-into-merkle-proofs-and-eth-getproof-ethereum-rpc-method).
+    - Revealing means adding the nodes from the proof to the Sparse Trie structure.
+    See [example](#revealing-example) for a diagram.
+2. Updating the trie according to the state updates received from executing the transactions.
+
+Let's look at the first two messages on the diagram: `StateRootMessage::StateUpdate`
+and `StateRootMessage::PrefetchProofs`. They are sent from the previous [Engine](#engine) step,
+and are first used to form the proof targets.
+
+Proof targets are a list of accounts and storage slots that we send to
+the [MultiProof Manager](#multiproof-manager) to generate the MPT proofs.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/trie/common/src/proofs.rs#L20-L21
+
+Before sending them, we first deduplicate the list of targets according to a list of proof targets
+that were already fetched.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1022-L1028 
+
+This deduplication step is important, because if two transactions modify the same account or storage slot,
+we only need to fetch the MPT proof once.
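The deduplication can be sketched with ordinary hash maps. This is a minimal sketch under assumptions: proof targets are modeled as a map from an account key to a set of storage slot keys, with plain `u64` keys standing in for the hashed keys a real implementation would use, and `deduplicate` is a hypothetical helper rather than reth's API. An empty result corresponds to the `StateRootMessage::EmptyProof` case described below.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical proof targets: account key -> storage slot keys.
/// Plain `u64` keys are placeholders for hashed account and slot keys.
type ProofTargets = HashMap<u64, HashSet<u64>>;

/// Drops targets whose proofs were already fetched, then records what is
/// left as fetched so that later updates touching the same keys skip them.
fn deduplicate(targets: ProofTargets, fetched: &mut ProofTargets) -> ProofTargets {
    let mut result = ProofTargets::new();
    for (account, slots) in targets {
        match fetched.get(&account) {
            // Account already proven: keep only slots we have not fetched yet.
            Some(done) => {
                let new_slots: HashSet<u64> = slots.difference(done).copied().collect();
                if !new_slots.is_empty() {
                    result.insert(account, new_slots);
                }
            }
            // Unseen account: the account proof and all of its slots are needed.
            None => {
                result.insert(account, slots);
            }
        }
    }
    for (account, slots) in &result {
        fetched.entry(*account).or_default().extend(slots.iter().copied());
    }
    result
}
```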
+
+Then, the proof targets are passed to the [`MultiProofManager::spawn_or_queue`](#multiproof-manager) method.
+
+### Sequencing calculated proofs
+
+When the [MultiProof Manager](#multiproof-manager) finishes calculating the proof, it sends
+a message back to the State Root Task. It can be either:
+1. `StateRootMessage::EmptyProof` if the deduplication of proof targets resulted in an empty list.
+2. `StateRootMessage::ProofCalculated(proof, state)` with the MPT proof calculated for the targets,
+along with the state update that the proof was generated for.
+
+On either message, we call the [`MultiProofManager::on_calculation_complete`](#multiproof-manager) method
+to signal that the proof calculation is finished.
+
+Some proofs can arrive before others, even though they were requested later. This depends on the number
+of proof targets, and also on some non-determinism in database caching.
+
+The issue with this is that we need to ensure that the proofs are sent
+to the [Sparse Trie Task](#sparse-trie-task) in the order that they were requested. Because of this,
+we introduced a `ProofSequencer`, to which new proofs are added.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L666-L672
+
+`ProofSequencer` acts in the following way:
+1. Each proof has an associated "sequence number" that determines the original order of state updates.
+2. When the proof is calculated, it's added to the `ProofSequencer` with the sequence number
+and state update associated with it.
+3. If the `ProofSequencer` holds a consecutive run of proofs that starts at the next sequence number not yet sent and has no gaps in sequence numbers, it returns this run.
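A sequencer with this behavior can be built on a `BTreeMap` keyed by sequence number. The sketch below is illustrative only: the type and method names are hypothetical, and the generic `T` stands in for the "(proof, state update)" pair.

```rust
use std::collections::BTreeMap;

/// Hypothetical sequencer: buffers out-of-order results and releases them only
/// as a gap-free run starting at the next number that has not been delivered.
struct ProofSequencer<T> {
    /// Next sequence number to hand out for a new proof request.
    next_issued: u64,
    /// Next sequence number the consumer is waiting for.
    next_to_deliver: u64,
    /// Finished proofs that arrived ahead of an earlier one.
    pending: BTreeMap<u64, T>,
}

impl<T> ProofSequencer<T> {
    fn new() -> Self {
        Self { next_issued: 0, next_to_deliver: 0, pending: BTreeMap::new() }
    }

    /// Assigns a sequence number to a newly issued proof request.
    fn next_sequence(&mut self) -> u64 {
        let sequence = self.next_issued;
        self.next_issued += 1;
        sequence
    }

    /// Adds a finished proof and returns every proof that can now be sent in order.
    fn add_proof(&mut self, sequence: u64, item: T) -> Vec<T> {
        // Ignore stale numbers that were already delivered.
        if sequence >= self.next_to_deliver {
            self.pending.insert(sequence, item);
        }
        let mut ready = Vec::new();
        while let Some(item) = self.pending.remove(&self.next_to_deliver) {
            ready.push(item);
            self.next_to_deliver += 1;
        }
        ready
    }

    /// True while some proofs are waiting for an earlier one to arrive.
    fn has_pending(&self) -> bool {
        !self.pending.is_empty()
    }
}
```

If requests 0, 1 and 2 finish in the order 2, 1, 0, the first two calls to `add_proof` return nothing and the third returns all three in request order.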
+
+Once the `ProofSequencer` returns a sequence of proofs,
+we send them along with the state updates to the [Sparse Trie Task](#sparse-trie-task).
+
+### Finishing the calculation
+
+Once all transactions are executed, the [Engine](#engine) sends a `StateRootMessage::FinishedStateUpdates` message
+to the State Root Task, marking the end of receiving state updates.
+
+Every time we receive a new proof from the [MultiProof Manager](#multiproof-manager), we also check
+the following conditions:
+1. Are all updates received? (`StateRootMessage::FinishedStateUpdates` was sent)
+2. Is `ProofSequencer` empty? (no proofs are pending for sequencing)
+3. Have all proofs that were sent to [`MultiProofManager::spawn_or_queue`](#multiproof-manager) finished
+calculating and been sent to the [Sparse Trie Task](#sparse-trie-task)?
+
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L935-L944
+
+When all conditions are met, we close the `SparseTrieUpdate` channel to the [Sparse Trie Task](#sparse-trie-task),
+signaling that no more proofs or state updates are coming and that the state root calculation should be finished.
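The end-of-input check and the channel closing can be sketched as follows. `Progress` and `maybe_close` are hypothetical names; the sketch relies only on the standard-library behavior that dropping the last `Sender` closes an `mpsc` channel, so the receiver's `recv` returns an error once its queue is drained.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical bookkeeping for the three end conditions.
struct Progress {
    /// `StateRootMessage::FinishedStateUpdates` has been received.
    updates_finished: bool,
    /// Proofs buffered in the sequencer, waiting for earlier ones.
    sequencer_pending: usize,
    /// Proof requests handed to `spawn_or_queue` so far.
    proofs_requested: u64,
    /// Proofs already forwarded to the Sparse Trie Task.
    proofs_delivered: u64,
}

impl Progress {
    fn is_done(&self) -> bool {
        self.updates_finished
            && self.sequencer_pending == 0
            && self.proofs_delivered == self.proofs_requested
    }
}

/// Closes the channel to the sparse trie once every condition holds.
fn maybe_close<T>(progress: &Progress, sparse_trie_tx: &mut Option<Sender<T>>) {
    if progress.is_done() {
        // Dropping the last `Sender` is what closes an `mpsc` channel.
        *sparse_trie_tx = None;
    }
}
```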
+
+
+## MultiProof Manager
+
+![MultiProof Manager](./mermaid/multiproof-manager.mmd.png)
+
+The MultiProof Manager is the component responsible for generating MPT proofs
+and sending them back to the [State Root Task](#state-root-task).
+
+### Spawning new proof calculations
+
+The entry point is the `spawn_or_queue` method:
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L355-L357
+
+It has the following responsibilities:
+1. On empty proof targets, immediately send `StateRootMessage::EmptyProof` to the [State Root Task](#state-root-task).
+2. If the maximum number of concurrent proof calculations has been reached, push the proof request to the pending queue.
+    - Maximum concurrency is determined as `NUM_THREADS / 2 - 2`.
+    - For a system with 64 threads, the maximum number of concurrent proof calculations will be `64 / 2 - 2 = 30`.
+3. If we can spawn a new proof calculation thread, spawn it using [`ParallelProof`](https://github.com/paradigmxyz/reth/blob/09a6aab9f7dc283e42fd00ce8f179542f8558580/crates/trie/parallel/src/proof.rs#L85),
+and send `StateRootMessage::ProofCalculated` to the [State Root Task](#state-root-task) once it's done.
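The three branches of `spawn_or_queue` can be sketched as a small decision function. This is a hedged sketch, not reth's code: `Outcome`, the `u64` targets and the in-flight counter are assumptions, no thread is actually spawned, and the `max(1)` in `max_concurrency` is a guard added here (not part of the formula above) so that machines with four or fewer threads still get one slot.

```rust
use std::collections::VecDeque;

/// Placeholder for one batch of proof targets (hypothetical).
type ProofTargets = Vec<u64>;

/// What `spawn_or_queue` decided in this sketch.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Would send `StateRootMessage::EmptyProof` right away.
    EmptyProof,
    /// Concurrency limit reached; parked in the pending queue.
    Queued,
    /// Would start a `ParallelProof` calculation.
    Spawned,
}

/// `NUM_THREADS / 2 - 2` as described above; `saturating_sub` and `max(1)`
/// are guards added in this sketch only.
fn max_concurrency(num_threads: usize) -> usize {
    (num_threads / 2).saturating_sub(2).max(1)
}

struct MultiProofManager {
    max_in_flight: usize,
    in_flight: usize,
    pending: VecDeque<ProofTargets>,
}

impl MultiProofManager {
    fn spawn_or_queue(&mut self, targets: ProofTargets) -> Outcome {
        if targets.is_empty() {
            return Outcome::EmptyProof;
        }
        if self.in_flight >= self.max_in_flight {
            self.pending.push_back(targets);
            return Outcome::Queued;
        }
        // A real manager would spawn the proof calculation here and report
        // `StateRootMessage::ProofCalculated` when it finishes.
        self.in_flight += 1;
        Outcome::Spawned
    }
}
```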
+
+### Exhausting the pending queue
+
+To drain the pending queue filled in step 2 of `spawn_or_queue` described above,
+the [State Root Task](#state-root-task) calls another method, `on_calculation_complete`, every time
+a proof is calculated.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L379-L387
+
+Its main purpose is to take the next request from the pending queue, if there is one, and spawn a new proof
+calculation thread for it, the same way as in step 3 of the `spawn_or_queue` method described above.
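Draining the queue amounts to reusing each freed slot for the oldest pending request. A trimmed-down, hypothetical version of the manager state is enough to show it; in the flow described here, the returned targets would then be handed to a new proof calculation thread.

```rust
use std::collections::VecDeque;

/// Hypothetical, trimmed-down manager state: only what draining needs.
struct MultiProofManager {
    in_flight: usize,
    pending: VecDeque<Vec<u64>>,
}

impl MultiProofManager {
    /// Called after each finished proof: frees one slot and, if a request is
    /// waiting, immediately reuses that slot for the oldest one (FIFO).
    /// Returns the targets that should be spawned next, if any.
    fn on_calculation_complete(&mut self) -> Option<Vec<u64>> {
        self.in_flight = self.in_flight.saturating_sub(1);
        let next = self.pending.pop_front()?;
        self.in_flight += 1;
        Some(next)
    }
}
```

FIFO order keeps earlier requests ahead of later ones; completion order is not guaranteed anyway, which is why the `ProofSequencer` exists.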
+
+## Sparse Trie Task
+
+The Sparse Trie component is the heart of the new state root calculation logic.
+
+### Sparse Trie primer
+
+- The state trie of Ethereum is very big (150GB+), and we cannot realistically fit it into memory.
+- What if instead of loading the entire trie in memory,
+we only load the parts that were modified during the block execution (i.e. make the trie "sparse")?
+    - These parts contain both nodes that are modified
+    and nodes that are needed only for calculating the hashes.
+    - Essentially, this is the same idea as [MPT proofs](https://docs.chainstack.com/docs/deep-dive-into-merkle-proofs-and-eth-getproof-ethereum-rpc-method)
+    that have only partial information about the sibling nodes, if these nodes aren't part of the
+    requested path.
+- When updating the trie, we first reveal the nodes using the MPT proofs, and then add/update/remove the leaves,
+along with the other nodes that need to be modified in the process of leaf update.
+
+#### Revealing Example
+
+![Sparse Trie](./mermaid/sparse-trie.mmd.png)
+
+1. Empty
+    - The Sparse Trie has no revealed nodes and an empty root.
+2. `0x10010` revealed
+    - Child of the root branch node under the nibble `1` is revealed, and it's an extension node placed on the path `0x1`.
+    - Child of the extension node at path `0x1` with the extension key `0010` is revealed, and it's a leaf node placed on the path `0x10010`.
+3. `0x00010` revealed
+    - Child of the root branch node under the nibble `0` is revealed, and it's a branch node placed on the path `0x0`.
+    - Child of the branch node at path `0x0` under the nibble `1` is revealed, and it's a hash node placed on the path `0x01`.
+    - Child of the branch node at path `0x0` under the nibble `0` is revealed, and it's an extension node placed on the path `0x00`.
+    - Child of the extension node at path `0x00` with the extension key `01` is revealed, and it's a branch node placed on the path `0x0001`.
+    - Child of the branch node at path `0x0001` under the nibble `1` is revealed, and it's a hash node placed on the path `0x00011`.
+    - Child of the branch node at path `0x0001` under the nibble `0` is revealed, and it's a leaf node placed on the path `0x00010`.
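The revealing example can be reproduced with a map from nibble path to node. This is a simplified model, not reth's sparse trie types: nodes carry no hashes or RLP, paths are hex strings, and, unlike the diagram, the trie starts with no nodes at all, so the root branch is included in each proof. Revealing simply inserts the proof's nodes, keeping any node that is already revealed.

```rust
use std::collections::BTreeMap;

/// Simplified node kinds from the diagram (hypothetical types).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Branch,
    /// Shared key nibbles of the extension.
    Extension(&'static str),
    Leaf,
    /// Unrevealed subtree: only its hash is known.
    Hash,
}

/// A sparse trie modeled as nibble path -> node.
#[derive(Default)]
struct SparseTrie {
    nodes: BTreeMap<String, Node>,
}

impl SparseTrie {
    /// Adds every node of a proof at its path; already revealed nodes are kept.
    fn reveal(&mut self, proof: &[(&str, Node)]) {
        for (path, node) in proof {
            self.nodes.entry(path.to_string()).or_insert_with(|| node.clone());
        }
    }
}
```

Revealing `0x10010` adds three nodes; revealing `0x00010` afterwards adds six more, because the root branch is already present.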
+
+
+For the implementation details, see [crates/trie/sparse/src/trie.rs](https://github.com/paradigmxyz/reth/blob/09a6aab9f7dc283e42fd00ce8f179542f8558580/crates/trie/sparse/src/trie.rs).
+
+### Sparse Trie updates
+
+![Sparse Trie Task](./mermaid/sparse-trie-task.mmd.png)
+
+The messages to the sparse trie are sent from the [State Root Task](#state-root-task),
+and consist of the proof that needs to be revealed, and a list of updates that need to be applied.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L66-L74
+
+We do not reveal the proofs and apply the updates immediately,
+but instead accumulate them until the messages channel is empty, and then reveal and apply in bulk.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L991-L994
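The accumulate-then-apply pattern maps onto a blocking `recv` followed by non-blocking `try_recv` calls on a standard `mpsc` receiver. `next_batch` is a hypothetical helper, and `T` stands in for the real update message carrying a proof and state.

```rust
use std::sync::mpsc::Receiver;

/// Waits for the next update, then drains whatever is already queued so the
/// whole batch can be revealed and applied at once. Returns `None` once the
/// channel is closed and empty, i.e. when the full root should be computed.
fn next_batch<T>(rx: &Receiver<T>) -> Option<Vec<T>> {
    // Block until at least one update arrives; `Err` means closed and empty.
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    // Take everything that is already queued, without waiting for more.
    while let Ok(update) = rx.try_recv() {
        batch.push(update);
    }
    Some(batch)
}
```

A loop of the form `while let Some(batch) = next_batch(&rx) { /* reveal, then apply */ }` ends only after the State Root Task has closed the channel, which is the point where the full state root is calculated.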
+
+Once the messages are accumulated, we update the Sparse Trie:
+1. Reveal the proof
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1090-L1091
+2. For each modified storage trie, apply updates and calculate the roots in parallel
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1093
+3. Update accounts trie
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1133
+4. Calculate keccak hashes of the nodes below a certain level
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1139
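Step 2 can be sketched with `std::thread::scope`, which lets each storage trie be processed on its own thread while borrowing the input map. This is illustrative only: storage tries are reduced to lists of `(slot, value)` pairs, and `placeholder_root` uses the standard library's `DefaultHasher` instead of keccak and MPT hashing, purely so that the sketch runs without dependencies. A real implementation would more likely use a thread pool than one OS thread per trie.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::thread;

/// Placeholder "storage root": NOT keccak or an MPT root.
fn placeholder_root(slots: &[(u64, u64)]) -> u64 {
    // Sort first so the result does not depend on update order, like a trie root.
    let mut sorted = slots.to_vec();
    sorted.sort_unstable();
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}

/// Computes one root per modified storage trie, each on its own scoped thread.
fn storage_roots_parallel(tries: &HashMap<u64, Vec<(u64, u64)>>) -> HashMap<u64, u64> {
    thread::scope(|scope| {
        let handles: Vec<_> = tries
            .iter()
            .map(|(account, slots)| (*account, scope.spawn(move || placeholder_root(slots))))
            .collect();
        handles
            .into_iter()
            .map(|(account, handle)| (account, handle.join().expect("storage root thread panicked")))
            .collect()
    })
}
```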
+
+As you can see, we do not calculate the state root hash of the accounts trie
+(the one that will be the result of the whole task), but only the hashes of the nodes below a certain level.
+
+This is an optimization that comes from the fact that we will likely update the top 2-3 levels of the trie
+in every transaction, so doing that work every time would be wasteful.
+
+Instead, we calculate hashes for most of the levels of the trie, and do the rest of the work
+only when we're finishing the calculation.
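The saving from deferring the top levels can be illustrated by counting hash computations. In this hypothetical model, updating a leaf marks the leaf and every ancestor on its path as dirty, `hash_below(level)` hashes only dirty nodes at depth `level` or deeper, and `hash_root` hashes whatever remains, once, at the end. The counter stands in for keccak calls; nothing here is reth's API.

```rust
use std::collections::BTreeSet;

/// Tracks nodes (by nibble path) that need rehashing, and counts hash work.
#[derive(Default)]
struct DirtyNodes {
    dirty: BTreeSet<String>,
    hashes_computed: usize,
}

impl DirtyNodes {
    /// Updating a leaf dirties the leaf and every ancestor, including the root "".
    fn update_leaf(&mut self, path: &str) {
        for depth in 0..=path.len() {
            self.dirty.insert(path[..depth].to_string());
        }
    }

    /// Hashes only dirty nodes at depth >= `level`; the top levels stay dirty.
    fn hash_below(&mut self, level: usize) {
        let (deep, top): (BTreeSet<String>, BTreeSet<String>) = std::mem::take(&mut self.dirty)
            .into_iter()
            .partition(|path| path.len() >= level);
        self.hashes_computed += deep.len();
        self.dirty = top;
    }

    /// Final step: hash the remaining top-level nodes once.
    fn hash_root(&mut self) {
        self.hashes_computed += self.dirty.len();
        self.dirty.clear();
    }
}
```

With two leaf updates under the same first nibble (`0x123` and `0x145`) and `level = 2`, this model performs 6 hash computations, versus 8 when every level is rehashed after each update, because the root and the node at `0x1` are hashed once instead of twice.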
+
+### Finishing the calculation
+
+Once the messages channel is closed by the [State Root Task](#state-root-task),
+we drain the remaining messages, reveal their proofs and apply their updates, and then calculate the full state root hash.
+https://github.com/paradigmxyz/reth/blob/2ba54bf1c1f38c7173838f37027315a09287c20a/crates/engine/tree/src/tree/root.rs#L1014
+
+This state root is eventually sent as `StateRootMessage::RootCalculated` to the [Engine](#engine).