mirror of
https://github.com/hl-archive-node/nanoreth.git
synced 2025-12-06 10:59:55 +00:00
# Database

## Abstractions
- We created a Database trait abstraction using Rust Stable GATs which frees us from being bound to a single database implementation. We currently use MDBX, but are exploring redb as an alternative.
- We then iterated on `Transaction` as a non-leaky abstraction with helpers for strictly-typed and unit-tested higher-level database abstractions.
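To make the GAT-based abstraction concrete, here is a minimal sketch of the pattern: a `Database` trait whose associated transaction type borrows from the database via a lifetime-generic associated type. The trait names and the `MemDb`/`MemTx` in-memory implementation are hypothetical illustrations, not reth's actual API.

```rust
use std::collections::HashMap;

// Sketch of a GAT-based database abstraction (hypothetical, not reth's API).
// The GAT lets each backend define its own transaction type that borrows
// from the database, without boxing or lifetime erasure.
trait Database {
    type Tx<'a>: DbTx
    where
        Self: 'a;
    fn tx(&self) -> Self::Tx<'_>;
}

trait DbTx {
    fn get(&self, table: &str, key: &[u8]) -> Option<Vec<u8>>;
}

// A trivial in-memory backend for illustration.
struct MemDb {
    data: HashMap<(String, Vec<u8>), Vec<u8>>,
}

struct MemTx<'a> {
    db: &'a MemDb,
}

impl DbTx for MemTx<'_> {
    fn get(&self, table: &str, key: &[u8]) -> Option<Vec<u8>> {
        self.db.data.get(&(table.to_string(), key.to_vec())).cloned()
    }
}

impl Database for MemDb {
    type Tx<'a>
        = MemTx<'a>
    where
        Self: 'a;
    fn tx(&self) -> Self::Tx<'_> {
        MemTx { db: self }
    }
}

fn main() {
    let mut db = MemDb { data: HashMap::new() };
    db.data
        .insert(("Accounts".to_string(), b"alice".to_vec()), b"100".to_vec());
    let tx = db.tx();
    assert_eq!(tx.get("Accounts", b"alice"), Some(b"100".to_vec()));
    assert_eq!(tx.get("Accounts", b"bob"), None);
}
```

Generic code can then be written against `Database` alone, so swapping MDBX for another backend only requires a new pair of trait implementations.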
## Codecs
- We want Reth's serialized format to be able to trade off read/write speed for size, depending on who the user is.
- To achieve that, we created the Encode/Decode/Compress/Decompress traits to make the (de)serialization of database `Table::Key` and `Table::Value` generic.
- This allows for out-of-the-box benchmarking (using Criterion and Iai).
- It also enables out-of-the-box fuzzing using trailofbits/test-fuzz.
- We implemented that trait for the following encoding formats:
- Ethereum-specific Compact Encoding: a lot of Ethereum datatypes have unnecessary zeros when serialized, or optional fields (e.g. empty hashes), which we would rather not pay for in storage costs.
  - Erigon achieves that by having a `bitfield` set on the "PlainState" table, which adds a bitfield to Accounts.
  - Akula expanded it to other tables and datatypes manually. It also saved more space by storing the length of certain types (U256, u64) using the modular_bitfield crate, which compacts this information.
  - We generalized it for all types by writing a derive macro that autogenerates code for implementing the trait. It also generates the interfaces required for fuzzing using ToB/test-fuzz.
- Scale Encoding
- Postcard Encoding
- Passthrough (called `no_codec` in the codebase)
- We made implementation of these traits easy via a derive macro called `main_codec` that delegates to one of Compact (default), Scale, Postcard or Passthrough encoding. This is derived on every struct we need, and lets us experiment with different encoding formats without having to modify the entire codebase each time.
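To illustrate the kind of size win a compact encoding buys, here is a hand-rolled sketch of the idea behind Compress/Decompress for a `u64`: drop leading zero bytes on write and restore them on read. The function names are hypothetical; reth's actual codec is generated by the derive macro described above.

```rust
// Hypothetical compact codec for u64: strip leading zero bytes.
// A u64 always serializes to 8 bytes naively, but small values
// (nonces, block numbers) rarely need more than a few.
fn compress_u64(v: u64) -> Vec<u8> {
    let bytes = v.to_be_bytes();
    let first_nonzero = bytes.iter().position(|&b| b != 0).unwrap_or(8);
    bytes[first_nonzero..].to_vec()
}

fn decompress_u64(buf: &[u8]) -> u64 {
    // Left-pad the stored bytes back to 8 before decoding.
    let mut bytes = [0u8; 8];
    bytes[8 - buf.len()..].copy_from_slice(buf);
    u64::from_be_bytes(bytes)
}

fn main() {
    let v = 1_000u64; // 0x03E8
    let compressed = compress_u64(v);
    assert_eq!(compressed, vec![0x03, 0xE8]); // 2 bytes instead of 8
    assert_eq!(decompress_u64(&compressed), v);

    // Zero compresses to an empty payload.
    assert_eq!(compress_u64(0), Vec::<u8>::new());
    assert_eq!(decompress_u64(&[]), 0);
}
```

Note that a real codec must also record each value's length somewhere (reth uses a bitfield, as described for Erigon/Akula above), since variable-length values are no longer self-delimiting.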
## Table design
We do Transaction-granularity indexing. This means that we store the state for every account after every transaction that touched it, and we provide indexes for accessing that quickly. While this may make the database size bigger (and we need to benchmark this once we're closer to prod) it also enables blazing-fast historical tracing and simulations because we don't need to re-execute all transactions inside a block.
Below, you can see the table design that implements this scheme:
```mermaid
erDiagram
    TransactionHash ||--o{ TxChangeIdIndex : index
    BlockChangeIdIndex ||--o{ ChangeSet : "unique index"
    History ||--o{ ChangeSet : index
    TxChangeIdIndex ||--o{ ChangeSet : "unique index"
    Transactions {
        u64 TxNumber "PK"
        Transaction Data
    }
    TransactionHash {
        H256 TxHash "PK"
        u64 TxNumber
    }
    TxChangeIdIndex {
        u64 TxNumber "PK"
        u64 ChangeId
    }
    BlockChangeIdIndex {
        u64 BlockNumber "PK"
        u64 ChangeId
    }
    ChangeSet {
        u64 ChangeId "PK"
        ChangeSet PreviousValues "[Acc1(Balance,Nonce),Acc2(Balance,Nonce)] Previous values"
    }
    History {
        H256 Account "PK"
        u64 ChangeIdList "[ChangeId,ChangeId,...] Points where account changed"
    }
    EVM ||--o{ History : "Load Account by finding first bigger ChangeId in List, and index it in ChangeSet table"
    BlockChangeIdIndex ||--o{ EVM : "Use state (by block ChangeId)"
    TxChangeIdIndex ||--o{ EVM : "Use state (by tx ChangeId)"
    TransactionHash ||--o{ Transactions : index
```
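The historical lookup described by the `EVM ||--o{ History` edge above can be sketched as follows. The maps stand in for the `History` and `ChangeSet` tables; the function name and the use of a bare `u64` as the account value are illustrative assumptions, not reth's actual schema.

```rust
use std::collections::HashMap;

// Hypothetical lookup following the History/ChangeSet scheme in the diagram.
// history:    account -> sorted ChangeIds at which the account changed
// changesets: ChangeId -> value the account held *before* that change
fn historical_value(
    history: &HashMap<&str, Vec<u64>>,
    changesets: &HashMap<u64, u64>,
    account: &str,
    at_change_id: u64,
) -> Option<u64> {
    let change_ids = history.get(account)?;
    // Find the first ChangeId strictly greater than the query point;
    // its ChangeSet entry records the value as of the query point.
    let idx = change_ids.partition_point(|&id| id <= at_change_id);
    let next_change = *change_ids.get(idx)?;
    changesets.get(&next_change).copied()
}

fn main() {
    let mut history = HashMap::new();
    history.insert("alice", vec![3, 7, 12]); // alice changed at these ChangeIds

    let mut changesets = HashMap::new();
    changesets.insert(3, 0);   // before change 3, value was 0
    changesets.insert(7, 50);  // before change 7, value was 50
    changesets.insert(12, 80); // before change 12, value was 80

    // State as of ChangeId 5: the next change is 7, whose ChangeSet
    // recorded the prior value 50.
    assert_eq!(historical_value(&history, &changesets, "alice", 5), Some(50));

    // Past the last change there is no ChangeSet entry; the caller
    // falls back to the current (plain) state instead.
    assert_eq!(historical_value(&history, &changesets, "alice", 20), None);
}
```

Because every transaction's changes get their own `ChangeId` (via `TxChangeIdIndex`), this same lookup serves both block-granularity and transaction-granularity queries without re-executing a block.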