mirror of
https://github.com/hl-archive-node/nanoreth.git
synced 2025-12-06 10:59:55 +00:00
# Troubleshooting

This page explains how to deal with the most common issues.

## Database

### Slow database inserts and updates

If you're:
1. Running behind the tip
2. Seeing slow canonical commit times (more than 2-3 seconds) on the `Canonical Commit Latency time` chart on the [Grafana dashboard](./observability.md#prometheus--grafana)
3. Seeing warnings in your logs such as
   ```console
   2023-11-08T15:17:24.789731Z WARN providers::db: Transaction insertion took too long block_number=18528075 tx_num=2150227643 hash=0xb7de1d6620efbdd3aa8547c47a0ff09a7fd3e48ba3fd2c53ce94c6683ed66e7c elapsed=6.793759034s
   ```

then most likely you're experiencing issues with the [database freelist](https://github.com/paradigmxyz/reth/issues/5228).
To confirm it, check whether the values on the `Freelist` chart on the [Grafana dashboard](./observability.md#prometheus--grafana)
are greater than 10M.

Currently, there are two main ways to fix this issue.

#### Compact the database
Compaction takes around 5-6 hours and requires **additional** disk space, on the same or a different drive,
equal to the size of a [freshly synced node](../installation/installation.md#hardware-requirements).

1. Clone Reth
   ```bash
   git clone https://github.com/paradigmxyz/reth
   cd reth
   ```
2. Build database debug tools
   ```bash
   make db-tools
   ```
3. Run compaction (this step will take 5-6 hours, depending on the I/O speed)
   ```bash
   ./db-tools/mdbx_copy -c "$(reth db path)" reth_compact.dat
   ```
4. Stop Reth
5. Back up the original database
   ```bash
   mv "$(reth db path)/mdbx.dat" reth_old.dat
   ```
6. Move the compacted database in place of the original database
   ```bash
   mv reth_compact.dat "$(reth db path)/mdbx.dat"
   ```
7. Start Reth
8. Confirm that the values on the `Freelist` chart are near zero and the values on the `Canonical Commit Latency time` chart
are less than 1 second.
9. Delete the original database
   ```bash
   rm reth_old.dat
   ```

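The disk-space requirement for compaction can be checked before starting the copy. The following is a minimal sketch, assuming `du`, `df`, and `awk` are available. `has_room_for_copy` is a hypothetical helper, not part of Reth, and using the current on-disk size of `mdbx.dat` as the space needed is a simplifying assumption; the size of a freshly synced node remains the safer figure.

```bash
# Hypothetical helper (not part of Reth): succeeds if DEST_DIR has at least
# as much free space as SRC_FILE currently occupies on disk.
has_room_for_copy() {
  local src_file=$1 dest_dir=$2 need_kb avail_kb
  # Disk usage of the source file, in 1 KiB blocks.
  need_kb=$(du -k -- "$src_file" | cut -f1)
  # Free space on the destination file system (POSIX df format, column 4).
  avail_kb=$(df -Pk -- "$dest_dir" | awk 'NR == 2 { print $4 }')
  # Fail closed if either measurement could not be taken.
  [ -n "$need_kb" ] && [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}
```

It could be called as `has_room_for_copy "$(reth db path)/mdbx.dat" . || echo "not enough free space"` before running `mdbx_copy`.
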
#### Re-sync from scratch
This takes as long as the initial sync.

1. Stop Reth
2. Drop the database using [`reth db drop`](../cli/reth/db/drop.md)
3. Start Reth

### Database write error

If you encounter irrecoverable database-related errors, in most cases they are related to the RAM/NVMe/SSD you use. For example:
```console
Error: A stage encountered an irrecoverable error.

Caused by:
    0: An internal database error occurred: Database write error code: -30796
    1: Database write error code: -30796
```

or

```console
Error: A stage encountered an irrecoverable error.

Caused by:
    0: An internal database error occurred: Database read error code: -30797
    1: Database read error code: -30797
```

1. Check your memory health: use [memtest86+](https://www.memtest.org/) or [memtester](https://linux.die.net/man/8/memtester). If your memory is faulty, it's better to resync the node on different hardware.
2. Check database integrity:
   ```bash
   git clone https://github.com/paradigmxyz/reth
   cd reth
   make db-tools
   ./db-tools/mdbx_chk "$(reth db path)/mdbx.dat" | tee mdbx_chk.log
   ```
   If `mdbx_chk` has detected any errors, please [open an issue](https://github.com/paradigmxyz/reth/issues) and post the output from the `mdbx_chk.log` file.

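To see quickly whether a saved node log contains errors of this kind, the messages can be extracted with `grep`. This is a minimal sketch: `list_db_io_errors` is a hypothetical helper, and the log file location is an assumption that depends on how your node is configured to log.

```bash
# Hypothetical helper: print each distinct database read/write error
# message found in the given log file, one per line.
list_db_io_errors() {
  grep -oE 'Database (read|write) error code: -?[0-9]+' -- "$1" | sort -u
}
```

For example, `list_db_io_errors /path/to/reth.log` prints nothing if no such errors were logged.
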
### Concurrent database access error (using containers/Docker)

If you encounter an error while accessing the database from multiple processes and you are using multiple containers or a mix of host and container(s), it is possible the error is related to `PID` namespaces. You might see one of the following error messages.

```console
mdbx:0: panic: Assertion `osal_rdt_unlock() failed: err 1' failed.
```
or

```console
pthread_mutex_lock.c:438: __pthread_mutex_lock_full: Assertion `e != ESRCH || !robust' failed
```

If you are using Docker, a possible solution is to run all database-accessing containers with the `--pid=host` flag.

For more information, check out the `Containers` section in the [libmdbx README](https://github.com/erthink/libmdbx#containers).

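Whether running containers were actually started with `--pid=host` can be verified with `docker inspect`. This is a minimal sketch: `all_use_host_pid` is a hypothetical helper, and the container names in the usage example are placeholders for whatever names your setup uses.

```bash
# Hypothetical helper: succeeds only if every named container runs in the
# host PID namespace (i.e. was started with --pid=host).
all_use_host_pid() {
  local name mode
  for name in "$@"; do
    # HostConfig.PidMode is "host" for containers started with --pid=host.
    mode=$(docker inspect --format '{{.HostConfig.PidMode}}' "$name") || return 2
    if [ "$mode" != "host" ]; then
      echo "$name: PidMode='${mode}' (expected 'host')" >&2
      return 1
    fi
  done
}
```

For example, `all_use_host_pid reth reth-tools` returns non-zero and names the first container that is not sharing the host PID namespace.
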
## Hardware Performance Testing

If you're experiencing degraded performance, it may be related to hardware issues. Below are some tools and tests you can run to evaluate your hardware performance.

If your hardware performance is significantly lower than the reference numbers given below, it may explain degraded node performance. Consider upgrading your hardware or investigating potential issues with your current setup.

### Disk Speed Testing with [IOzone](https://linux.die.net/man/1/iozone)

1. Test disk speed:
   ```bash
   iozone -e -t1 -i0 -i2 -r1k -s1g /tmp
   ```
   Reference numbers (on Latitude c3.large.x86):

   ```console
   Children see throughput for 1 initial writers = 907733.81 kB/sec
   Parent sees throughput for 1 initial writers = 907239.68 kB/sec
   Children see throughput for 1 rewriters = 1765222.62 kB/sec
   Parent sees throughput for 1 rewriters = 1763433.35 kB/sec
   Children see throughput for 1 random readers = 1557497.38 kB/sec
   Parent sees throughput for 1 random readers = 1554846.58 kB/sec
   Children see throughput for 1 random writers = 984428.69 kB/sec
   Parent sees throughput for 1 random writers = 983476.67 kB/sec
   ```
2. Test disk speed with memory-mapped files:
   ```bash
   iozone -B -G -e -t1 -i0 -i2 -r1k -s1g /tmp
   ```
   Reference numbers (on Latitude c3.large.x86):

   ```console
   Children see throughput for 1 initial writers = 56471.06 kB/sec
   Parent sees throughput for 1 initial writers = 56365.14 kB/sec
   Children see throughput for 1 rewriters = 241650.69 kB/sec
   Parent sees throughput for 1 rewriters = 239067.96 kB/sec
   Children see throughput for 1 random readers = 6833161.00 kB/sec
   Parent sees throughput for 1 random readers = 5597659.65 kB/sec
   Children see throughput for 1 random writers = 220248.53 kB/sec
   Parent sees throughput for 1 random writers = 219112.26 kB/sec
   ```

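To compare your own results against the reference numbers, a single throughput figure can be pulled out of the IOzone output with `awk`. This is a minimal sketch: `iozone_children_kb` is a hypothetical helper that assumes the output lines have the shape shown in the reference numbers, and it expects the full test name (for example `initial writers` rather than `writers`).

```bash
# Hypothetical helper: read iozone output on stdin and print the
# "Children see throughput" value (in kB/sec) for the named test.
iozone_children_kb() {
  awk -F'=' -v test_name="$1" '
    index($1, "Children see throughput for") && index($1, test_name) {
      split($2, v, " ")   # v[1] is the number, v[2] the unit
      print v[1]
      exit
    }'
}
```

For example, `iozone -e -t1 -i0 -i2 -r1k -s1g /tmp | iozone_children_kb 'random writers'` prints only the random-write throughput.
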
### RAM Speed and Health Testing

1. Check RAM speed with [lshw](https://linux.die.net/man/1/lshw):
   ```bash
   sudo lshw -short -C memory
   ```
   Look for the frequency in the output. Reference output:

   ```console
   H/W path              Device          Class          Description
   ================================================================
   /0/24/0                               memory         64GiB DIMM DDR4 Synchronous Registered (Buffered) 3200 MHz (0.3 ns)
   /0/24/1                               memory         64GiB DIMM DDR4 Synchronous Registered (Buffered) 3200 MHz (0.3 ns)
   ...
   ```

2. Test RAM health with [memtester](https://linux.die.net/man/8/memtester):
   ```bash
   sudo memtester 10G
   ```
   This will take a while. You can test with a smaller amount first:

   ```bash
   sudo memtester 1G 1
   ```
   All checks should report "ok".
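
When `memtester` is run from a script, its exit code can be checked instead of reading the output. According to the memtester man page, the exit code is 0 on success and otherwise a bitwise OR of 1 (allocation, locking, or invocation error), 2 (stuck address test error), and 4 (error in one of the other tests). The following is a minimal sketch of decoding it; `describe_memtester_status` is a hypothetical helper name.

```bash
# Hypothetical helper: translate a memtester exit code into readable text.
describe_memtester_status() {
  local status=$1
  if [ "$status" -eq 0 ]; then
    echo "ok"
    return 0
  fi
  # Non-zero exit codes are a bitmask; report every bit that is set.
  [ $((status & 1)) -ne 0 ] && echo "allocation, locking, or invocation error"
  [ $((status & 2)) -ne 0 ] && echo "stuck address test failed"
  [ $((status & 4)) -ne 0 ] && echo "one of the other tests failed"
  return 0
}
```

For example, `sudo memtester 1G 1; describe_memtester_status $?` prints `ok` only if every test passed.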