14 Commits

| SHA1 | Message | Date |
|------|---------|------|
| `cee0a342dc` | Update README.md | 2025-09-13 17:00:08 -04:00 |
| `eb2953246e` | Update README.md (Add support channel) | 2025-09-13 16:59:11 -04:00 |
| `fb462f2b24` | Merge pull request #56 from jonathanudd/fix-start-at-in-s3sync-runner (fix: Skipping logic in s3sync-runner.sh which was based on lexicograp…) | 2025-09-08 09:54:39 -04:00 |
| `9af9c93454` | fix: Skipping logic in s3sync-runner.sh which was based on lexicographic order instead of numeric order | 2025-09-08 06:58:52 +02:00 |
| `8632f9d74f` | Merge pull request #37 from wwwehr/feature/hlfs-evm-blocks (Feature: Archive Node File Sharing) | 2025-08-30 00:00:08 +09:00 |
| `5baa5770ac` | Merge pull request #32 from wwwehr/main (Update testnet instructions on README) | 2025-08-29 12:50:07 +09:00 |
| `fa9f8fc5df` | Updated top level README.md (updated instructions and prefer the s3sync-runner tool) | 2025-08-28 11:34:18 -07:00 |
| `bf51dc83e5` | incorporated s3 sync tool from external github repo | 2025-08-28 11:27:31 -07:00 |
| `e9dcff4015` | readme nits | 2025-08-26 19:04:18 -07:00 |
| `21e7c718ea` | Update README.md (Co-authored-by: sprites0 <lovelysprites@gmail.com>) | 2025-08-26 19:02:19 -07:00 |
| `8c6ea1ae7a` | Update README.md (Co-authored-by: sprites0 <lovelysprites@gmail.com>) | 2025-08-26 19:01:41 -07:00 |
| `29c8d4fa39` | Merge pull request #1 from wwwehr/fix/clarify-testnet-build (Update testnet instructions on README) | 2025-08-21 17:06:12 -07:00 |
| `5d3041b10d` | Update README.md | 2025-08-21 16:24:25 -07:00 |
| `9f952ac2ed` | fix: Prevent excessive file crawling when syncing the first block | 2025-08-20 21:49:17 -04:00 |
4 changed files with 260 additions and 4 deletions

`README.md`

@@ -2,6 +2,8 @@
Hyperliquid archive node based on [reth](https://github.com/paradigmxyz/reth).
Got questions? Drop by the [Hyperliquid Discord](https://discord.gg/hyperliquid) #node-operators channel.
## ⚠️ IMPORTANT: System Transactions Appear as Pseudo Transactions
Deposit transactions from `0x222..22` to user addresses are intentionally recorded as pseudo transactions.
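As a quick illustration (a minimal sketch, assuming a local node on the default port 8545 and `jq` installed), such pseudo transactions can be inspected over JSON-RPC:
```shell
# List the sender of every transaction in the latest block; deposit pseudo
# transactions show the system address in the `from` field.
curl -s http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["latest",true]}' \
  | jq -r '.result.transactions[].from'
```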
@@ -16,17 +18,35 @@ Building NanoReth from source requires Rust and Cargo to be installed:
## How to run (mainnet)
-1) `$ aws s3 sync s3://hl-mainnet-evm-blocks/ ~/evm-blocks --request-payer requester # one-time` - this will backfill the existing blocks from HyperLiquid's EVM S3 bucket.
-   The current state of the block files comprises millions of small objects totalling over 20 GB and counting. The "requester pays" option means you will need a configured AWS environment, and you may incur charges that vary by destination (EC2 versus local).
-2) `$ make install` - this will install the NanoReth binary.
-3) Start NanoReth which will begin syncing using the blocks in `~/evm-blocks`:
+1) Backfill the existing blocks from Hyperliquid's EVM S3 bucket:
+   > use our Rust-based S3 tool wrapper to optimize your download experience - [read more](./etc/evm-block-sync/README.md)
+   ```shell
+   chmod +x ./etc/evm-block-sync/s3sync-runner.sh
+   ./etc/evm-block-sync/s3sync-runner.sh
+   ```
+   > or use the conventional [aws cli](https://aws.amazon.com/cli/)
+   ```shell
+   aws s3 sync s3://hl-mainnet-evm-blocks/ ~/evm-blocks \
+     --request-payer requester \
+     --exact-timestamps \
+     --size-only \
+     --only-show-errors
+   ```
+2) `$ make install` - this will install the NanoReth binary.
+3) Start NanoReth, which will begin syncing using the blocks in `~/evm-blocks`:
```sh
$ reth node --http --http.addr 0.0.0.0 --http.api eth,ots,net,web3 --ws --ws.addr 0.0.0.0 --ws.origins '*' --ws.api eth,ots,net,web3 --ingest-dir ~/evm-blocks --ws.port 8545
```
-4) Once the node logs stops making progress this means it's caught up with the existing blocks.
+4) Once the node logs stop making progress, the node has caught up with the existing blocks.
Stop the NanoReth process and then start Goofys: `$ goofys --region=ap-northeast-1 --requester-pays hl-mainnet-evm-blocks evm-blocks`
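To check sync progress from another terminal, a minimal sketch using the standard `eth_blockNumber` call (the endpoint is assumed to be the node started above):
```shell
# Returns the latest block height as hex; printf converts it to decimal.
curl -s http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber","params":[]}' \
  | jq -r '.result' | xargs printf '%d\n'
```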
@@ -65,12 +85,25 @@ $ reth node --http --http.addr 0.0.0.0 --http.api eth,ots,net,web3 \
Testnet is supported since block 21304281.
> [!NOTE]
> To run testnet locally, you will need:
> - [ ] [git lfs](https://git-lfs.com/)
> - [ ] [rust toolchain](https://rustup.rs/)
```sh
# Get testnet genesis at block 21304281
$ cd ~
$ git clone https://github.com/sprites0/hl-testnet-genesis
$ git -C hl-testnet-genesis lfs pull
$ zstd --rm -d ~/hl-testnet-genesis/*.zst
# Now return to where you have cloned this project to continue
$ cd -
# prepare your rust toolchain
$ rustup install 1.82 # (this corresponds to the Rust version in our Cargo.toml)
$ rustup default 1.82
# Init node
$ make install
$ reth init-state --without-evm --chain testnet --header ~/hl-testnet-genesis/21304281.rlp \


@@ -277,6 +277,7 @@ impl BlockIngest {
let engine_api = node.auth_server_handle().http_client();
let mut evm_map = erc20_contract_to_spot_token(node.chain_spec().chain_id()).await?;
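// 1739849780 is a Unix timestamp (roughly 2025-02-18 UTC); flooring to it likely
// prevents excessive file crawling when syncing from the first block (cf. 9f952ac2ed).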
const MINIMUM_TIMESTAMP: u64 = 1739849780;
let current_block_timestamp: u64 = provider
.block_by_number(head)
.expect("Failed to fetch current block in db")
@@ -284,6 +285,8 @@
.into_header()
.timestamp();
let current_block_timestamp = current_block_timestamp.max(MINIMUM_TIMESTAMP);
info!("Current height {height}, timestamp {current_block_timestamp}");
self.start_local_ingest_loop(height, current_block_timestamp).await;

`etc/evm-block-sync/README.md`

@@ -0,0 +1,57 @@
# 🚀 S3Sync Runner
The fastest way to pull down EVM block files from S3.
This script automates syncing **massive S3 object stores** in a **safe, resumable, and time-tracked way**. The traditional `aws s3 sync` is just way too slow.
## Features
- ✅ Auto-installs [nidor1998/s3sync](https://github.com/nidor1998/s3sync) (latest release) into `~/.local/bin`
- ✅ Sequential per-prefix syncs (e.g., `21000000/`, `22000000/`, …)
- ✅ Per-prefix timing: `22000000 took 12 minutes!`
- ✅ Total runtime summary at the end
- ✅ Designed for **tiny files at scale** (EVM block archives)
- ✅ Zero-config bootstrap — just run the script
## Quick Start
```bash
chmod +x s3sync-runner.sh
./s3sync-runner.sh
```
> Skip to the relevant block range:
```bash
./s3sync-runner.sh --start-at 30000000
```
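The runner floors `--start-at` to the containing prefix boundary (see `CHUNK_SIZE` in the configuration below), so any block number inside a chunk works. A sketch of the arithmetic it applies:
```bash
# e.g. --start-at 30123456 is floored to the 30000000/ prefix
start=30123456; chunk=1000000
echo $(( (start / chunk) * chunk ))   # prints 30000000
```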
The script will:
* Install or update s3sync into `~/.local/bin`
* Discover top-level prefixes in your S3 bucket
* Sync them one at a time, printing elapsed minutes
## Configuration
Edit the top of `s3sync-runner.sh` if needed:
```bash
BUCKET="hl-testnet-evm-blocks" # could be hl-mainnet-evm-blocks
REGION="ap-northeast-1" # hardcoded bucket region
DEST="$HOME/evm-blocks-testnet" # local target directory (this is what nanoreth will look at)
WORKERS=512 # worker threads per sync (lotsa workers need lotsa RAM)
```
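For the mainnet setup described in the top-level README, the corresponding values would look like this (a sketch; only the bucket and destination change):
```bash
BUCKET="hl-mainnet-evm-blocks"   # mainnet EVM blocks bucket
DEST="$HOME/evm-blocks"          # matches --ingest-dir in the mainnet run command
```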
## Example Output
```bash
[2025-08-20 20:01:02] START 21000000
[2025-08-20 20:13:15] 21000000 took 12 minutes!
[2025-08-20 20:13:15] START 22000000
[2025-08-20 20:26:40] 22000000 took 13 minutes!
[2025-08-20 20:26:40] ALL DONE in 25 minutes.
```
## Hackathon Context
This runner was built as part of the Hyperliquid DEX Hackathon to accelerate:
* ⛓️ Blockchain archive node ingestion
* 📂 EVM block dataset replication
* 🧩 DEX ecosystem data pipelines

`etc/evm-block-sync/s3sync-runner.sh`

@@ -0,0 +1,163 @@
#!/usr/bin/env bash
# @author Niko Wehr (wwwehr)
set -euo pipefail
# ---- config ----
BUCKET="hl-testnet-evm-blocks"
REGION="ap-northeast-1"
DEST="${HOME}/evm-blocks-testnet"
WORKERS=512
S3SYNC="${HOME}/.local/bin/s3sync"
START_AT="" # default: run all
CHUNK_SIZE=1000000 # each prefix represents this many blocks
# ----------------
# parse args
while [[ $# -gt 0 ]]; do
case "$1" in
--start-at)
START_AT="$2"
shift 2
;;
*)
echo "Unknown arg: $1" >&2
exit 1
;;
esac
done
now(){ date +"%F %T"; }
log(){ printf '[%s] %s\n' "$(now)" "$*"; }
die(){ log "ERROR: $*"; exit 1; }
trap 'log "Signal received, exiting."; exit 2' INT TERM
need(){ command -v "$1" >/dev/null 2>&1 || die "missing dependency: $1"; }
install_s3sync_latest() {
need curl
GHAPI="https://api.github.com/repos/nidor1998/s3sync/releases/latest"
os="$(uname | tr '[:upper:]' '[:lower:]')"
arch_raw="$(uname -m)"
case "$arch_raw" in
x86_64|amd64) arch_tag="x86_64" ;;
aarch64|arm64) arch_tag="aarch64" ;;
*) die "unsupported arch: ${arch_raw}" ;;
esac
# Map OS → asset prefix
case "$os" in
linux) prefix="s3sync-linux-glibc2.28-${arch_tag}" ;;
darwin) prefix="s3sync-macos-${arch_tag}" ;;
msys*|mingw*|cygwin*|windows) prefix="s3sync-windows-${arch_tag}" ;;
*) die "unsupported OS: ${os}" ;;
esac
# Fetch latest release JSON (unauthenticated)
json="$(curl -fsSL "$GHAPI")" || die "failed to query GitHub API"
# Pick URLs for tarball and checksum
tar_url="$(printf '%s' "$json" | awk -F'"' '/browser_download_url/ {print $4}' | grep -F "${prefix}.tar.gz" | head -n1)"
sum_url="$(printf '%s' "$json" | awk -F'"' '/browser_download_url/ {print $4}' | grep -F "${prefix}.sha256" | head -n1)"
[[ -n "$tar_url" ]] || die "could not find asset for prefix: ${prefix}"
[[ -n "$sum_url" ]] || die "could not find checksum for prefix: ${prefix}"
mkdir -p "${HOME}/.local/bin"
tmpdir="$(mktemp -d)"; trap 'rm -rf "$tmpdir"' EXIT
tar_path="${tmpdir}/s3sync.tar.gz"
sum_path="${tmpdir}/s3sync.sha256"
log "Downloading: $tar_url"
curl -fL --retry 5 --retry-delay 1 -o "$tar_path" "$tar_url"
curl -fL --retry 5 --retry-delay 1 -o "$sum_path" "$sum_url"
# Verify checksum
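# The published .sha256 asset may be "sha256:<hex>" or plain sha256sum output ("<hex>  <file>"); handle both.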
want_sum="$(cut -d: -f2 <<<"$(sed -n 's/^sha256:\(.*\)$/\1/p' "$sum_path" | tr -d '[:space:]')" || true)"
[[ -n "$want_sum" ]] || want_sum="$(awk '{print $1}' "$sum_path" || true)"
[[ -n "$want_sum" ]] || die "could not parse checksum file"
got_sum="$(sha256sum "$tar_path" | awk '{print $1}')"
[[ "$want_sum" == "$got_sum" ]] || die "sha256 mismatch: want $want_sum got $got_sum"
# Extract and install
tar -xzf "$tar_path" -C "$tmpdir"
binpath="$(find "$tmpdir" -maxdepth 2 -type f -name 's3sync' | head -n1)"
[[ -n "$binpath" && -f "$binpath" ]] || die "s3sync binary not found in archive"
chmod +x "$binpath"
mv -f "$binpath" "$S3SYNC"
log "s3sync installed at $S3SYNC"
}
# --- deps & install/update ---
need aws
install_s3sync_latest
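# make sure the freshly installed s3sync is reachable on PATH for this session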
[[ ":$PATH:" == *":$HOME/.local/bin:"* ]] || export PATH="$HOME/.local/bin:$PATH"
mkdir -p "$DEST"
# list prefixes
log "Listing top-level prefixes in s3://${BUCKET}/"
mapfile -t PREFIXES < <(
aws s3 ls "s3://${BUCKET}/" --region "$REGION" --request-payer requester \
| awk '/^ *PRE /{print $2}' | sed 's:/$::' | grep -E '^[0-9]+$' || true
)
((${#PREFIXES[@]})) || die "No prefixes found."
# sort numerically to make order predictable
IFS=$'\n' read -r -d '' -a PREFIXES < <(printf '%s\n' "${PREFIXES[@]}" | sort -n && printf '\0')
# compute the effective start prefix:
# - if START_AT is set, floor it to the containing chunk boundary
effective_start=""
if [[ -n "$START_AT" ]]; then
# numeric, base-10 safe
start_num=$((10#$START_AT))
chunk=$((10#$CHUNK_SIZE))
effective_start=$(( (start_num / chunk) * chunk ))
fi
# mark initial status using numeric comparisons (no ordering assumptions)
declare -A RESULTS
for p in "${PREFIXES[@]}"; do
if [[ -n "$effective_start" ]] && (( 10#$p < 10#$effective_start )); then
RESULTS["$p"]="-- SKIPPED"
else
RESULTS["$p"]="-- TODO"
fi
done
total_start=$(date +%s)
for p in "${PREFIXES[@]}"; do
if [[ "${RESULTS[$p]}" == "-- SKIPPED" ]]; then
continue
fi
src="s3://${BUCKET}/${p}/"
dst="${DEST}/${p}/"
mkdir -p "$dst"
log "START ${p}"
start=$(date +%s)
"$S3SYNC" \
--source-request-payer \
--source-region "$REGION" \
--worker-size "$WORKERS" \
--max-parallel-uploads "$WORKERS" \
"$src" "$dst"
end=$(date +%s)
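# round elapsed time up to whole minutes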
mins=$(( (end - start + 59) / 60 ))
RESULTS["$p"]="$mins minutes"
# Print status table so far
echo "---- Status ----"
for k in "${PREFIXES[@]}"; do
echo "$k ${RESULTS[$k]}"
done
echo "----------------"
done
total_end=$(date +%s)
total_mins=$(( (total_end - total_start + 59) / 60 ))
echo "ALL DONE in $total_mins minutes."