From 5d3041b10d5567b8acc1b0bc1d5a6d8d8e44c267 Mon Sep 17 00:00:00 2001
From: Nicholas Wehr <33910651+wwwehr@users.noreply.github.com>
Date: Thu, 21 Aug 2025 16:24:25 -0700
Subject: [PATCH 1/6] Update README.md

---
 README.md | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index eac81548e..1ffc8c00d 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,19 @@ Building NanoReth from source requires Rust and Cargo to be installed:
 
 ## How to run (mainnet)
 
-1) `$ aws s3 sync s3://hl-mainnet-evm-blocks/ ~/evm-blocks --request-payer requester # one-time` - this will backfill the existing blocks from HyperLiquid's EVM S3 bucket.
+The block files currently comprise millions of small objects totalling over 20 GB and counting. The "requester pays" option means you will need a configured AWS environment, and you could incur charges that vary according to destination (EC2 versus local).
+
+1) this will backfill the existing blocks from HyperLiquid's EVM S3 bucket:
+
+   ```shell
+   aws s3 sync s3://hl-mainnet-evm-blocks/ ~/evm-blocks \
+     --request-payer requester \
+     --exact-timestamps \
+     --size-only \
+     --page-size 1000 \
+     --only-show-errors
+   ```
+   > consider using this [rust based s3 tool wrapper](https://github.com/wwwehr/hl-evm-block-sync) alternative to optimize your download experience
 
 2) `$ make install` - this will install the NanoReth binary.
 
 3) Start NanoReth which will begin syncing using the blocks in `~/evm-blocks`:
 
 ```sh
 $ reth node --http --http.addr 0.0.0.0 --http.api eth,ots,net,web3 \
@@ -65,12 +77,25 @@ Testnet is supported since block 21304281.
+> [!NOTE]
+> To run testnet locally, you will need:
+> - [ ] [git lfs](https://git-lfs.com/)
+> - [ ] [rust toolchain](https://rustup.rs/)
+
 ```sh
 # Get testnet genesis at block 21304281
 $ cd ~
 $ git clone https://github.com/sprites0/hl-testnet-genesis
+$ git lfs pull
 $ zstd --rm -d ~/hl-testnet-genesis/*.zst
+
+# Now return to where you have cloned this project to continue
+$ cd -
+
+# prepare your Rust toolchain
+$ rustup install 1.82 # (this corresponds to the Rust version in our Cargo.toml)
+$ rustup default 1.82
+
 # Init node
 $ make install
 $ reth init-state --without-evm --chain testnet --header ~/hl-testnet-genesis/21304281.rlp \

From 8c6ea1ae7a19e4c50c5293c1e2e2f3c5881c8f4b Mon Sep 17 00:00:00 2001
From: Nicholas Wehr <33910651+wwwehr@users.noreply.github.com>
Date: Tue, 26 Aug 2025 19:01:41 -0700
Subject: [PATCH 2/6] Update README.md

Co-authored-by: sprites0
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1ffc8c00d..13a4aa6f4 100644
--- a/README.md
+++ b/README.md
@@ -86,7 +86,7 @@ Testnet is supported since block 21304281.
 # Get testnet genesis at block 21304281
 $ cd ~
 $ git clone https://github.com/sprites0/hl-testnet-genesis
-$ git lfs pull
+$ git -C hl-testnet-genesis lfs pull
 $ zstd --rm -d ~/hl-testnet-genesis/*.zst
 
 # Now return to where you have cloned this project to continue

From 21e7c718eaa65f998cb7e50123e0b9cfe3563e83 Mon Sep 17 00:00:00 2001
From: Nicholas Wehr <33910651+wwwehr@users.noreply.github.com>
Date: Tue, 26 Aug 2025 19:02:19 -0700
Subject: [PATCH 3/6] Update README.md

Co-authored-by: sprites0
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 13a4aa6f4..09cf61cfe 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ The block files currently comprise millions of small objects totalling over 20 GB and counting. The "requester pays" option means you will need a configured AWS environment, and you could incur charges that vary according to destination (EC2 versus local).
 
-1) this will backfill the existing blocks from HyperLiquid's EVM S3 bucket:
+1) this will backfill the existing blocks from Hyperliquid's EVM S3 bucket:
 
    ```shell
    aws s3 sync s3://hl-mainnet-evm-blocks/ ~/evm-blocks \

From e9dcff401568e0fded54a46bd847d5ad7a38fbc9 Mon Sep 17 00:00:00 2001
From: Nicholas Wehr
Date: Tue, 26 Aug 2025 19:04:18 -0700
Subject: [PATCH 4/6] readme nits

---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index 09cf61cfe..b0ffae45e 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,6 @@ The block files currently comprise millions of small objects totalling over 20 GB and counting.
      --request-payer requester \
      --exact-timestamps \
      --size-only \
-     --page-size 1000 \
      --only-show-errors
    ```
    > consider using this [rust based s3 tool wrapper](https://github.com/wwwehr/hl-evm-block-sync) alternative to optimize your download experience

From bf51dc83e59d0163bdbe3757075d69958374a642 Mon Sep 17 00:00:00 2001
From: Nicholas Wehr
Date: Thu, 28 Aug 2025 11:27:31 -0700
Subject: [PATCH 5/6] incorporated s3 sync tool from external github repo

---
 etc/evm-block-sync/README.md        |  57 ++++++++++
 etc/evm-block-sync/s3sync-runner.sh | 158 ++++++++++++++++++++++
 2 files changed, 215 insertions(+)
 create mode 100644 etc/evm-block-sync/README.md
 create mode 100755 etc/evm-block-sync/s3sync-runner.sh

diff --git a/etc/evm-block-sync/README.md b/etc/evm-block-sync/README.md
new file mode 100644
index 000000000..ffb7cecb0
--- /dev/null
+++ b/etc/evm-block-sync/README.md
@@ -0,0 +1,57 @@
+# 🚀 S3Sync Runner
+
+The fastest way to pull down EVM block files from S3.
+
+This script automates syncing **massive S3 object stores** in a **safe, resumable, and time-tracked way**. The traditional `s3 sync` is just way too slow.
+
+## Features
+
+- ✅ Auto-installs [nidor1998/s3sync](https://github.com/nidor1998/s3sync) (latest release) into `~/.local/bin`
+- ✅ Sequential per-prefix syncs (e.g., `21000000/`, `22000000/`, …)
+- ✅ Per-prefix timing: `22000000 took 12 minutes!`
+- ✅ Total runtime summary at the end
+- ✅ Designed for **tiny files at scale** (EVM block archives)
+- ✅ Zero-config bootstrap — just run the script
+
+## Quick Start
+
+```bash
+chmod +x s3sync-runner.sh
+./s3sync-runner.sh
+```
+
+> Skip ahead to the relevant block range:
+```bash
+./s3sync-runner.sh --start-at 30000000
+```
+
+The script will:
+* Install or update s3sync into ~/.local/bin
+* Discover top-level prefixes in your S3 bucket
+* Sync them one at a time, printing elapsed minutes
+
+## Configuration
+
+Edit the top of s3sync-runner.sh if needed:
+```bash
+BUCKET="hl-testnet-evm-blocks"  # could be hl-mainnet-evm-blocks
+REGION="ap-northeast-1"         # hardcoded bucket region
+DEST="$HOME/evm-blocks-testnet" # local target directory (this is what nanoreth will look at)
+WORKERS=512                     # worker threads per sync (more workers need more RAM)
+```
+
+## Example Output
+```bash
+[2025-08-20 20:01:02] START 21000000
+[2025-08-20 20:13:15] 21000000 took 12 minutes!
+[2025-08-20 20:13:15] START 22000000
+[2025-08-20 20:26:40] 22000000 took 13 minutes!
+[2025-08-20 20:26:40] ALL DONE in 25 minutes.
+```
+
+## Hackathon Context
+
+This runner was built as part of the Hyperliquid DEX Hackathon to accelerate:
+* ⛓️ Blockchain archive node ingestion
+* 📂 EVM block dataset replication
+* 🧩 DEX ecosystem data pipelines
diff --git a/etc/evm-block-sync/s3sync-runner.sh b/etc/evm-block-sync/s3sync-runner.sh
new file mode 100755
index 000000000..ae99654b4
--- /dev/null
+++ b/etc/evm-block-sync/s3sync-runner.sh
@@ -0,0 +1,158 @@
+#!/usr/bin/env bash
+# @author Niko Wehr (wwwehr)
+set -euo pipefail
+
+# ---- config ----
+BUCKET="hl-testnet-evm-blocks"
+REGION="ap-northeast-1"
+DEST="${HOME}/evm-blocks-testnet"
+WORKERS=512
+S3SYNC="${HOME}/.local/bin/s3sync"
+START_AT=""   # default: run all
+# ----------------
+
+# parse args
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --start-at)
+      START_AT="$2"
+      shift 2
+      ;;
+    *)
+      echo "Unknown arg: $1" >&2
+      exit 1
+      ;;
+  esac
+done
+
+now(){ date +"%F %T"; }
+log(){ printf '[%s] %s\n' "$(now)" "$*"; }
+die(){ log "ERROR: $*"; exit 1; }
+trap 'log "Signal received, exiting."; exit 2' INT TERM
+
+need(){ command -v "$1" >/dev/null 2>&1 || die "missing dependency: $1"; }
+
+install_s3sync_latest() {
+  need curl
+  GHAPI="https://api.github.com/repos/nidor1998/s3sync/releases/latest"
+
+  os="$(uname | tr '[:upper:]' '[:lower:]')"
+  arch_raw="$(uname -m)"
+  case "$arch_raw" in
+    x86_64|amd64) arch_tag="x86_64" ;;
+    aarch64|arm64) arch_tag="aarch64" ;;
+    *) die "unsupported arch: ${arch_raw}" ;;
+  esac
+
+  # Map OS → asset prefix
+  case "$os" in
+    linux) prefix="s3sync-linux-glibc2.28-${arch_tag}" ;;
+    darwin) prefix="s3sync-macos-${arch_tag}" ;;
+    msys*|mingw*|cygwin*|windows) prefix="s3sync-windows-${arch_tag}" ;;
+    *) die "unsupported OS: ${os}" ;;
+  esac
+
+  # Fetch latest release JSON (unauthenticated)
+  json="$(curl -fsSL "$GHAPI")" || die "failed to query GitHub API"
+
+  # Pick URLs for tarball and checksum
+  tar_url="$(printf '%s' "$json" | awk -F'"' '/browser_download_url/ {print $4}' | grep -F "${prefix}.tar.gz" | head -n1)"
+  sum_url="$(printf '%s' "$json" | awk -F'"' '/browser_download_url/ {print $4}' | grep -F "${prefix}.sha256" | head -n1)"
+  [[ -n "$tar_url" ]] || die "could not find asset for prefix: ${prefix}"
+  [[ -n "$sum_url" ]] || die "could not find checksum for prefix: ${prefix}"
+
+  mkdir -p "${HOME}/.local/bin"
+  tmpdir="$(mktemp -d)"; trap 'rm -rf "$tmpdir"' EXIT
+  tar_path="${tmpdir}/s3sync.tar.gz"
+  sum_path="${tmpdir}/s3sync.sha256"
+
+  log "Downloading: $tar_url"
+  curl -fL --retry 5 --retry-delay 1 -o "$tar_path" "$tar_url"
+  curl -fL --retry 5 --retry-delay 1 -o "$sum_path" "$sum_url"
+
+  # Verify checksum
+  want_sum="$(cut -d: -f2 <<<"$(sed -n 's/^sha256:\(.*\)$/\1/p' "$sum_path" | tr -d '[:space:]')" || true)"
+  [[ -n "$want_sum" ]] || want_sum="$(awk '{print $1}' "$sum_path" || true)"
+  [[ -n "$want_sum" ]] || die "could not parse checksum file"
+  got_sum="$(sha256sum "$tar_path" | awk '{print $1}')"
+  [[ "$want_sum" == "$got_sum" ]] || die "sha256 mismatch: want $want_sum got $got_sum"
+
+  # Extract and install
+  tar -xzf "$tar_path" -C "$tmpdir"
+  binpath="$(find "$tmpdir" -maxdepth 2 -type f -name 's3sync' | head -n1)"
+  [[ -x "$binpath" ]] || die "s3sync binary not found in archive"
+  chmod +x "$binpath"
+  mv -f "$binpath" "$S3SYNC"
+  log "s3sync installed at $S3SYNC"
+}
+
+
+# --- deps & install/update ---
+need aws
+install_s3sync_latest
+[[ ":$PATH:" == *":$HOME/.local/bin:"* ]] || export PATH="$HOME/.local/bin:$PATH"
+mkdir -p "$DEST"
+
+# list prefixes
+log "Listing top-level prefixes in s3://${BUCKET}/"
+mapfile -t PREFIXES < <(
+  aws s3 ls "s3://${BUCKET}/" --region "$REGION" --request-payer requester \
+    | awk '/^ *PRE /{print $2}' | sed 's:/$::' | grep -E '^[0-9]+$' || true
+)
+((${#PREFIXES[@]})) || die "No prefixes found."
+
+# mark initial status
+declare -A RESULTS
+if [[ -z "$START_AT" ]]; then
+  skipping=0
+else
+  skipping=1
+fi
+for p in "${PREFIXES[@]}"; do
+  if [[ -n "$START_AT" && "$p" == "$START_AT" ]]; then
+    skipping=0
+  fi
+  if (( skipping )); then
+    RESULTS["$p"]="-- SKIPPED"
+  else
+    RESULTS["$p"]="-- TODO"
+  fi
+done
+
+total_start=$(date +%s)
+
+for p in "${PREFIXES[@]}"; do
+  if [[ "${RESULTS[$p]}" == "-- SKIPPED" ]]; then
+    continue
+  fi
+  src="s3://${BUCKET}/${p}/"
+  dst="${DEST}/${p}/"
+  mkdir -p "$dst"
+
+  log "START ${p}"
+  start=$(date +%s)
+
+  "$S3SYNC" \
+    --source-request-payer \
+    --source-region "$REGION" \
+    --worker-size "$WORKERS" \
+    --max-parallel-uploads "$WORKERS" \
+    "$src" "$dst"
+
+  end=$(date +%s)
+  mins=$(( (end - start + 59) / 60 ))
+  RESULTS["$p"]="$mins minutes"
+
+  # Print status table so far
+  echo "---- Status ----"
+  for k in "${PREFIXES[@]}"; do
+    echo "$k ${RESULTS[$k]}"
+  done
+  echo "----------------"
+done
+
+total_end=$(date +%s)
+total_mins=$(( (total_end - total_start + 59) / 60 ))
+
+echo "ALL DONE in $total_mins minutes."
+
From fa9f8fc5df042b4e06084b728a3c3aea75552578 Mon Sep 17 00:00:00 2001
From: Nicholas Wehr <33910651+wwwehr@users.noreply.github.com>
Date: Thu, 28 Aug 2025 11:34:18 -0700
Subject: [PATCH 6/6] Updated top level README.md

updated instructions and prefer the s3sync-runner tool

---
 README.md | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index b0ffae45e..3eaa99ff1 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,13 @@ The block files currently comprise millions of small objects totalling over 20 GB and counting.
 
 1) this will backfill the existing blocks from Hyperliquid's EVM S3 bucket:
 
+   > use our Rust-based S3 tool wrapper to optimize your download experience - [read more](./etc/evm-block-sync/README.md)
+   ```shell
+   chmod +x ./etc/evm-block-sync/s3sync-runner.sh
+   ./etc/evm-block-sync/s3sync-runner.sh
+   ```
+
+   > or use the conventional [aws cli](https://aws.amazon.com/cli/)
    ```shell
    aws s3 sync s3://hl-mainnet-evm-blocks/ ~/evm-blocks \
      --request-payer requester \
@@ -27,17 +34,17 @@ The block files currently comprise millions of small objects totalling over 20 GB and counting.
      --size-only \
      --only-show-errors
    ```
-   > consider using this [rust based s3 tool wrapper](https://github.com/wwwehr/hl-evm-block-sync) alternative to optimize your download experience
 
-2) `$ make install` - this will install the NanoReth binary.
-3) Start NanoReth which will begin syncing using the blocks in `~/evm-blocks`:
+1) `$ make install` - this will install the NanoReth binary.
+
+2) Start NanoReth which will begin syncing using the blocks in `~/evm-blocks`:
 
 ```sh
 $ reth node --http --http.addr 0.0.0.0 --http.api eth,ots,net,web3 --ws --ws.addr 0.0.0.0 --ws.origins '*' --ws.api eth,ots,net,web3 --ingest-dir ~/evm-blocks --ws.port 8545
 ```
 
-4) Once the node logs stops making progress this means it's caught up with the existing blocks.
+3) Once the node logs stop making progress, it's caught up with the existing blocks.
Stop the NanoReth process and then start Goofys: `$ goofys --region=ap-northeast-1 --requester-pays hl-mainnet-evm-blocks evm-blocks`
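---

A note for reviewers of the s3sync-runner patch: the `--start-at` resume rule is easy to misread inside the larger script, so here it is isolated as a small sketch. The helper name `mark_prefixes` is hypothetical (it is not part of `s3sync-runner.sh`); the loop body mirrors the script's skip/TODO marking.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating s3sync-runner.sh's --start-at rule:
# prefixes listed before the match are SKIPPED; the matching prefix and
# everything after it are queued as TODO. An empty start value queues all.
mark_prefixes() {
  local start_at=$1; shift
  local skipping=0 p
  [[ -n "$start_at" ]] && skipping=1
  for p in "$@"; do
    # Once the requested prefix is reached, stop skipping for good.
    [[ -n "$start_at" && "$p" == "$start_at" ]] && skipping=0
    if (( skipping )); then
      echo "$p -- SKIPPED"
    else
      echo "$p -- TODO"
    fi
  done
}

# 21000000 and 22000000 are marked SKIPPED; 30000000 and 31000000 are TODO.
mark_prefixes 30000000 21000000 22000000 30000000 31000000
```

Because the real script marks statuses up front and only syncs `TODO` prefixes, re-running with `--start-at` after an interruption resumes at a prefix boundary rather than re-walking every object.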