diff --git a/docs/roadmaps/snap_sync_roadmap.md b/docs/roadmaps/snap_sync_roadmap.md new file mode 100644 index 00000000000..84de643b9d2 --- /dev/null +++ b/docs/roadmaps/snap_sync_roadmap.md @@ -0,0 +1,1333 @@ +# Snap Sync Module Roadmap + +**Author:** Pablo Deymonnaz (original), ElFantasma (ongoing updates) +**Date:** February 2026 (last updated 2026-04-15) +**Status:** Draft for Review + +--- + +## Executive Summary + +This roadmap outlines a strategic plan to improve the ethrex snap sync module in three phases: + +1. **Phase 1: Performance Optimization** - Make snap sync as fast as possible +2. **Phase 2: Code Quality & Maintainability** - Make the code clear, readable, and easier to understand +3. **Phase 3: Pipeline Architecture** - Migrate to spawned actors for pipelined concurrent execution + +The snap sync module currently comprises ~4,900 lines across 11 files. Our goal is to achieve sync times competitive with geth while maintaining code quality standards. + +> **April 2026 update (initial):** Spawned 0.5.0 has been [merged](#6295) — the actor framework blocker for Phase 3 is gone. Several new performance PRs (#6410, #6184, #6177, #6159, #6178) have been opened since the original roadmap. Phase 1 now includes trie building optimizations that represent the largest single improvement opportunity (-31% account insertion time). Phase 3 pipelining has been partially achieved without actors (#6184), validating the incremental approach. + +> **2026-04-15 update:** Multiple workstreams advanced significantly: +> +> - **Reliability:** Discovered and diagnosed three compounding bugs causing ~20% of mainnet snap syncs to crash on pivot update (peer rotation, weight function, BlockRangeUpdate filtering — see Issue #6474). Quick fix shipped as PR #6475; proper multi-bug fix tracked as #6474. +> - **Observability:** PR #6470 opened adding `admin_syncStatus` / `admin_peerScores` RPC endpoints, live `peer_top.py` TUI, Grafana dashboards, and monitor improvements. Tooling was used to produce the forensic analysis for the reliability work. +> - **Profiling-driven perf candidates:** Mainnet profiling of `insert_storages` revealed that **~80% of idle thread-seconds** come from small-account dispatcher overhead, not from the monster account. Two new issues opened: +> - **#6476** small-account batching — amortize dispatcher overhead for the 26M <1ms trie builds +> - **#6477** big-account parallelization in snap sync — analogous to PR #6410 but for large storage tries (realistic gain ~5-6%, smaller than initially thought) +> - **Pipelining context added** to Issue #4240 (Phase 3 spawned rewrite) with a concrete proposal: ~13% gain from phase overlap (Accounts → Storage + Bytecodes → Healing pipelined via actors). +> - **New roadmap items for reliability** (see §1.18 and §1.19 below). + +--- + +## Table of Contents + +1. [Current State Analysis](#current-state-analysis) +2. [Phase 1: Performance Optimization](#phase-1-performance-optimization) +3. [Phase 2: Code Quality & Maintainability](#phase-2-code-quality--maintainability) +4. [Phase 3: Pipeline Architecture](#phase-3-pipeline-architecture) +5. [Success Metrics](#success-metrics) +6. [Risk Assessment](#risk-assessment) +7. [Timeline](#timeline) +8. [Dependencies](#dependencies) + +--- + +## Current State Analysis + +### Module Structure + +| File | Lines | Purpose | +|------|-------|---------| +| `snap/client.rs` | 1,401 | Client-side snap protocol requests | +| `sync/snap_sync.rs` | 1,181 | Main snap sync orchestration | +| `sync/healing/storage.rs` | 740 | Storage trie healing | +| `sync/healing/state.rs` | 463 | State trie healing | +| `sync/full.rs` | 297 | Full sync implementation | +| `sync.rs` | 290 | Module root: `Syncer`, `AccountStorageRoots`, `SyncError` | +| `snap/server.rs` | 166 | Server-side snap protocol responses | +| `snap/error.rs` | 147 | Unified error types | +| `snap/constants.rs` | 121 | Protocol constants | +| `sync/code_collector.rs` | 100 | Bytecode collection | +| **Total** | **~4,906** | | + +### Snap Sync Phases + +The snap sync process consists of 6 sequential phases: + +``` +┌──────────────────────────────────────────────────────────────────────────┐ +│ SNAP SYNC PIPELINE │ +├──────────────────────────────────────────────────────────────────────────┤ +│ │ +│ 1. Header Download ──► 2. Pivot Selection ──► 3. Account Range Download │ +│ │ │ +│ ▼ │ +│ 6. Full Sync ◄── 5. Bytecode Download ◄── 4. Storage Range Download │ +│ │ │ +│ ▼ │ +│ [State Healing & Storage Healing run in parallel with phases 4-5] │ +│ │ +└──────────────────────────────────────────────────────────────────────────┘ +``` + +### Current Performance Bottlenecks + +Based on code analysis and mainnet profiling data (PR #6410; April 2026 profiling run `20260412_172457`): + +| Bottleneck | Location | Impact | Priority | +|------------|----------|--------|----------| +| **Pivot update crashes** | `update_pivot` in `snap_sync.rs` | **~20% of mainnet runs crash** with `process::exit(2)` + DB corruption — requires full resync | **Critical (reliability)** | +| **Trie building in insertion** | `insert_accounts`, `insert_storage` | **75-91% of insertion time** (883s/1184s for accounts, 2357s/2587s for storage on mainnet) | **Critical** | +| **Small-account dispatcher overhead** | `insert_storages` dispatcher | **~80% of the 49% idle thread-seconds** (≈14,915 thread-seconds on a 2347s mainnet storage phase). 26.3M accounts × avg <1ms, dispatcher blocked 69% of wall on slot turnover | **Critical** (newly quantified) | +| **Large storage tries run single-threaded** | `insert_storages` per-account task | Monster (Uniswap-class, 159.6M leaves) runs 244.9s on 1 thread; ~20% of idle thread-seconds | **High** | +| Sequential header download | `sync_cycle_snap()` | Blocks state download start | Critical | +| Sequential phase pipeline | `snap_sync.rs` orchestration | Bytecodes/storage wait for all accounts to finish | High | +| Redundant code hash pass | `insert_accounts` | Extra full iteration over temp DB (addressed in #6410) | Medium | +| Trie node batching | `heal_state_trie()`, `heal_storage_trie()` | Writes are batched but could use `put_batch_no_alloc` | Medium | +| Busy-wait loops | Multiple locations | CPU waste (only when no peers available) | Medium | +| SST file intermediate step | Account/storage download | Overlapping key ranges force RocksDB merge during ingestion | Medium | +| Sync `std::fs` calls | Snapshot dumping | Already in `spawn_blocking`, but directory ops should use `tokio::fs` | Low | + +### Existing Code Quality Issues + +| Issue | Location | Description | +|-------|----------|-------------| +| `#[allow(clippy::too_many_arguments)]` | `heal_state_trie()`, `process_node_responses()` | 8+ parameters - needs context struct | +| Repeated code patterns | `snap/client.rs` | Snapshot dumping logic duplicated | +| Magic numbers | Various | Hardcoded values without constants | +| Missing documentation | Healing modules | Complex algorithms undocumented | +| Inconsistent error handling | Various | Mix of `?`, `.expect()`, silent drops | + +--- + +## Phase 1: Performance Optimization + +### Goal +Reduce snap sync time by 50% or more through parallelization, batching optimizations, and I/O improvements. + +--- + +### 1.1 Parallel Header Download (PR #6059 - In Progress) + +**Current State:** Headers are downloaded sequentially before state download begins. + +**Proposed Change:** Download headers in a background task while state download proceeds in parallel. + +**Implementation:** +- Add `header_receiver` channel to `SnapBlockSyncState` +- Spawn `download_headers_background()` task +- Process headers incrementally at strategic points +- Add early abort mechanism when switching to full sync + +**Expected Impact:** +- State download starts immediately instead of waiting for millions of headers +- Estimated time savings: 10-20% of total sync time + +**Status:** PR #6059 open, addressing review feedback + +--- + +### 1.2 ~~Parallel Account Range Requests~~ (Discarded) + +> Discarded — not needed after profiling. + +--- + +### 1.3 Optimize Trie Node Batching + +**Current State:** +- `NODE_BATCH_SIZE = 500` nodes per request +- `STORAGE_BATCH_SIZE = 300` accounts per batch +- DB writes use `put_batch()` (already batched) + +**Proposed Changes:** + +#### 1.3.1 Use `put_batch_no_alloc()` for Healing +```rust +// Current (sync/healing/state.rs:302, sync/healing/storage.rs:231) +// PERF: use put_batch_no_alloc (note that it needs to remove nodes too) + +// Proposed: Pre-allocate buffers, reuse across batches +struct HealingBatchWriter { + node_buffer: Vec<(Nibbles, Node)>, + capacity: usize, +} +``` + +#### 1.3.2 Dynamic Batch Sizing (Needs Measurement) +Adjust batch sizes based on: +- Available memory +- Peer response latency +- Current healing progress + +**Note:** Impact on healing duration needs empirical measurement — current batching may already be sufficient. Should only pursue if benchmarks show batch sizing is a bottleneck. + +**Expected Impact:** Needs measurement + +**Effort:** Medium (2 weeks) + +--- + +### 1.4 Reduce Busy-Wait Loops (Issue #6140 — Step 9) + +**Current State:** Multiple locations use `try_recv()` + `tokio::time::sleep()` in loops: +- `request_account_range()` (`snap/client.rs:193`): 10ms sleep waiting for peers +- `request_bytecodes()` (`snap/client.rs:383`): 10ms sleep waiting for peers +- `request_storage_ranges()` (`snap/client.rs:646`): 10ms sleep waiting for peers +- `heal_state_trie()` (`sync/healing/state.rs:151`): `try_recv` polling +- `heal_storage_trie()` (`sync/healing/storage.rs:261`): `try_recv` polling + +Note: busy-waits only trigger when no peers are available. + +**Proposed Change:** Replace with proper async primitives: + +```rust +// Current (snap_sync.rs ~line 452) +loop { + if let Some(headers) = block_sync_state.header_receiver.try_recv() { ... } + tokio::time::sleep(Duration::from_millis(100)).await; +} + +// Proposed: Blocking receive with timeout +match tokio::time::timeout( + Duration::from_secs(30), + block_sync_state.header_receiver.recv() +).await { + Ok(Some(headers)) => { ... } + Ok(None) => break, // Channel closed + Err(_) => continue, // Timeout, check staleness +} +``` + +**Expected Impact:** Reduced CPU usage, faster response to events + +**Effort:** Low (1 week) + +--- + +### 1.5 ~~Memory-Bounded Structures~~ (Discarded) + +> Discarded — not a real bottleneck in practice. + +--- + +### 1.6 Use `tokio::fs` for Directory Operations + +**Current State:** Snapshot dumping is already inside `spawn_blocking`. However, directory creation and existence checks still use synchronous `std::fs` calls. + +**Proposed Change:** Replace `std::fs` directory operations with `tokio::fs`: + +```rust +// Current +std::fs::create_dir_all(dir)?; + +// Proposed +tokio::fs::create_dir_all(dir).await?; +``` + +**Expected Impact:** Minor — main I/O is already non-blocking via `spawn_blocking`. + +**Effort:** Low (< 1 week) + +--- + +### 1.7 Peer Connection Optimization + +**Current State:** +- `PEER_REPLY_TIMEOUT = 15 seconds` +- `MAX_IN_FLIGHT_REQUESTS = 77` +- No adaptive timeout based on peer performance + +**Proposed Changes:** + +#### 1.7.1 Adaptive Timeouts +```rust +struct AdaptivePeerConfig { + base_timeout: Duration, + peer_latencies: HashMap, + + fn timeout_for_peer(&self, peer_id: &H256) -> Duration { + self.peer_latencies + .get(peer_id) + .map(|avg| avg.mean() * 3.0) // 3x average latency + .unwrap_or(self.base_timeout) + } +} +``` + +#### 1.7.2 Request Pipelining +Increase in-flight requests for high-quality peers: +```rust +fn max_requests_for_peer(&self, peer_id: &H256) -> u32 { + match self.peer_quality(peer_id) { + PeerQuality::Excellent => 100, + PeerQuality::Good => 77, + PeerQuality::Average => 50, + PeerQuality::Poor => 20, + } +} +``` + +**Expected Impact:** 20-30% improvement in peer utilization (needs empirical measurement) + +**Note:** Could introduce excessive complexity for marginal gain. Should only pursue if benchmarks show peer utilization is actually a bottleneck. + +**Effort:** Medium (2 weeks) + +--- + +### 1.8 ~~Parallel Storage Healing~~ (Discarded) + +> Discarded — storage healing is already parallelized via `JoinSet` with up to `MAX_IN_FLIGHT_REQUESTS` (77) concurrent requests. + +### 1.9 ~~Bytes for Trie Values — O(1) Clones~~ (Discarded) + +> Discarded — PR #6057 closed without merging. + +### 1.10 Snap Sync Benchmark Tool (PR #6108 - In Progress) + +Python tool (`tooling/sync/sync_benchmark.py`) to analyze snap sync performance from container logs, identifying bottlenecks per phase. + +### 1.11 Per-Phase Timing Breakdown in Slack Notifications (✅ DONE — Merged in #6136) + +Surfaces per-phase completion timings (Block Headers, Account Ranges, Storage Ranges, Healing, etc.) directly in Slack notifications from the multisync monitoring script, so performance bottlenecks are visible at a glance. + +--- + +### 1.12 Optimize Trie Building in Snap Sync Insertion (PR #6410 — In Progress) + +**Current State:** Trie building is 75-91% of insertion time (profiled on mainnet). Account insertion took ~20 minutes and storage insertion ~43 minutes — together 61% of total snap sync time. The trie build is entirely CPU-bound (0% I/O wait). + +**Changes:** +1. **Eliminate redundant code hash iteration pass** — original code iterated all accounts twice (once for code hashes, once for trie). Merged into single pass. +2. **Reuse `nodehash_buffer` across calls** — avoids ~700M allocations on mainnet. +3. **Parallel state trie building across 16 nibble ranges** — splits state trie into 16 independent sub-tries built concurrently. + +**Benchmarks (mainnet, release profile, no validation):** + +| Phase | Before | After | Delta | +|-------|--------|-------|-------| +| Account Insertion | 1184s (19m 44s) | 818s (13m 40s) | **-31%** | +| Storage Insertion | 2587s (43m 7s) | 2433s (40m 30s) | **-6%** | + +| Network | Before | After | Saved | +|---------|--------|-------|-------| +| Hoodi | ~25m | ~13m | ~12m | +| Sepolia | ~90m | ~43m | ~47m | +| Mainnet | ~1h 42m | ~1h 35m | ~7m | + +**Note:** This optimization is orthogonal to the concurrency model — it operates at the trie-building level, below the orchestration layer. No dependency on actor migration. + +**Status:** PR #6410 open, benchmarked on mainnet + +--- + +### 1.13 Pipeline Bytecode Downloads & Background Storage Healing (PR #6184 — In Progress) + +**Current State:** Bytecodes wait for all healing; storage healing must reach 100% before finalization. + +**Changes:** +1. **Concurrent bytecode downloads** — stream code hashes via `mpsc` channel to a concurrent download task running alongside healing. Content-addressed (hash = key), safe regardless of pivot changes. +2. **Background storage healing with 99% threshold** — state healing runs to 100%, but storage healing finalizes at 99% and completes the remaining <1% in a background task after finalization. + +**Benchmarks (Hoodi):** -13% total sync time (489s vs 564s baseline). + +**Note:** Implements Phase 3 pipelining goals (3.3, 3.4) incrementally using `mpsc` channels, without requiring a full actor migration. Validates that the orchestration can be improved without restructuring to actors first. + +**Status:** PR #6184 open + +--- + +### 1.14 Eliminate SST File Intermediate Step (PR #6177 — In Progress) + +Replace SST file writer + ingest flow with direct `WriteBatch` writes to temp RocksDB during download phase. Removes overlapping key range merge overhead. Also merges the two iterator passes in `insert_accounts` into a single pass. + +**Status:** PR #6177 open, needs mainnet testing + +--- + +### 1.15 Optimize Insertion and Healing Write Paths (PR #6159 — In Progress) + +Addresses multiple hot-path bottlenecks found via profiling: +- Single-element `Vec` alloc per key in `put_batch` (~20k allocs per flush) +- `put_batch_no_alloc` actually allocates (encodes all nodes, collects into Vec) +- BTreeMap overwrites in healing batch writes +- Hardcoded 12 threads for trie building + +**Status:** PR #6159 open + +--- + +### 1.16 Disable WAL and Improve Concurrency (PR #6178 — In Progress) + +Disable write-ahead log for snap sync temp DBs (crash recovery not needed) and tune RocksDB concurrency settings. + +**Status:** PR #6178 open + +--- + +### 1.17 Fill All Peer Slots per Tick in Healing Dispatch (PR #6175 — In Progress) + +Current healing dispatch only fills one peer slot per loop iteration. Fill all available slots per tick to maximize network utilization. + +**Status:** PR #6175 open + +--- + +### 1.18 Snap Sync Observability Tooling (PR #6470 — In Progress, added 2026-04-15) + +**Current State:** Diagnosing snap sync failures required manual log grep and docker inspection. No runtime visibility into peer scores, sync phase, or request distribution. + +**Changes:** +- `admin_syncStatus` RPC endpoint — live sync phase, pivot block, progress metrics, error history +- `admin_peerScores` RPC endpoint — per-peer scores, inflight request counts, capabilities, last BlockRangeUpdate, eligibility +- `admin_setLogLevel` RPC — dynamically raise to TRACE during incidents +- `tooling/sync/peer_top.py` — live TUI showing peer scores, request distribution, and selection patterns in real time +- Grafana dashboard panels: sync progress, peer scoring distribution, request rates, per-phase rate overview +- Docker monitor improvements: rolling snapshots, degradation detection, 5s polling, force-dump on failure +- Header-download diagnostics logging in `snap_sync.rs` + +**Impact:** No direct sync-time impact. Enables diagnosis of every other reliability/perf issue. The forensic analysis for §1.19 (pivot-update crashes) was only possible because of this tooling. + +**Status:** PR #6470 open + +--- + +### 1.19 Pivot Update Reliability (PRs/Issues #6475, #6474 — added 2026-04-15) + +**Current State:** `update_pivot` crashes the node (`process::exit(2)`) on ~20% of mainnet sync runs. The exit leaves the DB in an inconsistent state (`Unknown state found in DB`), requiring a full `removedb` and resync from scratch. + +**Root causes** (full forensics in Issue #6474): + +- **Bug A — `update_pivot` classified as irrecoverable.** Commit `583795955` changed the retry loop from infinite to `MAX_TOTAL_FAILURES=15` with exponential backoff. 15 failures exhaust in ~2-3 min, then `PeerHandlerError::BlockHeaders` → classified irrecoverable → `process::exit(2)`. +- **Bug B — Deterministic peer selection never rotates.** `get_best_peer` + `.max_by_key(weight_peer)` always returns the same top-scored peer. The weight function `score - inflight_requests` systematically prefers idle/incapable peers (e.g., eth/70-only erigon with 0 snap requests, score 47) over busy healthy peers (Geth with 47 snap requests, weight 3). +- **Bug C — `BlockRangeUpdate.range_to` unused in selection.** Peers advertise their chain tip; we store it but don't filter by it. On hoodi we kept asking a peer for pivot block 2593928 while its last BlockRangeUpdate said range_to=2593886 (42 blocks behind). + +**Evidence:** Hoodi failure `run_20260411_033943` — only 2 distinct peers tried across 9 attempts; at least 6 other peers had the block. Mainnet failure `run_20260414_011127` — stuck on erigon/v3.5.0-dev with weight 47 while 11 healthy peers had weight 3. + +**Quick fix (PR #6475 — shipped):** +- Reclassify `PeerHandler`/`NoBlockHeaders` errors as recoverable (narrow per-variant via `PeerHandlerError::is_recoverable()`) +- Add `get_best_peer_excluding(caps, excluded)` — rotation-aware peer selection +- Replace `MAX_TOTAL_FAILURES=15` with `MAX_ROTATIONS=5` (scales with peer count) +- Catch recoverable errors inside retry loop so protocol errors advance rotation + +**Proper fix (Issue #6474 — deferred):** tackle the deeper peer-selection bugs: +1. Fix `weight_peer` for control-plane requests (pivot update, header resolution) — don't penalize data-plane inflight +2. Filter peers by `BlockRangeUpdate.range_to` before selection +3. Broaden rotation across all eligible peers +4. Don't count passive waits as failures +5. Revert irrecoverable classification of post-pivot-header fetch failures +6. Raise/remove `MAX_TOTAL_FAILURES` +7. Cherry-pick fixes from `fullsync-acceleration` branches (880244afe, efaa344d4) +8. DB cleanup / graceful shutdown on sync failure + +**Impact:** Removes a reliability failure that was ~20% of mainnet runs. Currently masks optimization progress (slower runs → more pivot updates → more exposure to the bug). + +**Status:** PR #6475 open (AI agent feedback addressed 2026-04-15); Issue #6474 open as follow-up. + +--- + +### 1.20 Within-Trie Parallelization for Large Storage Tries (Issue #6477 — added 2026-04-15) + +**Current State:** `insert_storages` is parallel across accounts (16 worker threads, one account per task), but each individual account's trie is built single-threaded. For large storage tries (Uniswap-class, 159M+ leaves), this means a single thread runs for ~245s while the monster is processed. + +**Proposed Change:** Apply `trie_from_sorted_parallel` (from PR #6410) *inside* each large storage task, splitting the trie build across 16 storage-slot-nibble ranges. + +**Distinction from #5482:** Issue #5482 (existing, open) addresses the same idea but for *block execution* — parallelizing per-tx storage updates during state-root computation. This issue (#6477) is the analogous change for snap sync's initial trie construction. + +**Expected Impact:** ~100-150s saved (~5-6% of storage phase). Smaller than originally projected because the other 15 threads aren't actually idle during the monster's solo run (they're working on the long tail). + +**Status:** Issue #6477 open + +--- + +### 1.21 Small-Account Batching in insert_storages (Issue #6476 — added 2026-04-15) + +**Current State:** `insert_storages` profiling shows only 8.1 of 16 threads used on average (**49% of thread-seconds idle**, 18,589 / 37,554). Decomposition: +- Monster account serialization: ~20% of idle time +- **Small-account dispatcher overhead: ~80% of idle time** — 26.3M accounts with avg <1ms trie build each, dispatcher blocked 69% of wall on slot turnover + +**Proposed Change:** Bundle N small accounts per worker job to amortize send/reap/slot-free overhead. Large accounts still run as single tasks. + +**Expected Impact:** Potentially 5-15 min off the 39-min storage phase (30-40%). This is the dominant parallelism killer and **higher-value than #6477** (big-account). + +**Architectural dependencies:** None. Change is inside the existing dispatcher loop — independent of the spawned migration (Phase 3) and of #6477 (complementary, additive). + +**Status:** Issue #6476 open + +--- + +### 1.22 Decoded `TrieLayerCache` (PR #6348 — In Progress, added to roadmap 2026-04-15) + +**Current State:** `TrieLayerCache` hits still go through `Node::decode()` even when the node was just cached — decoded representation is discarded and re-derived on every access. + +**Proposed Change:** Cache the decoded `Node` value alongside the encoded bytes. Skip `Node::decode()` on cache hits. + +**Expected Impact:** TBD — needs benchmark. Hot path in trie traversal; decode is not free. + +**Status:** PR #6348 open (author: Arkenan) + +--- + +### 1.23 Bloom Filter for Non-Existent Storage Slots (PR #6288 — In Progress, added to roadmap 2026-04-15) + +**Current State:** Storage trie seeks are issued for every slot read, including for accounts that have no storage or slots that don't exist. On large contracts, many lookups miss. + +**Proposed Change:** Add a bloom filter to skip trie seeks for slots known not to exist. + +**Expected Impact:** TBD — needs benchmark. Could help both sync-time trie seeks and runtime reads. + +**Status:** PR #6288 open (author: ilitteri) + +--- + +### 1.24 Adaptive Request Sizing & Storage Bisection (PR #6181 — In Progress, added to roadmap 2026-04-15) + +**Current State:** Storage range requests use fixed size per request. Adaptive sizing based on peer response history and bisection on oversized responses could improve throughput. + +**Proposed Change:** Adaptive request sizing + storage bisection on oversized responses + parallel trie construction for storage. + +**Status:** PR #6181 open (author: ilitteri) + +--- + +### 1.25 Concurrent Bytecode + Storage Download (PR #6205 — In Progress, added to roadmap 2026-04-15) + +**Current State:** Bytecodes download as a distinct phase after storage. Pipeline opportunity similar to what PR #6184 did for bytecodes + healing. + +**Proposed Change:** Run bytecode downloads concurrently with storage downloads. + +**Note:** May overlap partially with PR #6184 (which concurrently runs bytecodes with *healing*). Needs review to reconcile. + +**Status:** PR #6205 open (author: ilitteri) + +--- + +### 1.26 Phase Completion Markers for Validation (PR #6189 — In Progress, added to roadmap 2026-04-15) + +**Current State:** No persisted markers for phase completion; recovery and validation tooling has to infer progress. + +**Proposed Change:** Add phase completion markers to the snap sync validation flow. + +**Status:** PR #6189 open (author: ilitteri) + +--- + +## Phase 2: Code Quality & Maintainability + +### Goal +Make the codebase clear, well-documented, and easy for new contributors to understand. + +--- + +### 2.1 Extract Context Structs (Issue #6140 — Steps 5, 6) + +**Also includes:** `AccountStorageRoots` simplification (defined in `sync.rs:167`) — replace `BTreeMap, Vec<(H256, H256)>)>` with SoA approach and named struct instead of tuple. Named structs for channel types in `sync/healing/state.rs` and worker return types in `snap/client.rs`. + +**Current State:** Functions with many parameters: +```rust +#[allow(clippy::too_many_arguments)] +async fn heal_state_trie( + state_root: H256, + store: Store, + mut peers: PeerHandler, + staleness_timestamp: u64, + global_leafs_healed: &mut u64, + mut healing_queue: StateHealingQueue, + storage_accounts: &mut AccountStorageRoots, + code_hash_collector: &mut CodeHashCollector, +) -> Result +``` + +**Proposed Change:** +```rust +struct StateHealingContext { + state_root: H256, + store: Store, + staleness_timestamp: u64, +} + +struct StateHealingProgress { + global_leafs_healed: u64, + healing_queue: StateHealingQueue, + storage_accounts: AccountStorageRoots, + code_hash_collector: CodeHashCollector, +} + +async fn heal_state_trie( + ctx: &StateHealingContext, + peers: &mut PeerHandler, + progress: &mut StateHealingProgress, +) -> Result +``` + +**Files Affected:** +- `sync/healing/state.rs` (`#[allow(clippy::too_many_arguments)]` at line 76) +- `sync/healing/storage.rs` (`#[allow(clippy::too_many_arguments)]` at line 519) +- `sync/snap_sync.rs` +- `sync.rs` (`AccountStorageRoots` at line 167) + +**Effort:** Low (1 week) + +--- + +### 2.2 Comprehensive Documentation + +**Current State:** Module documentation is sparse; healing algorithms are complex and undocumented. + +**Proposed Documentation:** + +#### 2.2.1 Architecture Documentation +Create `docs/snap_sync_architecture.md`: +- High-level overview with diagrams +- Data flow through the system +- State machine for sync phases +- Interaction with storage layer + +#### 2.2.2 Algorithm Documentation +Document healing algorithms inline: +```rust +/// # State Trie Healing Algorithm +/// +/// The healing process fixes inconsistencies in the state trie that occur +/// when snap sync spans multiple pivot blocks. +/// +/// ## Algorithm +/// +/// 1. Start from the state root node +/// 2. For each node, check if all children exist in local storage +/// 3. For missing children: +/// a. Add to download queue +/// b. Request from peers in batches of NODE_BATCH_SIZE +/// 4. When a node's children are all present, flush to DB +/// 5. Repeat until no missing nodes remain +/// +/// ## Invariants +/// +/// - Parent nodes are only flushed after all children are present +/// - The healing queue tracks `missing_children_count` per node +/// - Staleness checks prevent infinite loops on changing state +/// +/// ## Complexity +/// +/// - Time: O(n) where n is the number of trie nodes +/// - Space: O(d * b) where d is max depth and b is branching factor +``` + +#### 2.2.3 Inline Code Comments +Add comments explaining non-obvious logic, especially: +- Hash boundary calculations +- Pivot staleness detection +- Proof verification + +**Effort:** Medium (2 weeks) + +--- + +### 2.3 Consolidate Error Handling (Merged — PR #5975) + +--- + +### 2.4 Extract Helper Functions (Issue #6140 — Steps 3, 4) + +**Current State:** Duplicated patterns identified: +- Snapshot dumping (4 occurrences) - **Partially addressed** +- Peer selection and retry logic (6+ occurrences) +- Progress reporting (5+ occurrences) + +**Proposed Change:** Create shared utilities: + +```rust +// snap/utils.rs (new file) + +/// Retries an operation with exponential backoff +pub async fn with_retry( + max_attempts: u32, + operation: F, +) -> Result +where + F: Fn() -> Fut, + Fut: Future>, +{ ... } + +/// Reports progress at regular intervals +pub struct ProgressReporter { + interval: Duration, + last_report: Instant, + metrics_key: &'static str, +} + +impl ProgressReporter { + pub fn maybe_report(&mut self, current: u64, total: u64) { ... } +} +``` + +**Effort:** Low (1 week) + +--- + +### 2.5 State Machine Refactor (Consider deferring — subsumed by 3.1) + +> **Note:** This would be subsumed by the actor migration (3.1), where each actor naturally represents a phase with explicit state. Consider skipping this as a standalone item if 3.1 is planned soon. + +**Current State:** Snap sync phases are implicit in control flow. + +**Proposed Change:** Make phases explicit: + +```rust +#[derive(Debug, Clone, Copy, PartialEq)] +pub enum SnapSyncPhase { + Initializing, + DownloadingHeaders, + DownloadingAccounts, + DownloadingStorages, + DownloadingBytecodes, + HealingState, + HealingStorage, + FullSync, + Complete, +} + +pub struct SnapSyncStateMachine { + phase: SnapSyncPhase, + progress: SnapSyncProgress, + + pub fn transition(&mut self, event: SnapSyncEvent) -> Result<(), SyncError> { + match (self.phase, event) { + (DownloadingHeaders, HeadersComplete) => { + self.phase = DownloadingAccounts; + Ok(()) + } + // ... other transitions + _ => Err(SyncError::InvalidStateTransition), + } + } +} +``` + +**Benefits:** +- Clear phase boundaries +- Easier to add new phases +- Better logging and metrics +- Simpler testing + +**Effort:** High (3 weeks) + +--- + +### 2.6 Test Coverage Improvement + +**Current State:** +- 12 snap server tests +- Limited client-side testing +- No integration tests for full sync cycle + +**Proposed Testing Strategy:** + +#### 2.6.1 Unit Tests +- Test each healing algorithm with mock data +- Test pivot selection logic +- Test proof verification + +#### 2.6.2 Integration Tests +- Mock peer network +- Test full sync cycle with small state +- Test pivot updates mid-sync +- Test recovery from interrupted sync + +#### 2.6.3 Property-Based Tests +- Random account ranges +- Random trie structures +- Fuzz testing for proof verification + +**Target Coverage:** 80%+ for core modules + +**Effort:** High (4 weeks) + +--- + +### 2.7 Configuration Externalization + +**Current State:** Constants are hardcoded in `snap/constants.rs`. + +**Proposed Change:** Make tunable parameters configurable: + +```rust +// config/snap_sync.rs +#[derive(Debug, Clone, Deserialize)] +pub struct SnapSyncConfig { + /// Maximum response size in bytes (default: 512KB) + pub max_response_bytes: u64, + + /// Number of accounts per request (default: 128) + pub snap_limit: usize, + + /// In-memory buffer size before disk flush (default: 64MB) + pub range_file_chunk_size: usize, + + /// Maximum concurrent in-flight requests (default: 77) + pub max_in_flight_requests: u32, + + // ... other parameters +} + +impl Default for SnapSyncConfig { + fn default() -> Self { + Self { + max_response_bytes: 512 * 1024, + snap_limit: 128, + // ... defaults from current constants + } + } +} +``` + +**Benefits:** +- Tuning without recompilation +- Environment-specific configurations +- Easier benchmarking + +**Effort:** Medium (2 weeks) + +--- + +### 2.8 Fix Correctness Bugs in `request_storage_ranges` (Issue #6140 — Steps 1, 2) + +**Current State:** One location crashes the node on a recoverable error: +- `panic!("Should have found the account hash")` (`snap/client.rs:729`) + +Other `.expect()` calls (`snap/client.rs:630-631`) are on `JoinSet` results, not store lookups. + +**Proposed Change:** Replace with proper error propagation using `SnapError::InternalError` and `?` operator. + +**Effort:** Very low (< 1 day) + +--- + +### 2.9 Fix Snap Protocol Capability Bug — ✅ DONE + +**Status:** Merged in #5975. All `get_best_peer()` calls now use `SUPPORTED_SNAP_CAPABILITIES`. + +--- + +### 2.10 Add `spawn_blocking` to Bytecodes Handler — ✅ DONE + +**Status:** Merged in #5975. Bytecodes handler in `snap/server.rs` uses `spawn_blocking`, matching the pattern of all other handlers. + +--- + +### 2.11 Remove Dead `DumpError.contents` Field — ✅ DONE + +**Status:** Merged in #5975. `DumpError` in `snap/error.rs` no longer has the `contents` field and uses `#[derive(Debug, thiserror::Error)]` instead of a custom `Debug` impl. + +--- + +### 2.12 Use `JoinSet` Instead of Channels for Workers (Consider deferring — subsumed by 3.1) + +> **Note:** The actor migration (3.1) would replace both channels and JoinSets with actor messages. Consider skipping this intermediate step if 3.1 is planned soon. + +**Current State:** Both `request_account_range` (`snap/client.rs:138`) and `request_storage_ranges` (`snap/client.rs:587`) use `mpsc::channel` for worker communication. If a worker panics, the message is lost silently and the main loop may hang waiting for results. + +**Proposed Change:** Migrate to `tokio::task::JoinSet` which propagates panics and handles task lifecycle. The bytecodes path already uses `JoinSet` as a reference. + +**Effort:** Medium (1-2 weeks) + +--- + +### 2.13 Self-Contained `StorageTask` with Hashes + +**Current State:** `StorageTask` in `snap/client.rs:75` references `accounts_by_root_hash` by index (`start_index`, `end_index`). Any mutation of the vector would silently corrupt in-flight tasks. The task is not self-contained. + +**Proposed Change:** Include actual account hashes and storage roots in `StorageTask` instead of indices. This makes tasks self-contained and eliminates the implicit coupling to the vector. + +**Effort:** Medium (1 week) + +--- + +### 2.14 Move Snap Client Methods Off `PeerHandler` — ✅ DONE + +**Status:** Merged in #5975. Snap client methods extracted from `peer_handler.rs` to `snap/client.rs` as standalone functions taking `peers: &mut PeerHandler`. + +--- + +### 2.15 Guard `write_set` in Account Path + +**Current State:** `request_account_range` (`snap/client.rs:170`) spawns disk-write tasks without checking if one is already pending. The storage path (`snap/client.rs:625`) already does `!disk_joinset.is_empty()` check. Missing the guard can lead to multiple concurrent writes. + +**Proposed Change:** Add the same `!disk_joinset.is_empty()` guard to the account range disk write path, matching the storage path pattern. + +**Effort:** Very low (small) + +--- + +### 2.16 Healing Code Unification + +**Current State:** `sync/healing/state.rs` (~463 lines) and `sync/healing/storage.rs` (~740 lines) implement the same trie healing algorithm. Differences: path representation (single vs double nibbles) and leaf type (accounts vs U256). Lots of duplicated logic. + +**Proposed Change:** Extract a generic healing function parameterized by path and leaf type. Both modules would call into the shared implementation. + +**Effort:** High (3+ weeks) + +--- + +### 2.17 Use Existing Constants for Magic Numbers + +**Current State:** Most magic numbers have been replaced with named constants: +- `STORAGE_BATCH_SIZE` (used at `snap/client.rs:567`) +- `HASH_MAX` (used at `snap/client.rs:821,856,1275`) +- `ACCOUNT_RANGE_CHUNK_COUNT` (used at `snap/client.rs:109`) + +Remaining: channel capacity `1000` appears at lines 138, 370, 587 — could be a named constant. + +**Effort:** Very low (trivial) + +--- + +### 2.18 Storage Download Refactor via `StorageTrieTracker` (PR #6171 — In Progress, added 2026-04-15) + +**Current State:** `AccountStorageRoots` tracks storage downloads per-account with complex index-based referencing into `accounts_by_root_hash`. Tasks reference accounts by index, results carry index ranges, big-account promotion mutates intervals — hard to follow and brittle. + +**Proposed Change:** New `StorageTrieTracker` groups storage tries by root hash from the start, separating small (single-request) from big (multi-request) tries. Moves trie data into tasks and back in results, eliminating index-based coupling. + +**Relation:** Complementary to #6140 (same file, orthogonal concerns). This is the data-ownership refactor; #6140 is the readability/correctness cleanup. + +**Status:** PR #6171 open (author: fedacking) + +--- + +### Issue #6140 — Refactor `request_storage_ranges` (Steps Summary) + +9-step plan to refactor `request_storage_ranges` in `snap/client.rs`. Each step is one independently correct commit. Full details in [Issue #6140](https://github.com/lambdaclass/ethrex/issues/6140). + +| Step | Description | Sections | Risk | +|------|-------------|----------|------| +| 1 | Replace `panic!` with proper error return | 2.8 | Very low | +| 2 | Replace `.expect()` with `?` operator | 2.8 | Very low | +| 3 | Extract `ensure_snapshot_dir` helper (4 occurrences) | 2.4 | Very low | +| 4 | Extract big-account chunking helper — DRY (~70 dup lines) | 2.4 | Low | +| 5 | Introduce `TaskTracker` struct for task counting | 2.1 | Very low | +| 6 | Extract result processing into `StorageDownloadState.process_result()` | 2.1 | Medium | +| 7 | Track buffer size incrementally (O(1) instead of O(n)) | — | Low | +| 8 | Remove `accounts_done` HashMap (inline removal) | — | Low | +| 9 | Replace busy-poll (`try_recv` + `sleep`) with `tokio::select!` | 1.4 | Medium | + +**Dependencies:** +``` +Steps 1, 2, 3, 5 — independent +Step 4 — independent +Step 6 — depends on Steps 4, 5 +Steps 7, 8 — depend on Step 6 +Step 9 — depends on Step 6 +``` + +**Execution order:** 1 → 2 → 3 → 5 → 4 → 6 → 7 → 8 → 9 + +--- + +## Phase 3: Pipeline Architecture + +### Goal +Replace the sequential sync phase model with a pipelined actor architecture using Spawned, enabling concurrent execution of phases that are currently blocked on each other. + +### Context + +The rest of the p2p layer already uses `spawned_concurrency` actors (`ActorRef`, `#[protocol]`, `#[actor]`) for peer table, discovery, RLPx connections, and tx broadcasting. The snap sync module is the main holdout — it uses raw `tokio::spawn` + `mpsc::channel` for concurrency. + +**Update (April 2026):** The spawned 0.5.0 migration has been merged (PR #6295, March 31). All existing actors across ethrex now use the new `#[protocol]` + `#[actor]` macro API. The spawned framework blocker is resolved — Phase 3 can proceed whenever the team is ready. + +Additionally, PR #6184 has demonstrated that key pipelining goals (concurrent bytecodes, background storage healing) can be achieved incrementally with `mpsc` channels, without a full actor rewrite. This validates an incremental approach: land targeted optimizations first, then migrate to actors for cleaner architecture. + +### Motivation + +Phases 1 and 2 optimize individual sync stages but don't challenge the fundamental sequential pipeline: + +``` +Headers → Pivot → Accounts → Storage → Bytecodes → Healing → Full Sync +``` + +In practice, once an account batch is downloaded, its storage and bytecodes are immediately known — there's no reason to wait for ALL accounts before starting storage/bytecode downloads. Similarly, headers can be fetched in the background while state download proceeds. The current code can't express this easily because each phase is a monolithic function that must complete before the next starts. + +With actors, each phase becomes a message-driven pipeline stage: + +``` + ┌──────────────┐ + │ HeaderActor │ (background, feeds into FullSync later) + └──────────────┘ + +┌──────────────┐ ┌───────────────┐ ┌───────────────┐ +│ AccountActor │───►│ StorageActor │───►│ HealingActor │ +│ │───►│ BytecodeActor │ │ │ +└──────────────┘ └───────────────┘ └───────────────┘ +``` + +Each actor owns its own state, peer management, and timing — no shared mutable state or busy-wait coordination. + +--- + +### 3.1 Migrate Snap Sync to Spawned Actors + +**Current State:** `snap_sync.rs` orchestrates everything via sequential function calls. Workers are spawned with `tokio::spawn` and communicate via `mpsc::channel`. Peer acquisition uses `try_recv` + sleep busy-wait loops. + +**Proposed Change:** Define a `#[protocol]` for each sync stage and implement them as `Actor`s: + +```rust +#[protocol] +pub trait AccountDownloaderProtocol: Send + Sync { + fn download_range(&self, start: H256, end: H256) -> Result<(), ActorError>; + fn account_batch_ready(&self, accounts: Vec) -> Result<(), ActorError>; +} + +#[actor(protocol = AccountDownloaderProtocol)] +impl AccountDownloader { + #[started] + async fn started(&mut self, ctx: &Context) { ... } + + async fn handle_download_range(&mut self, start: H256, end: H256) { ... } +} +``` + +**Actors:** +- `HeaderActor` — downloads headers in background, notifies orchestrator on completion +- `AccountActor` — downloads account ranges, sends each batch downstream immediately +- `StorageActor` — receives account batches, starts storage downloads per-batch +- `BytecodeActor` — receives code hashes from account batches, downloads in parallel +- `HealingActor` — starts healing as soon as enough state is available +- `SyncOrchestrator` — coordinates state machine transitions and pivot updates + +**Subsumes:** 1.1 (parallel headers), 2.12 (JoinSet → actors), 2.5 (state machine refactor) + +**Effort:** High (6+ weeks) + +--- + +### 3.2 Peer Scoring + +**Current State:** `get_best_peer()` selects peers by capability match and in-flight request count. There's no tracking of peer throughput, latency, or reliability beyond simple success/failure scores. + +**Proposed Change:** Extend `PeerTable` with richer metrics per peer: +- Rolling average response latency +- Throughput (bytes/second) +- Reliability score (success rate over recent window) + +Actors request peers from `PeerTable` with requirements (e.g., "need a peer for storage range, prefer high-throughput"), and `PeerTable` returns the best match. + +**Subsumes:** 1.7 (adaptive timeouts, request pipelining) + +**Effort:** Medium (2-3 weeks) + +--- + +### 3.3 Pipelined Account → Storage Download + +**Current State:** Storage download starts only after ALL accounts are downloaded. In `snap_sync.rs`, `request_account_range()` must complete before `request_storage_ranges()` is called. + +**Proposed Change:** `AccountActor` sends each completed account batch to `StorageActor` immediately via actor messages. `StorageActor` starts downloading storage for those accounts while more accounts are still being fetched. + +This requires `StorageActor` to handle dynamically growing task queues (new account batches arrive while existing ones are being processed). + +**Expected Impact:** Significant — storage is one of the longest phases and can start much earlier. + +**Note:** PR #6184 partially addresses this with background storage healing (99% threshold + background task), but doesn't pipeline the download phase itself. The actor-based approach would go further. + +**Effort:** Medium (2-3 weeks, depends on 3.1) + +--- + +### 3.4 Pipelined Bytecode Download + +**Current State:** Bytecodes are downloaded in phase 5, after all storage ranges. Code hashes are known as soon as accounts are downloaded (they're part of the account state). + +**Proposed Change:** `AccountActor` sends code hashes to `BytecodeActor` immediately as accounts arrive. `BytecodeActor` downloads bytecodes in parallel with storage downloads. + +**Expected Impact:** Removes bytecode download as a sequential bottleneck — it becomes fully overlapped with storage. + +**Note:** PR #6184 already implements concurrent bytecode downloads via `mpsc` channel (without actors). If #6184 merges first, the actor migration would formalize the existing pattern. + +**Effort:** Low (1-2 weeks, depends on 3.1) + +--- + +### 3.5 Compute FKV on Insertion + +**Current State:** The flat key-value (FKV) store is populated by a background generator (`flatkeyvalue_generator` in `store.rs`) that runs after sync, iterating the trie to build denormalized lookup entries. This adds post-sync latency before the node is fully operational. + +**Proposed Change:** Compute and insert FKV entries as account/storage data arrives during snap sync, eliminating the post-sync generation step. Each actor writes FKV entries alongside trie nodes in the same batch. + +**Expected Impact:** Eliminates FKV generation as a post-sync step. Node becomes operational immediately after sync completes. + +**Effort:** Medium (2-3 weeks) + +--- + +### Phase 3 Dependency Graph + +``` +3.1 (Actor migration) + ├── 3.2 (Peer scoring) — independent, can start in parallel + ├── 3.3 (Pipelined storage) — depends on 3.1 + ├── 3.4 (Pipelined bytecodes) — depends on 3.1 + └── 3.5 (FKV on insertion) — independent of 3.1, can start earlier +``` + +--- + +## Success Metrics + +### Phase 1: Performance + +| Metric | Current | Target | Measurement Method | +|--------|---------|--------|-------------------| +| Mainnet full sync time | TBD | -50% | End-to-end benchmark | +| Account download rate | TBD | 2x | Accounts/second metric | +| Storage healing time | TBD | -60% | Phase duration metric | +| Peak memory usage | TBD | -30% | Process monitoring | +| CPU utilization during sync | TBD | >80% | Process monitoring | + +### Phase 2: Code Quality + +| Metric | Current | Target | Measurement Method | +|--------|---------|--------|-------------------| +| Test coverage | ~20% | >80% | `cargo tarpaulin` | +| Clippy warnings | 0 | 0 | CI enforcement | +| Documentation coverage | ~30% | >90% | `cargo doc` coverage | +| Cyclomatic complexity | TBD | <15 per function | `cargo clippy` | +| Functions >100 lines | TBD | 0 | Custom lint | + +### Phase 3: Pipeline Architecture + +| Metric | Current | Target | Measurement Method | +|--------|---------|--------|-------------------| +| Phase overlap | 0% (sequential) | >50% | Phase timing breakdown | +| Time from first account to first storage download | Full account phase | <30s | Per-phase logs | +| Post-sync FKV generation time | TBD | 0 (eliminated) | End-to-end benchmark | +| Total sync time improvement over Phase 1 | Baseline | -30% additional | End-to-end benchmark | + +--- + +## Risk Assessment + +| Risk | Probability | Impact | Mitigation | +|------|-------------|--------|------------| +| Performance regression | Medium | High | Comprehensive benchmarking before/after each change | +| Data corruption during sync | Low | Critical | Extensive integration testing; checksums; recovery mechanisms | +| Breaking changes to peer protocol | Low | Medium | Hive test suite validation | +| Increased complexity | Medium | Medium | Code review; documentation requirements | +| Schedule overrun | Medium | Medium | Prioritize high-impact items; iterative delivery | +| ~~Spawned refactor not ready~~ | ~~Medium~~ | ~~High~~ | ✅ Resolved — spawned 0.5.0 merged in #6295 (March 31, 2026) | +| Pipeline ordering bugs | Medium | High | Pipelining introduces data races if actors process out of order; needs careful invariant tracking | +| **Pivot-update crash masks optimization progress** | **High** | **High** | Ship §1.19 quick fix (#6475) before investing in further perf work — otherwise measurements are noisy and users hit the crash before seeing gains. Identified 2026-04-15. | +| **DB corruption requires full resync on any crash** | **High** | **High** | `process::exit(2)` paths leave inconsistent state. Addressed in §1.19 proper fix (#6474 item 8). Until fixed, every sync crash costs a full resync. | + +--- + +## Timeline + +### Recommended execution order (updated 2026-04-15) + +Two key insights drive priorities: +- **Trie building dominates insertion time** (75-91% from 1.12 profiling) +- **Small-account dispatcher overhead accounts for ~80% of idle thread-seconds** in `insert_storages` (new finding from #6476 profiling) + +Recommended priority: + +1. **Unblock users first** — ship pivot-update crash fix (#6475) so ~20% of mainnet runs stop crashing. **This is the highest-priority item right now** — every other perf win is masked by the crash. +2. **Land observability tooling** (1.18, #6470) — enables measuring all other changes +3. **Land trie building optimizations** (1.12, #6410) — largest single improvement, orthogonal to concurrency model +4. **Land write path optimizations** (1.14, 1.15, 1.16) — compound on trie improvements +5. **Explore small-account batching** (1.21, #6476) — potentially 30-40% of storage phase (largest unexploited opportunity) +6. **Land pipelining** (1.13, #6184) — concurrent bytecodes + background healing +7. **Big-account within-trie parallelization** (1.20, #6477) — ~5-6%, complementary to #6476 +8. **Phase 2 quick wins** (2.8, 2.17, 2.15, 2.1, 2.4) — low-effort correctness/quality +9. **Proper pivot-update fix** (#6474) — deeper peer selection bugs, tackle after quick fix stabilizes +10. **Actor migration** (3.1) — clean architecture, now unblocked, subsumes 2.5/2.12 +11. **Remaining Phase 2/3 items** — documentation, testing, peer scoring + +### Phase 1: Performance + +``` +Priority 0 (CRITICAL — user-facing reliability): + 1.19 Pivot update reliability quick fix (PR #6475) — stops ~20% crash rate + +Priority 1 (high impact, ready now): + 1.18 Snap sync observability tooling (PR #6470) — prereq for measuring others + 1.12 Optimize trie building (PR #6410) — -31% account insertion + 1.15 Optimize insertion/healing write paths (PR #6159) + 1.14 Eliminate SST file intermediate step (PR #6177) + 1.13 Pipeline bytecodes + background healing (PR #6184) — -13% total + 1.21 Small-account batching (Issue #6476) — potentially 30-40% of storage phase + +Priority 2 (medium impact): + 1.16 Disable WAL and improve concurrency (PR #6178) + 1.17 Fill all peer slots per tick in healing (PR #6175) + 1.20 Big-account within-trie parallelization (Issue #6477) — ~5-6% + 1.24 Adaptive request sizing + storage bisection (PR #6181) + 1.1 Parallel Header Download (PR #6059) + 1.6 Async Disk I/O (PR #6113) + 1.4 Reduce Busy-Wait Loops (Issue #6140) + +Lower priority / needs measurement: + 1.22 Decoded TrieLayerCache (PR #6348) + 1.23 Bloom filter for non-existent storage slots (PR #6288) + 1.25 Concurrent bytecode + storage (PR #6205) — may overlap with 1.13 + 1.26 Phase completion markers (PR #6189) + 1.3 Optimize Trie Node Batching + 1.7 Peer Connection Optimization (PR #6117) + 1.10 Snap sync benchmark tool (PR #6108) + +Follow-up / deferred: + 1.19b Pivot update proper fix (Issue #6474) — deeper peer selection bugs + +Done: + 1.11 Per-phase timing breakdown (✅ Merged #6136) + +Discarded: + 1.2 Parallel Account Range Requests + 1.5 Memory-Bounded Structures + 1.8 Parallel Storage Healing + 1.9 Bytes for Trie Values (PR #6057 closed) +``` + +### Phase 2: Code Quality + +``` +Quick wins (do alongside Phase 1): + 2.8 Fix Correctness Bugs (Issue #6140) + 2.17 Use existing constants for magic numbers + 2.15 Guard write_set in account path + 2.1 Extract Context Structs (Issue #6140) + 2.4 Extract Helper Functions (Issue #6140) + 2.13 Self-contained StorageTask with hashes + +Defer until after actor migration (subsumed by 3.1): + 2.5 State Machine Refactor + 2.12 Use JoinSet for snap workers + +Independent (do when bandwidth available): + 2.2 Comprehensive Documentation + 2.6 Test Coverage Improvement + 2.7 Configuration Externalization + 2.16 Healing Code Unification + 2.18 StorageTrieTracker refactor (PR #6171) + +Done: + 2.3 Consolidate Error Handling (✅ Merged #5975) + 2.9 Fix snap protocol capability bug (✅ Merged #5975) + 2.10 Add spawn_blocking to bytecodes handler (✅ Merged #5975) + 2.11 Remove DumpError.contents dead field (✅ Merged #5975) + 2.14 Move snap client methods off PeerHandler (✅ Merged #5975) +``` + +### Phase 3: Pipeline Architecture + +``` +Unblocked (spawned 0.5.0 merged in #6295): + 3.1 Migrate snap sync to Spawned actors + 3.2 Peer scoring (independent of 3.1) + 3.5 Compute FKV on insertion (independent of 3.1) + +After 3.1: + 3.3 Pipelined account → storage download + 3.4 Pipelined bytecode download (partially done by #6184) +``` + +**Note:** Spawned 0.5.0 has landed (#6295, March 31). Phase 3 is no longer blocked. However, Phase 1 performance wins should land first since they're orthogonal to the concurrency model and provide immediate measurable impact. + +**Total Duration:** ~20+ weeks (all phases overlap) + +--- + +## Dependencies + +### External Dependencies + +| Dependency | Version | Purpose | +|------------|---------|---------| +| tokio | 1.x | Async runtime | +| spawned-concurrency | 0.5.0 (✅ merged in #6295) | Actor framework (Phase 3) | +| rayon | 1.x | Parallel iterators | +| tracing | 0.1.x | Logging/metrics | + +### Internal Dependencies + +| Module | Dependency | Notes | +|--------|------------|-------| +| snap sync | ethrex-storage | Trie operations | +| snap sync | ethrex-trie | Merkle Patricia Trie | +| snap sync | peer_handler | Network layer | + +### Infrastructure Dependencies + +- Benchmarking environment with mainnet-like data +- CI pipeline for performance regression detection +- Test network (Sepolia/Holesky) access + +--- + +## Appendix A: Reference Implementation Comparison + +### geth Snap Sync +- Parallel header and state download +- Adaptive peer scoring +- In-memory trie caching + +### reth Snap Sync +- Staged sync architecture +- Parallel range downloads +- Memory-mapped storage + +### Key Takeaways +1. All major clients parallelize header and state download +2. Adaptive batching is common +3. Memory management is critical for mainnet scale + +--- + +## Appendix B: Existing TODOs/FIXMEs + +| Location | Issue | Priority | +|----------|-------|----------| +| `sync/healing/storage.rs:157` | Better data receiver design | Medium | +| `sync/healing/storage.rs:231` | Use `put_batch_no_alloc` | High | +| `sync/healing/storage.rs:299` | Store error handling (`.expect()`) | High | +| `sync/healing/storage.rs:377` | Add error handling | Medium | +| `sync/healing/state.rs:150` | Peer scoring for responses | Medium | +| `sync/healing/state.rs:195` | Optimize trie leaf reaching | Low | +| `sync/healing/state.rs:246` | Check errors for stale block detection | Medium | +| `sync/healing/state.rs:283` | Reuse buffers | Low | +| `sync/healing/state.rs:302` | Use `put_batch_no_alloc` | High | +| `sync/healing/state.rs:346` | Change tuple to struct | Low | +| `snap/client.rs:175` | Check error type and handle properly | Medium | +| `snap/client.rs:281` | Repeated code, consider refactoring | Medium | +| `snap/client.rs:565` | Stable sort for binary search | Low | +| `snap/client.rs:595` | Replace with removable structure | Medium | +| `snap/client.rs:808` | DRY — duplicated big-account logic | Medium | +| `snap/client.rs:976` | Unnecessary unzip/memory | Low | + +--- + +## Appendix C: Glossary + +| Term | Definition | +|------|------------| +| **Pivot** | The block whose state we're syncing; updated when stale | +| **Healing** | Process of fixing trie inconsistencies after multi-pivot sync | +| **Staleness** | When the pivot block is too old relative to chain head | +| **Account Range** | A contiguous range of accounts in the state trie | +| **Storage Range** | A contiguous range of storage slots for an account | +| **In-flight Request** | A request sent to a peer awaiting response | + +--- + +*Document Version: 1.3* +*Last Updated: 2026-04-15*