
docs(l1): snapsync roadmap #6112

Open
pablodeymo wants to merge 17 commits into main from snapsync_roadmap

Conversation

Contributor

@pablodeymo pablodeymo commented Feb 3, 2026

Motivation

We want to improve performance and readability in the snap sync module. This roadmap documents the strategic plan in two phases: performance optimization and code quality improvements.

Description

Adds a comprehensive roadmap document (docs/roadmaps/snap_sync_roadmap.md) covering current state analysis, bottlenecks, and improvement items across two phases.

PR Tracking

| Section | Description | PR/Issue | Status | Tested |
|---------|-------------|----------|--------|--------|
| 1.1 | Parallel header download | #6059 | Open | 🔄 |
| 1.2 | Improve storage, state store | TBD | | |
| 1.3 | Optimize trie node batching | | Not started | |
| 1.4 | Reduce busy-wait loops | Issue #6140 (Step 9) | Open | No |
| 1.6 | Async disk I/O | #6113 | Open | No |
| 1.7 | Adaptive peer timeouts | #6117 | Only plan | No |
| 1.9 | Bytes for trie values (O(1) clones) | #6057 | Open | No |
| 1.10 | Snap sync benchmark tool | #6108 | Open | |
| 2.1 | Extract context structs + AccountStorageRoots SoA + named tuple structs | Issue #6140 (Steps 5, 6) | Open | No |
| 2.2 | Comprehensive documentation | | Not started | |
| 2.4 | Extract helper functions | Issue #6140 (Steps 3, 4) | Open | No |
| 2.5 | State machine refactor | | Not started | |
| 2.6 | Test coverage improvement | | Not started | |
| 2.7 | Configuration externalization | | Not started | |
| 2.8 | Fix correctness bugs in request_storage_ranges (panic, expect) | Issue #6140 (Steps 1, 2) | Open | No |
| 2.12 | Use JoinSet instead of channels for workers | | Not started | |
| 2.13 | Self-contained StorageTask with hashes | | Not started | |
| 2.15 | Guard write_set in account path | | Not started | |
| 2.16 | Healing code unification | | Not started | |
| 2.17 | Use existing constants for magic numbers | | Not started | |

Issue #6140 — Refactor request_storage_ranges (Steps)

9-step plan to refactor request_storage_ranges in crates/networking/p2p/snap/client.rs. Each step is one independently correct commit. Full details in Issue #6140.

| Step | Description | Sections | Risk |
|------|-------------|----------|------|
| 1 | Replace panic! with proper error return | 2.8 | Very low |
| 2 | Replace .expect() with ? operator | 2.8 | Very low |
| 3 | Extract ensure_snapshot_dir helper (4 occurrences) | 2.4 | Very low |
| 4 | Extract big-account chunking helper — DRY (~70 dup lines) | 2.4 | Low |
| 5 | Introduce TaskTracker struct for task counting | 2.1 | Very low |
| 6 | Extract result processing into StorageDownloadState.process_result() | 2.1 | Medium |
| 7 | Track buffer size incrementally (O(1) instead of O(n)) | | Low |
| 8 | Remove accounts_done HashMap (inline removal) | | Low |
| 9 | Replace busy-poll (try_recv + sleep) with tokio::select! | 1.4 | Medium |

Dependencies:

Steps 1, 2, 3, 5 — independent
Step 4 — independent
Step 6 — depends on Steps 4, 5
Steps 7, 8 — depend on Step 6
Step 9 — depends on Step 6

Execution order: 1 → 2 → 3 → 5 → 4 → 6 → 7 → 8 → 9
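Step 7's idea (maintain a running total instead of recomputing the buffer size on every threshold check) can be sketched in isolation. The `AccountBuffer` type and its fields below are invented for illustration and are not taken from the ethrex codebase:

```rust
// Hypothetical sketch of Step 7 (names are illustrative, not from ethrex):
// keep a running byte total so checking the flush threshold is O(1),
// instead of summing every buffered entry's length on each check (O(n)).
struct AccountBuffer {
    entries: Vec<Vec<u8>>,
    total_bytes: usize, // maintained incrementally on push/drain
}

impl AccountBuffer {
    fn new() -> Self {
        Self { entries: Vec::new(), total_bytes: 0 }
    }

    fn push(&mut self, entry: Vec<u8>) {
        self.total_bytes += entry.len(); // O(1) bookkeeping
        self.entries.push(entry);
    }

    /// O(1) instead of `self.entries.iter().map(|e| e.len()).sum::<usize>()`.
    fn size_bytes(&self) -> usize {
        self.total_bytes
    }

    /// Take all buffered entries (e.g. to flush to disk) and reset the counter.
    fn drain_all(&mut self) -> Vec<Vec<u8>> {
        self.total_bytes = 0;
        std::mem::take(&mut self.entries)
    }
}
```

The invariant is simply that `total_bytes` is updated at every point where `entries` changes, which is why the step is rated low-risk.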

Merged

| Section | Description | PR/Issue |
|---------|-------------|----------|
| | Code reorganization and error handling consolidation | #5975 |
| 2.3 | Consolidate error handling | #5975 |
| 2.9 | Fix snap protocol capability bug (ETH→SNAP) | #5975 |
| 2.10 | Add spawn_blocking to bytecodes handler | #5975 |
| 2.11 | Remove dead DumpError.contents field | #5975 |
| 2.14 | Move snap client methods off PeerHandler | #5975 |
| 1.11 | Per-phase timing breakdown in Slack notifications | #6136 |

Discarded

| Section | Description | PR/Issue |
|---------|-------------|----------|
| 1.2 | Parallel account range requests | #6101 |
| 1.5 | Memory-bounded structures | |
| 1.8 | Parallel storage healing | |

Checklist

  • Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

@pablodeymo pablodeymo requested a review from a team as a code owner February 3, 2026 18:55
Copilot AI review requested due to automatic review settings February 3, 2026 18:55
@github-actions github-actions bot added the L1 Ethereum client label Feb 3, 2026

greptile-apps bot commented Feb 3, 2026

Greptile Overview

Greptile Summary

Added a comprehensive snap sync roadmap document outlining strategic improvements for the ethrex snap sync module across two phases: performance optimization (8 improvements including parallel downloads, batching optimizations, and async I/O) and code quality enhancement (7 improvements including documentation, testing, and refactoring).

Key additions:

  • Detailed current state analysis with module structure breakdown (~4,650 LOC across 12 files)
  • Phase 1 performance improvements targeting 50% sync time reduction
  • Phase 2 code quality improvements targeting 80%+ test coverage
  • Comprehensive timeline (~16 weeks), success metrics, risk assessment, and dependencies
  • Reference to in-progress PR perf(l1): parallelize header download with state download during snap sync #6059 for parallel header download
  • Appendices with implementation comparisons, existing TODOs, and glossary

The document is well-structured with clear sections, detailed technical proposals with code examples, and practical implementation guidance. This is a documentation-only change with no code modifications.

Confidence Score: 5/5

  • This PR is completely safe to merge as it only adds documentation
  • This is a documentation-only PR that adds a comprehensive roadmap document with no code changes, no breaking changes, and no runtime impact
  • No files require special attention - this is purely documentation

Important Files Changed

| Filename | Overview |
|----------|----------|
| docs/roadmaps/snap_sync_roadmap.md | Added comprehensive snap sync roadmap document with performance optimization and code quality improvement plans |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Doc as Roadmap Document
    participant Team as Development Team
    participant Code as Snap Sync Codebase

    Dev->>Doc: Create roadmap document
    Note over Doc: Phase 1: Performance Optimization
    Note over Doc: Phase 2: Code Quality & Maintainability

    Doc->>Team: Provides strategic plan
    Team->>Doc: Review improvement areas

    Note over Doc,Code: 8 Performance Improvements<br/>(Parallel downloads, batching, I/O)

    Note over Doc,Code: 7 Code Quality Improvements<br/>(Context structs, documentation, testing)

    Doc->>Team: Timeline: ~16 weeks total
    Doc->>Team: Success metrics defined
    Doc->>Team: Risk assessment provided

    Team->>Code: Implements improvements iteratively
    Code->>Team: Achieves competitive sync performance
```

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a comprehensive roadmap document for improving the snap sync module in ethrex. The document outlines a two-phase approach: Phase 1 focuses on performance optimization to reduce sync times by 50% or more, while Phase 2 focuses on code quality and maintainability improvements.

Changes:

  • Added detailed technical roadmap for snap sync module improvements
  • Documented current state, bottlenecks, and proposed optimizations
  • Included implementation examples, timelines, and success metrics


Comment on lines +550 to +551

```rust
(DownloadingHeaders, HeadersComplete) => {
    self.phase = DownloadingAccounts;
```
Copilot AI Feb 3, 2026

The assignment self.phase = DownloadingAccounts; uses an unqualified enum variant. It should be self.phase = SnapSyncPhase::DownloadingAccounts; for proper Rust syntax.

Suggested change

```diff
-(DownloadingHeaders, HeadersComplete) => {
-    self.phase = DownloadingAccounts;
+(SnapSyncPhase::DownloadingHeaders, SnapSyncEvent::HeadersComplete) => {
+    self.phase = SnapSyncPhase::DownloadingAccounts;
```

Comment on lines +282 to +292

```rust
struct AdaptivePeerConfig {
    base_timeout: Duration,
    peer_latencies: HashMap<H256, RollingAverage>,

    fn timeout_for_peer(&self, peer_id: &H256) -> Duration {
        self.peer_latencies
            .get(peer_id)
            .map(|avg| avg.mean() * 3.0) // 3x average latency
            .unwrap_or(self.base_timeout)
    }
}
```
Copilot AI Feb 3, 2026

The struct definition has a syntax error. The fn timeout_for_peer should be in an impl block, not inside the struct definition. The struct should only contain fields, and methods should be in a separate impl AdaptivePeerConfig block.
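For reference, a compiling shape of that sketch hoists the method into an `impl` block. `RollingAverage` and the `H256` alias below are stand-ins invented here, since the roadmap snippet does not define them:

```rust
use std::collections::HashMap;
use std::time::Duration;

type H256 = [u8; 32]; // stand-in for ethereum_types::H256

// Stand-in for the undefined RollingAverage in the roadmap snippet.
struct RollingAverage {
    total: Duration,
    count: u32,
}

impl RollingAverage {
    fn mean(&self) -> Duration {
        if self.count == 0 { Duration::ZERO } else { self.total / self.count }
    }
}

struct AdaptivePeerConfig {
    base_timeout: Duration,
    peer_latencies: HashMap<H256, RollingAverage>,
}

// Methods live in an impl block, not inside the struct definition.
impl AdaptivePeerConfig {
    fn timeout_for_peer(&self, peer_id: &H256) -> Duration {
        self.peer_latencies
            .get(peer_id)
            .map(|avg| avg.mean().mul_f64(3.0)) // 3x average latency
            .unwrap_or(self.base_timeout)
    }
}
```

Note that `Duration` cannot be multiplied by `3.0` directly; `Duration::mul_f64` does the scaling.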

Comment on lines +544 to +558

```rust
pub struct SnapSyncStateMachine {
    phase: SnapSyncPhase,
    progress: SnapSyncProgress,

    pub fn transition(&mut self, event: SnapSyncEvent) -> Result<(), SyncError> {
        match (self.phase, event) {
            (DownloadingHeaders, HeadersComplete) => {
                self.phase = DownloadingAccounts;
                Ok(())
            }
            // ... other transitions
            _ => Err(SyncError::InvalidStateTransition),
        }
    }
}
```
Copilot AI Feb 3, 2026

The struct definition has a syntax error. The pub fn transition should be in an impl block, not inside the struct definition. The struct should only contain fields (phase and progress), and methods should be in a separate impl SnapSyncStateMachine block.
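A compiling shape of the corrected snippet, with minimal stand-in enums (and `SnapSyncProgress` omitted) since the roadmap draft defines those elsewhere:

```rust
#[derive(Clone, Copy)]
enum SnapSyncPhase {
    DownloadingHeaders,
    DownloadingAccounts,
}

enum SnapSyncEvent {
    HeadersComplete,
    AccountsComplete, // extra variant so the fallback arm is reachable
}

#[derive(Debug)]
enum SyncError {
    InvalidStateTransition,
}

pub struct SnapSyncStateMachine {
    phase: SnapSyncPhase,
    // progress: SnapSyncProgress, // omitted in this sketch
}

// The method moves out of the struct body into an impl block.
impl SnapSyncStateMachine {
    pub fn transition(&mut self, event: SnapSyncEvent) -> Result<(), SyncError> {
        match (self.phase, event) {
            (SnapSyncPhase::DownloadingHeaders, SnapSyncEvent::HeadersComplete) => {
                self.phase = SnapSyncPhase::DownloadingAccounts;
                Ok(())
            }
            // ... other transitions
            _ => Err(SyncError::InvalidStateTransition),
        }
    }
}
```

Any transition not listed explicitly falls through to `InvalidStateTransition`, which is the point of the state-machine refactor.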

Comment on lines +550 to +551

```rust
(DownloadingHeaders, HeadersComplete) => {
    self.phase = DownloadingAccounts;
```
Copilot AI Feb 3, 2026

The match pattern uses unqualified enum variants DownloadingHeaders and HeadersComplete, but they should be qualified as SnapSyncPhase::DownloadingHeaders and SnapSyncEvent::HeadersComplete respectively (assuming HeadersComplete is a variant of SnapSyncEvent).

Suggested change

```diff
-(DownloadingHeaders, HeadersComplete) => {
-    self.phase = DownloadingAccounts;
+(SnapSyncPhase::DownloadingHeaders, SnapSyncEvent::HeadersComplete) => {
+    self.phase = SnapSyncPhase::DownloadingAccounts;
```

Contributor

@fedacking fedacking left a comment

Left some comments for this plan that should be addressed.

Comment thread: docs/roadmaps/snap_sync_roadmap.md (Outdated)

> ### 1.5 Memory-Bounded Structures
>
> **Current State:**
> - `accounts_by_root_hash` in `request_storage_ranges()` is unbounded
Contributor

This structure doesn't grow to gigabytes on mainnet due to the limited number of accounts with storage.

Comment thread docs/roadmaps/snap_sync_roadmap.md Outdated
| Bottleneck | Location | Impact | Priority |
|------------|----------|--------|----------|
| Sequential header download | `sync_cycle_snap()` | Blocks state download start | Critical |
| Single-threaded account range processing | `request_account_range()` | Underutilizes peers | High |
Contributor

The requests to peers are done in parallel, so this is wrong.

Comment thread docs/roadmaps/snap_sync_roadmap.md Outdated
|------------|----------|--------|----------|
| Sequential header download | `sync_cycle_snap()` | Blocks state download start | Critical |
| Single-threaded account range processing | `request_account_range()` | Underutilizes peers | High |
| Inefficient trie node batching | `heal_state_trie()`, `heal_storage_trie()` | Excessive DB writes | High |
Contributor

I'm not sure what constitutes excessive DB writes here. Writes are batched.

Comment thread docs/roadmaps/snap_sync_roadmap.md Outdated
| Sequential header download | `sync_cycle_snap()` | Blocks state download start | Critical |
| Single-threaded account range processing | `request_account_range()` | Underutilizes peers | High |
| Inefficient trie node batching | `heal_state_trie()`, `heal_storage_trie()` | Excessive DB writes | High |
| Busy-wait loops | Multiple locations | CPU waste | Medium |
Contributor

These are done in scenarios where there aren't peers, but they should be reviewed.

Comment thread docs/roadmaps/snap_sync_roadmap.md Outdated
| Single-threaded account range processing | `request_account_range()` | Underutilizes peers | High |
| Inefficient trie node batching | `heal_state_trie()`, `heal_storage_trie()` | Excessive DB writes | High |
| Busy-wait loops | Multiple locations | CPU waste | Medium |
| Unbounded memory structures | `accounts_by_root_hash` | Memory pressure | Medium |
Contributor

Not an issue, as measured in mainnet.

Comment thread: docs/roadmaps/snap_sync_roadmap.md (Outdated)

> }
>
> #### 1.3.2 Dynamic Batch Sizing
Contributor

Don't see how it would improve healing that much, should measure.


```rust
// Current (snap_sync.rs ~line 452)
loop {
```
Contributor

I would review that these channels aren't doing anything else, but on principle it sounds like a good change.

Comment thread: docs/roadmaps/snap_sync_roadmap.md (Outdated)

```rust
// Proposed
tokio::fs::create_dir_all(dir).await?;
tokio::task::spawn_blocking(move || {
```
Contributor

They're already inside a spawn_blocking, but they should use tokio::fs.

> #### 1.7.2 Request Pipelining
> Increase in-flight requests for high-quality peers:
Contributor

Good change, but could be excessive complexity.

Comment thread: docs/roadmaps/snap_sync_roadmap.md (Outdated)

> ---
>
> ### 1.8 Parallel Storage Healing
Contributor

This is not correct, storage healing is already parallelized.

@github-project-automation github-project-automation bot moved this to In Progress in ethrex_l1 Feb 5, 2026
@fedacking
Contributor

PR #5975 Review Comments (fedacking) — Validated & Prioritized

26 comments validated against refactor/snapsync-healing-unification. 1 already fixed, 1 informational acknowledgment, 24 actionable.


HIGH — Bugs / Correctness

1. BUG: Wrong peer capabilities in all snap client requests
snap/client.rs:229,423,958 — All three get_best_peer() calls use SUPPORTED_ETH_CAPABILITIES instead of SUPPORTED_SNAP_CAPABILITIES. Selects peers that may not support snap protocol. Healing modules use the correct one.

2. BUG: Missing spawn_blocking in process_byte_codes_request
snap/server.rs:107 — Blocking I/O via store.get_account_code() without spawn_blocking. The other three handlers all use it. Can block the tokio runtime.

3. Fragile index-based StorageTask
snap/client.rs:77 — StorageTask references accounts_by_root_hash by index. Any mutation silently corrupts in-flight tasks. Should send actual hashes.

4. AccountStorageRoots needs simplification
sync.rs:167 — BTreeMap<H256, (Option<H256>, Vec<(H256, H256)>)> is the hottest struct during storage downloads. Recommend SoA approach and a named struct instead of the tuple. Option<H256> = storage root (None if healed), Vec<(H256, H256)> = remaining download intervals.
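The suggested named struct could look like this sketch; the field names are invented for illustration, while the semantics follow the review comment above:

```rust
use std::collections::BTreeMap;

type H256 = [u8; 32]; // stand-in for ethereum_types::H256

/// Hypothetical replacement for the anonymous tuple
/// `(Option<H256>, Vec<(H256, H256)>)` keyed by account hash.
struct AccountStorageEntry {
    /// Storage root; `None` once the account has been healed.
    storage_root: Option<H256>,
    /// Remaining download intervals as `(start_hash, end_hash)` pairs.
    remaining_intervals: Vec<(H256, H256)>,
}

type AccountStorageRoots = BTreeMap<H256, AccountStorageEntry>;
```

Call sites then read `entry.storage_root` instead of `entry.0`, which is the whole point of naming the tuple.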

5. Complex state machine in storage result handling
snap/client.rs:678 — The remaining_start/remaining_end/remaining_hash_range logic has many branches. Should use an enum for explicit state transitions (see docs/l1/fundamentals/snap_sync/Flow-BigAccountLogic.png).


MEDIUM — Code Quality / Reliability / Memory

6. Snap client methods shouldn't be extension methods on PeerHandler
snap/client.rs:4 — These are complex orchestration functions (task queues, workers, file I/O), not peer operations. Should be standalone functions taking &mut PeerHandler.

7. DumpError keeps up to 64MB in memory for nothing
snap/error.rs:136 — The contents: Vec<u8> field held the snapshot chunk for a retry mechanism that no longer exists. Remove it, then replace the custom Debug impl (line 140) with #[derive(Debug)].

8. Use JoinSet instead of channels for workers
snap/client.rs:141,589 — Both account range and storage range workers use mpsc::channel. If a worker panics, the message is lost silently. JoinSet propagates panics and handles lifecycle.

9. write_set should guard against queuing multiple writes
snap/client.rs:176 — Spawns disk-write tasks without checking if one is pending. The storage path (line 629) already does !disk_joinset.is_empty() check — account path should match.
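A std-thread analogue of the missing guard can be sketched as below. The real code uses a tokio JoinSet and the `!disk_joinset.is_empty()` check; `WriteSet` and `try_flush` are invented names for illustration:

```rust
use std::thread::{self, JoinHandle};

// Hypothetical std-thread analogue of the `!disk_joinset.is_empty()` guard:
// refuse to queue a new disk write while a previous one is still in flight.
struct WriteSet {
    pending: Option<JoinHandle<()>>,
}

impl WriteSet {
    fn new() -> Self {
        Self { pending: None }
    }

    /// Returns false if a write is still in flight; otherwise spawns the next one.
    fn try_flush(&mut self, data: Vec<u8>) -> bool {
        if let Some(handle) = &self.pending {
            if !handle.is_finished() {
                return false; // guard: one write at a time
            }
        }
        // Reap the finished task before starting the next one.
        if let Some(handle) = self.pending.take() {
            let _ = handle.join();
        }
        self.pending = Some(thread::spawn(move || {
            drop(data); // stand-in for dumping the snapshot chunk to disk
        }));
        true
    }
}
```

The caller decides what to do on `false` (keep buffering, as the storage path does).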

10. Disk write logic duplicated 3x
snap/client.rs:156,287,609 — Same pattern (check threshold, ensure dir, dump). Extract a helper like flush_snapshot_to_disk(). Line 287 already has a TODO acknowledging this.

11. Complex tuple types should be named structs

  • healing/state.rs:106 — Channel type (H256, Result<Vec<Node>, SnapError>, Vec<RequestMetadata>) should be a struct.
  • snap/client.rs:142 — Worker returns (Vec<AccountRangeUnit>, H256, Option<(H256, H256)>). The bytecodes path already uses a TaskResult struct as a good example.
  • snap/client.rs:71 — StorageTaskResult.remaining_* fields need doc comments.

12. State and storage healing should share more code
healing/mod.rs:1 — state.rs (~420 lines) and storage.rs (~530 lines) implement the same algorithm. Differences: path representation (single vs double nibbles) and leaf type (accounts vs U256). Could be generic.

13. Add comment explaining accounts_by_root_hash
snap/client.rs:544 — Key optimization: accounts are grouped by storage root so each root is downloaded only once. Not documented in code.
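The optimization described could be documented with a sketch like this (a hypothetical free function for illustration; the real code builds the map inline):

```rust
use std::collections::HashMap;

type H256 = [u8; 32]; // stand-in for ethereum_types::H256

/// Group account hashes by their storage root so that each distinct
/// storage trie is downloaded only once, then shared by every account
/// that references it. (Illustrative sketch, not the ethrex code.)
fn group_accounts_by_root(accounts: &[(H256, H256)]) -> HashMap<H256, Vec<H256>> {
    let mut by_root: HashMap<H256, Vec<H256>> = HashMap::new();
    for (account_hash, storage_root) in accounts {
        by_root.entry(*storage_root).or_default().push(*account_hash);
    }
    by_root
}
```

Two accounts with identical storage (common for cloned contracts) collapse into one download task under this grouping.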


LOW — Style / Minor

14. Use existing constants for magic numbers

  • client.rs:569 — 300 should be STORAGE_BATCH_SIZE (already in constants.rs).
  • client.rs:827,862 — H256::repeat_byte(0xff) should be HASH_MAX (already in constants.rs).
  • client.rs:111 — 800 should be a named constant.

15. request_account_range over-generalized
snap/client.rs:99 — Only ever called with (H256::zero(), H256::repeat_byte(0xff)). Not harmful, but the parameterization is unused.

16. Rename "missing children" to "pending children"
healing/state.rs:363 — "Pending" better conveys they'll be resolved through healing.

17. Create issue to test constant values under load
snap/constants.rs — 15+ tuning parameters. Tracking request, not a code change.

18. Already fixed: missing_children_count u64 vs usize (commit f680777).

19. Acknowledged: Error handling at client.rs:181 — log and stop is fine for now.


Recommended Order

  1. Fix capabilities bug (item 1) — 3 lines
  2. Add spawn_blocking (item 2) — small
  3. Remove DumpError.contents + derive Debug (item 7) — small, frees memory
  4. Replace magic numbers with existing constants (item 14) — trivial
  5. Extract disk-write helper (item 10) — dedup
  6. Named structs for tuple types (item 11)
  7. JoinSet migration (item 8) — reliability
  8. Guard write_set (item 9)
  9. Document remaining_* fields and accounts_by_root_hash (items 11, 13)
  10. Larger refactors: SoA for AccountStorageRoots (item 4), self-contained tasks (item 3), healing unification (item 12), state machine enum (item 5)

Contributor

@ElFantasma ElFantasma left a comment

The roadmap is well-structured and comprehensive — the two-phase approach, dependency graph for the #6140 steps, and discarded-section transparency are all great.

The main issue is that the module structure table, file paths, and line numbers throughout the document are stale — they reference the pre-#5975 layout (snap/client.rs, snap/server.rs, snap/constants.rs, sync/healing/state.rs, etc.) which no longer exists after the refactor merged on Feb 6. Since this document will be the reference for ongoing work, it should reflect the current codebase. See inline comment on the module structure table for the updated layout.

snap_sync_roadmap.md:655 — The path crates/networking/p2p/snap/client.rs no longer exists after #5975. request_storage_ranges is now in sync.rs. The line numbers referenced throughout the #6140 steps section (and in sections 2.12, 2.13, 2.15, 2.17, and Appendix B) also need updating to match the current layout. Since this document is meant to guide the actual refactoring work, stale paths will cause confusion — worth a pass to update all file references.

snap_sync_roadmap.md:578 — nit: Section 2.9 says "All get_best_peer() calls in snap/client.rs" — but after #5975 these are in sync.rs. Same issue in sections 2.10 (snap/server.rs), 2.11 (snap/error.rs), and 2.14. Since these are marked as done/merged, the stale paths are less critical but still misleading if someone uses this doc as a reference.

Minor: the ASCII pipeline diagram (line 57) has inconsistent box widths (trailing spaces don't align), though this is cosmetic.


> ### Module Structure
>
> | File | Lines | Purpose |
Contributor

This entire table references the pre-#5975 file layout. After the refactor merged, the snap sync module is organized as:

| File | Lines | Purpose |
|------|-------|---------|
| sync.rs | 1,658 | Main sync orchestration + account/storage range requests |
| snap.rs | 1,008 | Snap protocol (client + server + errors + constants) |
| sync/storage_healing.rs | 718 | Storage trie healing |
| sync/state_healing.rs | 471 | State trie healing |
| sync/code_collector.rs | 102 | Bytecode collection |
| **Total** | ~3,957 | |

The snap/client.rs, snap/server.rs, snap/error.rs, snap/constants.rs split no longer exists — everything is in a single snap.rs. Same for sync/healing/ — now sync/state_healing.rs and sync/storage_healing.rs.

> |------|-------------|----------|------|
> | 1 | Replace `panic!` with proper error return | 2.8 | Very low |
> | 2 | Replace `.expect()` with `?` operator | 2.8 | Very low |
> | 3 | Extract `ensure_snapshot_dir` helper (4 occurrences) | 2.4 | Very low |
Contributor

This path (crates/networking/p2p/snap/client.rs) no longer exists after #5975. request_storage_ranges is now in sync.rs. The line numbers referenced throughout the #6140 steps section (and in sections 2.12, 2.13, 2.15, 2.17, and Appendix B) also need updating to match the current layout.

Since this document is meant to guide the actual refactoring work, stale paths will cause confusion — worth a pass to update all file references.

Comment thread: docs/roadmaps/snap_sync_roadmap.md (Outdated)

> ### 2.10 Add `spawn_blocking` to Bytecodes Handler — ✅ DONE
>
> **Status:** Merged in #5975. The function in `snap/server.rs:108-131` is `async fn` with `spawn_blocking`, matching the pattern of all other handlers.
Contributor

nit: This says "All get_best_peer() calls in snap/client.rs" — but after #5975 these are in sync.rs. Same issue in sections 2.10 (snap/server.rs), 2.11 (snap/error.rs), and 2.14. Since these are marked as done/merged, the stale paths are less critical but still misleading if someone uses this doc as a reference.

Mark sections 1.2, 1.5, and 1.8 as discarded. Mark 2.3 as merged (PR #5975).
Link sections 1.4, 2.1, 2.4 to Issue #6140 (request_storage_ranges refactor).
Add new section 2.8 for correctness bug fixes. Update timeline accordingly.
Not an issue on mainnet as confirmed by fedacking — accounts_by_root_hash
doesn't grow to gigabytes due to limited accounts with storage.
- Remove incorrect "Single-threaded account range processing" bottleneck
  (requests are already parallel between chunks)
- Fix "Inefficient trie node batching" description (writes are already
  batched, note that impact needs measurement)
- Clarify busy-wait loops only happen when no peers are available
- Fix 1.6 Async Disk I/O description (already inside spawn_blocking,
  change is just for directory operations with tokio::fs)
- Add complexity warning to 1.7 Adaptive Peer Timeouts
Adds a table with all 9 refactoring steps for request_storage_ranges,
mapping each step to the roadmap section it addresses, with dependency
graph and execution order.
Add 9 new sections (2.9-2.17) from fedacking's PR #5975 review:
- 2.9: Fix snap protocol capability bug (ETH vs SNAP)
- 2.10: Add spawn_blocking to bytecodes handler
- 2.11: Remove dead DumpError.contents field
- 2.12: Use JoinSet instead of channels for workers
- 2.13: Self-contained StorageTask with hashes instead of indices
- 2.14: Move snap client methods off PeerHandler
- 2.15: Guard write_set in account path
- 2.16: Healing code unification (generic trie healing)
- 2.17: Use existing constants for magic numbers

Also extend 2.1 description to include AccountStorageRoots SoA
simplification and named structs for tuple types.
Update timeline with new items.
…oadmap.

Already fixed on the refactor/snapsync-healing-unification branch.
…oadmap.

Already fixed on the refactor/snapsync-healing-unification branch.
Already fixed on the refactor/snapsync-healing-unification branch, PR #6154 for main.
…in roadmap.

Already fixed on the refactor/snapsync-healing-unification branch.
…5975

instead of the now-merged refactor/snapsync-healing-unification branch.
@ElFantasma ElFantasma dismissed their stale review March 12, 2026 20:42

Dismissing — my comments about stale file paths were incorrect. The paths (snap/client.rs, sync/healing/state.rs, etc.) are correct and still exist on main. #5975 did not reorganize the module structure as I claimed. Sorry for the noise.


> ### 3.1 Migrate Snap Sync to Spawned Actors
>
> **Current State:** `snap_sync.rs` orchestrates everything via sequential function calls. Workers are spawned with `tokio::spawn` and communicate via `mpsc::channel`. Peer acquisition uses `try_recv` + sleep busy-wait loops.
Contributor

There are quite a few tasks that are JoinSets already, communicating via the tasks' return values.

- Add §1.18 observability tooling (PR #6470)
- Add §1.19 pivot update reliability (PR #6475, issue #6474)
- Add §1.20 big-account within-trie parallelization (issue #6477)
- Add §1.21 small-account batching (issue #6476)
- Add §1.22 decoded TrieLayerCache (PR #6348)
- Add §1.23 bloom filter for non-existent storage (PR #6288)
- Add §1.24 adaptive request sizing + bisection (PR #6181)
- Add §1.25 concurrent bytecode + storage (PR #6205)
- Add §1.26 phase completion markers (PR #6189)
- Add §2.18 StorageTrieTracker refactor (PR #6171)
- Update current-state bottleneck table with small-account and pivot-update findings
- Reprioritize timeline: pivot-update crash fix is now priority 0
- Add two risks (pivot crash masks perf work, DB corruption on every crash)
- Bump doc version to 1.3
@ElFantasma
Contributor

2026-04-15 Update

Comprehensive refresh covering work since the last update (2026-04-06). All new items are timestamped inline.

New roadmap items (Phase 1)

| § | Topic | PR/Issue | Status |
|---|-------|----------|--------|
| 1.18 | Snap sync observability (RPC + TUI + dashboards + monitor) | #6470 | OPEN |
| 1.19 | Pivot update reliability — crashes ~20% of mainnet runs | #6475 (quick fix), #6474 (proper fix) | PR OPEN / issue OPEN |
| 1.20 | Big-account within-trie parallelization (snap sync side, distinct from #5482) | #6477 | Issue OPEN |
| 1.21 | Small-account batching in insert_storages (~80% of idle thread-seconds) | #6476 | Issue OPEN |
| 1.22 | Decoded TrieLayerCache | #6348 | PR OPEN |
| 1.23 | Bloom filter for non-existent storage slots | #6288 | PR OPEN |
| 1.24 | Adaptive request sizing + storage bisection | #6181 | PR OPEN |
| 1.25 | Concurrent bytecode + storage download | #6205 | PR OPEN (may overlap #6184) |
| 1.26 | Phase completion markers for validation | #6189 | PR OPEN |

New roadmap items (Phase 2)

| § | Topic | PR/Issue | Status |
|---|-------|----------|--------|
| 2.18 | StorageTrieTracker storage download refactor | #6171 | OPEN |

Re-prioritized timeline

The pivot-update crash fix (§1.19) is now Priority 0 — every other perf win is masked by the ~20% crash rate and DB corruption on failure. Second-highest priority change: small-account batching (§1.21) identified as the largest remaining perf opportunity after #6410 (potentially 30-40% of storage phase vs ~5-6% for the big-account optimization).

Updated bottleneck table

Added three rows to "Current Performance Bottlenecks":

  • Pivot update crashes (Critical reliability)
  • Small-account dispatcher overhead (~80% of idle thread-seconds, newly quantified)
  • Large storage tries running single-threaded (~20% of idle thread-seconds)

New risks

  • Pivot-update crashes mask all other performance work until fixed
  • DB corruption on every crash

PR/issue status changes since last update

All previously-tracked items remain OPEN in the same state — no merges or closes since 2026-04-06.

Not-yet-added items worth considering


Labels

L1 Ethereum client snapsync

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

5 participants