perf(p2p): add flat connection pool decoupled from Kademlia routing table #6504

Closed
azteca1998 wants to merge 2 commits into fix/kademlia-snapsync-peer-pruning from perf/kademlia-connection-pool

@azteca1998
Contributor

Summary

  • Add a separate flat connection pool (50K capacity) for RLPx connection initiation, decoupled from the k-bucket routing table
  • Randomize contact selection from the pool to avoid bucket traversal bias

Builds on #6497 (peer pruning fix). Alternative/complementary to #6503 (randomization-only approach).

Problem

The Kademlia k-bucket routing table (#6458) limits stored contacts to 256 × 16 = 4,096 by design, whereas the old flat IndexMap held up to 100K. This roughly 25x reduction in candidate pool size caused snap sync regressions of 39-75% across all networks because:

  1. Fewer candidates for connection initiation → slower ramp to TARGET_PEERS (100)
  2. Faster exhaustion of candidates → more frequent retries of failed contacts
  3. XOR distance distribution means ~87% of contacts cluster in buckets 253-255, but each bucket only holds 16

Changing k-bucket sizes would break Kademlia protocol semantics, so the routing table structure can't be modified.
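The arithmetic behind these numbers can be checked with a short sketch. It assumes uniformly distributed node IDs and that bucket i holds contacts whose XOR distance lies in [2^i, 2^(i+1)), which is consistent with the bucket numbering above; the function names are illustrative and are not taken from the ethrex code.

```rust
/// Number of k-buckets and entries per bucket, as stated in the PR description.
const BUCKETS: u32 = 256;
const BUCKET_SIZE: u32 = 16;

/// Total routing-table capacity: 256 × 16 = 4,096 contacts.
fn routing_table_capacity() -> u32 {
    BUCKETS * BUCKET_SIZE
}

/// Expected share of uniformly random 256-bit node IDs that land in bucket
/// `index` (assumption: bucket i covers XOR distances in [2^i, 2^(i+1))).
fn expected_bucket_share(index: u32) -> f64 {
    // Bucket i covers 2^i of the 2^256 possible distances.
    2f64.powi(index as i32 - 256)
}

/// Combined expected share of buckets 253, 254 and 255: 1/8 + 1/4 + 1/2.
fn top_three_share() -> f64 {
    (253..=255).map(expected_bucket_share).sum()
}
```

Under these assumptions the three farthest buckets receive 87.5% of discovered contacts but can store only 3 × 16 = 48 of them, which is why the routing table alone makes a small candidate set for connection initiation.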

Approach

Decouple "routing table" from "connection candidate pool":

  • K-buckets (unchanged): used for all Kademlia protocol operations (get_closest_nodes, get_nodes_at_distances, get_contact_for_lookup, etc.)
  • Connection pool (new): flat IndexMap<H256, Node> capped at 50K, used exclusively by get_contact_to_initiate for RLPx connection initiation

All discovered contacts are inserted into both structures. The connection pool is cleaned during prune() and, when a contact is also present in the k-buckets, uses the k-bucket state (unwanted flag, fork ID validity) for filtering. Contacts that exist only in the pool are assumed eligible, since the RLPx handshake rejects incompatible peers.
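A minimal sketch of the selection rule described above, using only the standard library. Everything here is a stand-in: `NodeId` replaces H256, `ContactState` is a hypothetical view of the k-bucket entry, the pool is a plain slice rather than an IndexMap, and the small xorshift generator replaces whatever RNG the real code uses. It is not the body of do_get_contact_to_initiate.

```rust
use std::collections::HashMap;

/// Stand-in for H256 (assumption for this sketch).
type NodeId = [u8; 32];

#[derive(Clone, Debug, PartialEq)]
struct Node {
    id: NodeId,
    addr: String,
}

/// Hypothetical per-contact state tracked in the k-buckets.
struct ContactState {
    unwanted: bool,
    fork_id_valid: bool,
}

/// Tiny xorshift64 generator so the sketch needs no external crate.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Picks a random eligible contact from the flat pool.
/// Contacts known to the k-buckets are filtered by their state;
/// pool-only contacts are assumed eligible.
fn pick_contact_to_initiate<'a>(
    pool: &'a [Node],
    kbucket_state: &HashMap<NodeId, ContactState>,
    rng: &mut XorShift64,
) -> Option<&'a Node> {
    let eligible: Vec<&Node> = pool
        .iter()
        .filter(|node| match kbucket_state.get(&node.id) {
            Some(state) => !state.unwanted && state.fork_id_valid,
            None => true,
        })
        .collect();
    if eligible.is_empty() {
        return None;
    }
    // Modulo introduces a slight bias; acceptable for a sketch.
    let idx = (rng.next_u64() % eligible.len() as u64) as usize;
    Some(eligible[idx])
}
```

Selecting by random index over the whole pool, rather than walking buckets in order, is what removes the traversal bias mentioned in the summary.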

Changes

  • Add connection_pool: IndexMap<H256, Node> field to PeerTableServer
  • Insert into pool on every discovery path (new_contacts, new_contact_records, insert_if_new)
  • Rewrite do_get_contact_to_initiate to draw from pool with random selection
  • Clean pool entries during prune() when contacts are discarded

Test plan

…able

Add a separate IndexMap<H256, Node> connection pool (capacity 50K) for
RLPx connection initiation, decoupled from the k-bucket routing table
(which is limited to 256 × 16 = 4,096 contacts by Kademlia design).

All discovered contacts are inserted into both the k-buckets (for
Kademlia protocol operations like FindNode/GetClosestNodes) and the
connection pool (for peer connection initiation). This restores the
large candidate pool that existed before the k-bucket migration while
preserving correct Kademlia routing semantics.

The connection pool is:
- Populated on every contact discovery (discv4, discv5, insert_if_new)
- Cleaned during prune() when contacts are marked disposable
- Capped at 50K entries with oldest-first eviction
- Used with random selection and k-bucket state filtering
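The capacity and cleanup rules listed above could look roughly like the following. This is a std-only sketch under stated assumptions: a VecDeque plus HashMap stands in for the IndexMap, `NodeId` stands in for H256, a String address stands in for Node, and the method names `insert_if_new` and `remove_discarded` are illustrative rather than the PR's actual signatures.

```rust
use std::collections::{HashMap, VecDeque};

/// Stand-in for H256 (assumption for this sketch).
type NodeId = [u8; 32];

/// Flat pool with a fixed capacity and oldest-first eviction.
struct ConnectionPool {
    capacity: usize,
    /// Insertion order, oldest entry at the front.
    order: VecDeque<NodeId>,
    /// Node ID -> address (a String stands in for the Node type).
    nodes: HashMap<NodeId, String>,
}

impl ConnectionPool {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), nodes: HashMap::new() }
    }

    /// Inserts a contact if it is not already present, evicting the oldest
    /// entries while the pool is full. Returns true if the contact was added.
    fn insert_if_new(&mut self, id: NodeId, addr: String) -> bool {
        if self.capacity == 0 || self.nodes.contains_key(&id) {
            return false;
        }
        while self.nodes.len() >= self.capacity {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.nodes.remove(&oldest);
                }
                None => break,
            }
        }
        self.order.push_back(id);
        self.nodes.insert(id, addr);
        true
    }

    /// Drops contacts that pruning has discarded.
    fn remove_discarded(&mut self, discarded: &[NodeId]) {
        for id in discarded {
            self.nodes.remove(id);
        }
        // Keep the order queue in sync with the map.
        let nodes = &self.nodes;
        self.order.retain(|id| nodes.contains_key(id));
    }

    fn len(&self) -> usize {
        self.nodes.len()
    }

    fn contains(&self, id: &NodeId) -> bool {
        self.nodes.contains_key(id)
    }
}
```

With a capacity of 2, inserting three distinct contacts leaves only the two most recent ones; the PR uses the same idea with a 50K cap.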
@github-actions (bot) added the performance label (Block execution throughput and performance in general) on Apr 20, 2026
@github-actions

Lines of code report

Total lines added: 53
Total lines removed: 0
Total lines changed: 53

Detailed view
+------------------------------------------------+-------+------+
| File                                           | Lines | Diff |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_handler.rs   | 555   | +4   |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_table.rs     | 1277  | +48  |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/snap_sync.rs | 1020  | +1   |
+------------------------------------------------+-------+------+

Matches the candidate pool size used by Reth and Nethermind.
@azteca1998 changed the base branch from main to fix/kademlia-snapsync-peer-pruning on Apr 20, 2026, 12:03
@github-actions

Benchmark Block Execution Results Comparison Against Main

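+---------+----------------+---------+---------+-------------+
| Command | Mean [s]       | Min [s] | Max [s] | Relative    |
+---------+----------------+---------+---------+-------------+
| base    | 62.433 ± 0.163 | 62.234  | 62.719  | 1.00        |
+---------+----------------+---------+---------+-------------+
| head    | 62.482 ± 0.189 | 62.260  | 62.948  | 1.00 ± 0.00 |
+---------+----------------+---------+---------+-------------+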

azteca1998 added a commit that referenced this pull request Apr 20, 2026
Merge PR #6503 (randomized contact selection) into PR #6504 (flat
connection pool). The connection pool approach already includes
randomization, so we keep its version of do_get_contact_to_initiate.
@azteca1998
Contributor Author

Superseded by #6511 (Kademlia v2) which includes the connection pool + all performance fixes.

@azteca1998 closed this on Apr 21, 2026