perf(trie): share cached nodes across read-only walks (CloneForReadOnly 19.2 GB -> 1.4 MB) #11539
## Changes
- Add `ITrieNodeResolverSource` capability for sharing immutable cached trie nodes on synchronous read-only traversal, plus `IScopedReadOnlyTraversalProvider` for full stores to opt in via the standard `ScopedTrieStore`.
- Tighten `TrieNode`: seqlock-protected RLP read/write (atomic `_rlpArray` + `_rlpSeqAndLength`), inline-child slice packing, slot publication via `Volatile.Read`/CAS pairing for x64/ARM64 ordering.
- `TrieStore`/`ReadOnlyTrieStore`/`OverlayTrieStore`/`PreCachedTrieStore`/`WitnessCapturingTrieStore` opt in through one shared `ReadOnlyTraversalResolverBase`; only `FindCachedOrUnknown` and per-address rebuild differ per store.
- `PatriciaTree` resolves through the shared traversal resolver on synchronous reads; public `IReadOnlyTrieStore` lookup stays clone-preserving.
- Per-instance `TrieStore` counters (`CloneForReadOnlyCount`, `FallbackNotShareableCount`, `SharedNodeHitCount`) replace global debug metrics and provide test-only observability without ceremony.
- Add regression coverage for shared traversal, wrapper forwarding, `CappedArray` slices, and read-flag preservation through the read-only path.

## Types of changes
- [ ] Bugfix (a non-breaking change that fixes an issue)
- [x] New feature (a non-breaking change that adds functionality)
- [ ] Breaking change (a change that causes existing functionality not to work as expected)
- [x] Optimization
- [ ] Refactoring
- [ ] Documentation update
- [ ] Build-related changes
- [ ] Other: _Description_

## Testing
#### Requires testing
- [x] Yes
#### If yes, did you write tests?
- [x] Yes
#### Notes on testing
- `dotnet build src/Nethermind/Nethermind.Trie/Nethermind.Trie.csproj -c Debug --no-restore -m:1`
- `dotnet build src/Nethermind/Nethermind.Trie.Test/Nethermind.Trie.Test.csproj -c Debug --no-restore -p:BuildProjectReferences=false -m:1`
- `dotnet build src/Nethermind/Nethermind.Consensus/Nethermind.Consensus.csproj -c Debug --no-restore -p:BuildProjectReferences=false -m:1`
- `dotnet build src/Nethermind/Nethermind.Synchronization/Nethermind.Synchronization.csproj -c Debug --no-restore -m:1`
- `dotnet build src/Nethermind/Nethermind.Trie/Nethermind.Trie.csproj -c Release --no-restore -m:1`
- `dotnet build src/Nethermind/Nethermind.Trie.Test/Nethermind.Trie.Test.csproj -c Release --no-restore -p:BuildProjectReferences=false -m:1`
- Full Nethermind.Trie.Test suite (446 passed, 4 pre-existing skips) in Debug.
- Focused shared-traversal and wrapper-forwarding tests in Debug and Release.

## Documentation
#### Requires documentation update
- [ ] Yes
- [x] No
#### Requires explanation in Release Notes
- [ ] Yes
- [x] No

## Remarks
- Public read-only trie lookup remains clone-safe; shared traversal is opt-in through `PatriciaTree`'s synchronous resolver path.
- Test observability uses per-`TrieStore`-instance counters read off the specific store under test: no global mutable state, no reset ceremony, no `#if DEBUG` machinery in the production binary.
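A minimal Python sketch of the opt-in pattern described above, for readers who want its shape without opening the diff. All names here (`TraversalSource`, `ReadOnlyTraversalBase`, `find_cached_or_unknown`, `as_read_only_traversal`, `Store`, the `shared_hits`/`fallbacks` counters) are hypothetical stand-ins that loosely mirror `ITrieNodeResolverSource`, `ReadOnlyTraversalResolverBase`, `FindCachedOrUnknown`, `AsReadOnlyTraversal()` and the per-instance counters; this is not the PR's C# code. It assumes a cached node is shareable exactly when it is sealed. The shared base decides between handing out the cached instance and falling back to a copy, each store only supplies the cache lookup, and counters live on the instance so a test reads them off the object under test.

```python
from __future__ import annotations

import dataclasses
from abc import ABC, abstractmethod
from typing import Optional, Protocol, runtime_checkable


@dataclasses.dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a cached trie node."""
    key: bytes
    rlp: bytes
    sealed: bool  # assumption: sealed nodes are immutable, hence shareable


class Resolver(ABC):
    @abstractmethod
    def find(self, key: bytes) -> Node: ...


@runtime_checkable
class TraversalSource(Protocol):
    """Capability check, loosely analogous to ITrieNodeResolverSource."""
    def read_only_traversal(self) -> Resolver: ...


class ReadOnlyTraversalBase(Resolver):
    """Shared decision logic; subclasses only implement the cache lookup."""

    def __init__(self) -> None:
        # Per-instance counters: no global state to reset between tests.
        self.shared_hits = 0
        self.fallbacks = 0

    @abstractmethod
    def find_cached_or_unknown(self, key: bytes) -> Optional[Node]: ...

    def find(self, key: bytes) -> Node:
        node = self.find_cached_or_unknown(key)
        if node is not None and node.sealed:
            self.shared_hits += 1
            return node  # share the cached instance, no allocation
        self.fallbacks += 1
        if node is None:
            return Node(key, b"", sealed=False)  # "unknown" placeholder
        return dataclasses.replace(node)  # not shareable: private copy


class DictTraversal(ReadOnlyTraversalBase):
    def __init__(self, cache: dict[bytes, Node]) -> None:
        super().__init__()
        self._cache = cache

    def find_cached_or_unknown(self, key: bytes) -> Optional[Node]:
        return self._cache.get(key)


class Store(Resolver):
    def __init__(self, cache: dict[bytes, Node]) -> None:
        self._cache = cache
        self._traversal = DictTraversal(cache)
        self.clone_count = 0

    def find(self, key: bytes) -> Node:
        # The public lookup stays clone-preserving.
        self.clone_count += 1
        return dataclasses.replace(self._cache[key])

    def read_only_traversal(self) -> Resolver:
        return self._traversal


def as_read_only_traversal(resolver: Resolver) -> Resolver:
    """Use the shared path when the resolver opts in; otherwise keep it."""
    if isinstance(resolver, TraversalSource):
        return resolver.read_only_traversal()
    return resolver
```

Every caller on the shared path receives the same object, which is only safe because that object is never mutated; that is why the sketch refuses to share unsealed nodes.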
Claude finished @benaadams's task in 12m 41s: Review
Summary
This is a well-engineered, high-impact optimization. The measured allocation reduction (19.2 GB → 1.4 MB) is dramatic and the design is sound. The seqlock, CAS-based child publication, and opt-in shared traversal path are all correctly implemented. No critical or high findings.
Critical: 0 | High: 0 | Medium: 1 | Low: 3
Medium
M1: Seqlock bit-63/sequence-counter collision is undocumented (The slice flag (
Low
L1: The new
L2: The
L3:
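The M1 text is cut off in this thread, so the actual `_rlpSeqAndLength` layout is not visible here. As an illustration only, the Python sketch below assumes a hypothetical layout (bit 63: slice flag, bits 32 to 62: sequence counter, bits 0 to 31: length) and shows one way a sequence counter can collide with a flag bit: bumping the sequence by adding to the whole word lets a saturated counter carry into bit 63, while incrementing only the masked counter field leaves the flag intact. Because the counter's modulus (2^31) is even, wrapping also preserves the odd/even parity a seqlock relies on.

```python
from __future__ import annotations

SLICE_FLAG = 1 << 63
SEQ_SHIFT = 32
SEQ_MASK = (1 << 31) - 1   # hypothetical 31-bit sequence in bits 32..62
LEN_MASK = (1 << 32) - 1   # hypothetical 32-bit length in bits 0..31
WORD_MASK = (1 << 64) - 1  # emulate an unsigned 64-bit word


def pack(seq: int, length: int, is_slice: bool) -> int:
    flag = SLICE_FLAG if is_slice else 0
    return flag | ((seq & SEQ_MASK) << SEQ_SHIFT) | (length & LEN_MASK)


def unpack(word: int) -> tuple[int, int, bool]:
    return (word >> SEQ_SHIFT) & SEQ_MASK, word & LEN_MASK, bool(word & SLICE_FLAG)


def bump_seq_naive(word: int) -> int:
    # Adds to the whole word: a saturated counter carries into bit 63.
    return (word + (1 << SEQ_SHIFT)) & WORD_MASK


def bump_seq_masked(word: int) -> int:
    # Increments only the counter field; it wraps to 0 and the flag survives.
    seq, length, is_slice = unpack(word)
    return pack(seq + 1, length, is_slice)
```

With `w = pack(SEQ_MASK, 10, False)`, the naive bump decodes as a slice (`(0, 10, True)`), while the masked bump decodes as `(0, 10, False)`.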
Correctness & Safety Observations
Performance
The allocation delta is striking and the design is cost-appropriate: one
Pull request overview
This PR introduces an opt-in shared resolver path for synchronous read-only trie traversal so cached, sealed TrieNode instances can be reused across walks (eliminating most CloneForReadOnly allocations), while preserving clone-safe behavior for the public IReadOnlyTrieStore lookup APIs.
Changes:
- Add `ITrieNodeResolverSource` + `IScopedReadOnlyTraversalProvider` and wire store wrappers to provide a shared read-only traversal resolver where safe.
- Update `TrieNode` to support seqlock-protected RLP state and inline-child RLP slicing (backed by a shared parent array) with safer slot publication for children.
- Switch `PatriciaTree`'s synchronous read path to prefer the shared traversal resolver; add regression tests for sharing, wrapper forwarding, and slice behavior.
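A simplified Python sketch of the seqlock protocol described for `TrieNode`'s RLP state: a writer makes the counter odd, updates the fields, then makes it even again; a reader retries if the counter was odd or changed while it copied the fields. Unlike the PR, which packs the sequence together with the length in `_rlpSeqAndLength` and swaps `_rlpArray` atomically, this sketch keeps the counter in its own field, and `RlpSlot` is a hypothetical name. Under CPython's GIL these field accesses are effectively sequentially consistent, so the sketch demonstrates the retry logic but not the x64/ARM64 barrier placement the PR has to get right.

```python
import threading
import time


class RlpSlot:
    """Hypothetical seqlock-protected (array, length) pair."""

    def __init__(self) -> None:
        self._seq = 0                     # even: stable, odd: write in progress
        self._array = b""
        self._length = 0
        self._writer = threading.Lock()  # serializes writers; readers never lock

    def write(self, array: bytes, length: int) -> None:
        with self._writer:
            self._seq += 1                # odd: in-flight readers will retry
            self._array = array
            self._length = length
            self._seq += 1                # even again: new state is published

    def read(self) -> bytes:
        while True:
            start = self._seq
            if start & 1:
                time.sleep(0)             # writer active; yield and retry
                continue
            array, length = self._array, self._length
            if self._seq == start:        # no write overlapped the copy
                return array[:length]
```

Readers never block writers here; the cost of a concurrent write is a retry on the read side, which suits a cache that is read far more often than it is rewritten.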
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Nethermind/Nethermind.Trie/TrieNodeResolverWithReadFlags.cs | Implements ITrieNodeResolverSource so read-only traversal keeps ReadFlags behavior. |
| src/Nethermind/Nethermind.Trie/TrieNode.cs | Adds RLP seqlock + inline-child RLP slicing and publish/Volatile patterns for child slots. |
| src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs | Adds shared cached-node lookup path, per-instance counters, and updates HasRoot logic. |
| src/Nethermind/Nethermind.Trie/Pruning/TrieNodeResolverExtensions.cs | Adds AsReadOnlyTraversal() helper for resolver selection. |
| src/Nethermind/Nethermind.Trie/Pruning/ScopedTrieStore.cs | Makes scoped store expose ITrieNodeResolverSource via full-store provider capability. |
| src/Nethermind/Nethermind.Trie/Pruning/ReadOnlyTrieStore.cs | Implements IScopedReadOnlyTraversalProvider to return a shared traversal resolver for TrieStore. |
| src/Nethermind/Nethermind.Trie/Pruning/ReadOnlyTraversalResolverBase.cs | Common forwarding base class for shared traversal resolvers across store wrappers. |
| src/Nethermind/Nethermind.Trie/Pruning/OverlayTrieStore.cs | Implements shared traversal provider that delegates cached-node lookup to base store’s traversal resolver. |
| src/Nethermind/Nethermind.Trie/Pruning/ITrieNodeResolverSource.cs | New capability interface for producing a shared traversal resolver. |
| src/Nethermind/Nethermind.Trie/Pruning/IScopedReadOnlyTraversalProvider.cs | New store capability interface for address-scoped shared traversal resolvers. |
| src/Nethermind/Nethermind.Trie/PreCachedTrieStore.cs | Adds traversal resolver forwarding so pre-block caching behavior is preserved on shared reads. |
| src/Nethermind/Nethermind.Trie/PatriciaTree.cs | Uses AsReadOnlyTraversal() resolver on synchronous read paths to avoid clone-heavy lookups. |
| src/Nethermind/Nethermind.Trie/CachedTrieStore.cs | Exposes a traversal resolver that preserves its caching behavior for shared read-only traversal. |
| src/Nethermind/Nethermind.Trie.Test/TrieNodeTests.cs | Adds test ensuring inline-child RLP slices share the parent backing array. |
| src/Nethermind/Nethermind.Trie.Test/TrieNodeResolverWithReadFlagsTests.cs | Adds test ensuring traversal resolver path preserves ReadFlags. |
| src/Nethermind/Nethermind.Trie.Test/Pruning/TreeStoreTests.cs | Adds tests asserting shared traversal avoids cloning across multiple store wrappers and commit-buffer mode. |
| src/Nethermind/Nethermind.Trie.Test/OverlayTrieStoreTests.cs | Adds coverage for shared traversal behavior through OverlayTrieStore. |
| src/Nethermind/Nethermind.Synchronization/SnapSync/SnapUpperBoundAdapter.cs | Forwards ITrieNodeResolverSource so snap upper-bound wrapper doesn’t break traversal sharing. |
| src/Nethermind/Nethermind.Core/Buffers/CappedArray.cs | Adds offset support to represent slices without copying. |
| src/Nethermind/Nethermind.Consensus/Stateless/WitnessCapturingTrieStore.cs | Adds traversal resolver forwarding while keeping witness capture on the read path. |
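The `CappedArray` offset support in the table above is what lets an inline child's RLP point into its parent's buffer instead of being copied. A rough Python analogue, using a hypothetical `CappedSlice` type and a `memoryview` for the zero-copy view (the real `CappedArray` API is not shown in this thread). The example bytes are a valid RLP encoding chosen for illustration: the parent list `c4 20 c2 31 76` embeds the 3-byte child list `c2 31 76` at offset 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CappedSlice:
    """Hypothetical (array, offset, length) view; no bytes are copied."""
    array: bytes
    offset: int
    length: int

    def __post_init__(self) -> None:
        if self.offset < 0 or self.length < 0 or self.offset + self.length > len(self.array):
            raise ValueError("slice exceeds backing array")

    def view(self) -> memoryview:
        # Slicing a memoryview shares the parent's buffer instead of copying.
        return memoryview(self.array)[self.offset:self.offset + self.length]


parent_rlp = bytes.fromhex("c420c23176")  # RLP list [0x20, [0x31, 0x76]]
child = CappedSlice(parent_rlp, offset=2, length=3)
```

Here `bytes(child.view())` yields the child's own encoding `c2 31 76`, while `child.view().obj` is still `parent_rlp`, so the child costs a small header rather than a second buffer.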
@claude re-review
Claude finished @benaadams's task in 8m 46s: Re-review
Summary
All four findings from the prior review have been addressed:
Critical: 0 | High: 0 | Medium: 0 | Low: 1
Low
New L1: The old guard (
Notes on Copilot inline comments
Copilot comment: "remove resolver parameter from
Copilot comment: "
Additional correctness observations
@claude re-review
Claude finished @benaadams's task in 5m 56s: Re-review (3rd pass)
Summary
The one remaining finding from the 2nd review has been resolved:
Critical: 0 | High: 0 | Medium: 0 | Low: 0
Additional observations (3rd pass)
All correctness/safety observations from prior passes remain valid: seqlock memory model (x64/ARM64), shared-node mutation via CAS, pool safety for slice nodes, post-flush
Verdict
All prior findings are resolved. The PR is clean and ready to merge.
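The observations above mention shared-node mutation via CAS. The usual shape of that pattern is publish-once: read the slot; if it is empty, build a candidate and try to install it with compare-and-exchange; a thread that loses the race discards its candidate and adopts the winner's. Python has no CAS primitive, so this sketch emulates `Interlocked.CompareExchange` with a lock (an assumption for illustration only), and `Slots`/`get_or_publish` are hypothetical names rather than the PR's `TrieNode` API.

```python
import threading


class Slots:
    """Hypothetical child-slot array with publish-once semantics."""

    def __init__(self, n: int) -> None:
        self._slots = [None] * n
        self._cas_lock = threading.Lock()  # stands in for a hardware CAS

    def compare_exchange(self, i, new, expected):
        # Like Interlocked.CompareExchange: install `new` only if the slot
        # still holds `expected`, and always return the prior value.
        with self._cas_lock:
            prior = self._slots[i]
            if prior is expected:
                self._slots[i] = new
            return prior

    def read(self, i):
        # Volatile.Read in .NET; a plain read is already ordered under the GIL.
        return self._slots[i]

    def get_or_publish(self, i, factory):
        existing = self.read(i)
        if existing is not None:
            return existing
        candidate = factory()
        prior = self.compare_exchange(i, candidate, None)
        return candidate if prior is None else prior  # losers adopt the winner
```

Because every racing thread ends up holding the same instance, concurrent walks can never observe two different children for one slot.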
Claude finished @benaadams's task in 5m 37s: Re-review (3rd pass)
Summary
All findings from the prior two reviews have been resolved.
Critical: 0 | High: 0 | Medium: 0 | Low: 0
Additional correctness observations (3rd pass)
Verdict
Mergeable. This is a correct, well-tested, high-impact optimization. No open findings.
EXPB Benchmark Comparison
Run: View workflow run
superblocks
Scenario: Client Processing (SSE)
K6 TTFB
realblocks
Scenario: Client Processing (SSE)
K6 TTFB
LukaszRozmej left a comment:
Don't merge without @asdacap review
@claude re-review
Claude finished @benaadams's task in 5m 30s: Re-review (4th pass)
Summary
All five items raised by @LukaszRozmej have been addressed. There are no new critical, high, or medium findings.
Critical: 0 | High: 0 | Medium: 0 | Low: 0
New code correctness (4th pass)
Shared warmup session (
Benchmark results: superblocks P95/P99 −4.6%/−7.4% (clear improvement), median +0.5% (noise); realblocks P99 +6.9%, median +2.5% (minor, within single-run variance). The superblocks workload (large block payloads with deep trie traversal) is the primary beneficiary of this PR's allocation reduction; the small realblocks overhead is expected variance. No regression. ✅
Verdict
Mergeable. All prior findings resolved. All LukaszRozmej feedback addressed. The PR is correct, well-tested, and clean.
## Impact
Per-block `CloneForReadOnly` allocations on the read-only trie path drop from ~19.2 GB to ~1.4 MB (≈14,000×, 99.99% reduction). The dominant `PatriciaTree.Get → CloneForReadOnly` branch (14.9 GB, 77.5% of the pre-change allocation tree) is eliminated: reads now share the cached, sealed `TrieNode` directly via the read-only traversal resolver. The residual ~1.4 MB is the unavoidable fallback for non-shareable cases (cached-RLP-only and unsealed dirty-cache hits).

- Before: allocations 19.2 GB
- After: allocations 1.4 MB
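A quick check of the headline ratio. The quoted ≈14,000× matches binary units (GiB and MiB); read as decimal GB and MB, the same figures give roughly 13,700×. Either way the reduction rounds to 99.99%.

```python
before_gb = 19.2
after_mb = 1.4

ratio_binary = before_gb * 1024 / after_mb   # GiB -> MiB
ratio_decimal = before_gb * 1000 / after_mb  # GB -> MB
reduction_pct = (1 - after_mb / (before_gb * 1024)) * 100

print(f"{ratio_binary:,.0f}x (binary), {ratio_decimal:,.0f}x (decimal), {reduction_pct:.2f}% reduction")
# 14,043x (binary), 13,714x (decimal), 99.99% reduction
```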