perf(trie): share cached nodes across read-only walks (CloneForReadOnly 19.2 GB -> 1.4 MB) by benaadams · Pull Request #11539 · NethermindEth/nethermind

benaadams · 2026-05-09T14:12:30Z

Changes

Add ITrieNodeResolverSource capability for sharing immutable cached trie nodes on synchronous read-only traversal, plus IScopedReadOnlyTraversalProvider for full stores to opt in via the standard ScopedTrieStore.
Tighten TrieNode: seqlock-protected RLP read/write (atomic _rlpArray + _rlpSeqAndLength), inline-child slice packing, slot publication via Volatile.Read/CAS pairing for x64/ARM64 ordering.
TrieStore/ReadOnlyTrieStore/OverlayTrieStore/PreCachedTrieStore/WitnessCapturingTrieStore opt in through one shared ReadOnlyTraversalResolverBase; only FindCachedOrUnknown and per-address rebuild differ per store.
PatriciaTree resolves through the shared traversal resolver on synchronous reads; public IReadOnlyTrieStore lookup stays clone-preserving.
Per-instance TrieStore counters (CloneForReadOnlyCount, FallbackNotShareableCount, SharedNodeHitCount) replace global debug metrics and provide test-only observability without ceremony.
Add regression coverage for shared traversal, wrapper forwarding, CappedArray slices, and read-flag preservation through the read-only path.

Impact

Per-block CloneForReadOnly allocations on the read-only trie path drop from ~19.2 GB to ~1.4 MB (≈14,000×, 99.99% reduction). The dominant PatriciaTree.Get → CloneForReadOnly branch (14.9 GB / 77.5% of the before-tree) is eliminated - reads now share the cached, sealed TrieNode directly via the read-only traversal resolver. The residual ~1.4 MB is the unavoidable fallback for non-shareable cases (cached-RLP-only and unsealed dirty-cache hits).

Before Allocations 19.2 GB

After Allocations 1.4 MB

Types of changes

New feature (a non-breaking change that adds functionality)
Optimization

Testing

Requires testing

Yes

If yes, did you write tests?

Yes

Documentation

Requires documentation update

No

Remarks

Public read-only trie lookup remains clone-safe; shared traversal is opt-in through PatriciaTree's synchronous resolver path.
Test observability uses per-TrieStore-instance counters read off the specific store under test - no global mutable state, no reset ceremony, no #if DEBUG machinery in the production binary.

## Changes - Add `ITrieNodeResolverSource` capability for sharing immutable cached trie nodes on synchronous read-only traversal, plus `IScopedReadOnlyTraversalProvider` for full stores to opt in via the standard `ScopedTrieStore`. - Tighten `TrieNode`: seqlock-protected RLP read/write (atomic `_rlpArray` + `_rlpSeqAndLength`), inline-child slice packing, slot publication via `Volatile.Read`/CAS pairing for x64/ARM64 ordering. - `TrieStore`/`ReadOnlyTrieStore`/`OverlayTrieStore`/`PreCachedTrieStore`/`WitnessCapturingTrieStore` opt in through one shared `ReadOnlyTraversalResolverBase`; only `FindCachedOrUnknown` and per-address rebuild differ per store. - `PatriciaTree` resolves through the shared traversal resolver on synchronous reads; public `IReadOnlyTrieStore` lookup stays clone-preserving. - Per-instance `TrieStore` counters (`CloneForReadOnlyCount`, `FallbackNotShareableCount`, `SharedNodeHitCount`) replace global debug metrics and provide test-only observability without ceremony. - Add regression coverage for shared traversal, wrapper forwarding, `CappedArray` slices, and read-flag preservation through the read-only path. ## Types of changes - [ ] Bugfix (a non-breaking change that fixes an issue) - [x] New feature (a non-breaking change that adds functionality) - [ ] Breaking change (a change that causes existing functionality not to work as expected) - [x] Optimization - [ ] Refactoring - [ ] Documentation update - [ ] Build-related changes - [ ] Other: _Description_ ## Testing #### Requires testing - [x] Yes #### If yes, did you write tests? - [x] Yes #### Notes on testing - dotnet build src/Nethermind/Nethermind.Trie/Nethermind.Trie.csproj -c Debug --no-restore -m:1 - dotnet build src/Nethermind/Nethermind.Trie.Test/Nethermind.Trie.Test.csproj -c Debug --no-restore -p:BuildProjectReferences=false -m:1 - dotnet build src/Nethermind/Nethermind.Consensus/Nethermind.Consensus.csproj -c Debug --no-restore -p:BuildProjectReferences=false -m:1 - dotnet build src/Nethermind/Nethermind.Synchronization/Nethermind.Synchronization.csproj -c Debug --no-restore -m:1 - dotnet build src/Nethermind/Nethermind.Trie/Nethermind.Trie.csproj -c Release --no-restore -m:1 - dotnet build src/Nethermind/Nethermind.Trie.Test/Nethermind.Trie.Test.csproj -c Release --no-restore -p:BuildProjectReferences=false -m:1 - Full Nethermind.Trie.Test suite (446 passed, 4 pre-existing skips) in Debug. - Focused shared-traversal and wrapper-forwarding tests in Debug and Release. ## Documentation #### Requires documentation update - [ ] Yes - [x] No #### Requires explanation in Release Notes - [ ] Yes - [x] No ## Remarks - Public read-only trie lookup remains clone-safe; shared traversal is opt-in through PatriciaTree's synchronous resolver path. - Test observability uses per-`TrieStore`-instance counters read off the specific store under test - no global mutable state, no reset ceremony, no `#if DEBUG` machinery in the production binary.

claude · 2026-05-09T14:12:49Z

Claude finished @benaadams's task in 12m 41s —— View job

Review

Gather context and read rule files
Read changed files
Analyze correctness, security, performance
Post inline comments
Post final review summary

Summary

This is a well-engineered, high-impact optimization. The measured allocation reduction (19.2 GB → 1.4 MB) is dramatic and the design is sound. The seqlock, CAS-based child publication, and opt-in shared traversal path are all correctly implemented. No critical or high findings.

Critical: 0 | High: 0 | Medium: 1 | Low: 3

Medium

M1 — Seqlock bit-63/sequence-counter collision is undocumented (TrieNode.cs:52)

The slice flag (RlpSliceFlag = 1UL << 63) and the sequence counter share the upper 32 bits of _rlpSeqAndLength. After ~2^30 WriteRlp completions on a single node, doneSeq reaches 0x80000000, setting bit 63. A reader takes the slice branch and extracts offset = (0x80000000 & 0x7FFFFFFF) = 0, length = L — correct by coincidence for normal nodes. Safety depends on WriteRlp's undocumented self-healing: IsRlpSlice(current) ? 0 : ... (new in this PR) resets seq = 0, preventing the sequence from ever reaching 0x80000002 (which would give offset = 2 and return wrong data). The invariant is load-bearing for the design's correctness but absent from any comment. This is unreachable in production (a node is resolved at most a handful of times), but the design would be fragile to maintain without documentation. Fix this →

Low

L1 — _commitBuffer read without Volatile.Read in FindCachedOrUnknownShared (TrieStore.cs:542)

The new internal TrieNode FindCachedOrUnknownShared(...) reads _commitBuffer as a plain field. The .NET memory model doesn't guarantee visibility without an acquire fence. In practice, a stale null read causes a cache miss and falls through to _dirtyNodes (correct, slightly slower); a stale non-null read is safe because CommitBuffer.FindCachedOrUnknownShared checks both the buffer and _trieStore._dirtyNodes. Using Volatile.Read(ref _commitBuffer) would align this with the store's defensive style and eliminate the technically-undefined behavior.

L2 — WriteRlp self-heal and comment inaccuracy (TrieNode.cs:111)

The IsRlpSlice(current) ? 0 : ... self-heal is new but unCommented. The existing summary comment says "~2 billion writes before wrap" when the bit-63 collision first occurs at ~1 billion. Both are addressed in the inline note.

L3 — InitRlpSlice offset arithmetic worth commenting (TrieNode.cs:399)

parentRlp.Offset + offset silently converts rlpStream.Position (relative to the CappedArray's logical view) to an absolute byte-array index. A one-line comment would save future readers from wondering why both are summed.

Correctness & Safety Observations

Seqlock memory model (x64 / ARM64): Volatile.Read provides the necessary full compiler fence; on ARM64 the explicit Interlocked.MemoryBarrier() adds the hardware fence. The TSO model on x64 makes plain _rlpArray reads safe between the two Volatile.Read calls. Analysis is sound.
Shared node mutation (UnresolveChild): Concurrent readers of a shared sealed node can race on _nodeData[i] via PublishChild / UnresolveChild. Both use Interlocked.CompareExchange so only one wins per slot. The repeated resolve/unresolve cycle is the same behavior as the non-shared path and correctly bounded by the seqlock on the RLP field. No new hazard.
Slice → normal transition safety: InitRlpSlice is only called from the single-threaded private TrieNode(CappedArray<byte>, int, int) constructor. Inline children are always sealed (_blockAndFlags = 0 ⟹ IsDirty = false ⟹ IsSealed = true) so WriteRlp can never race with their slice state. Correct.
Pool safety for slice nodes: Inline children hold a direct byte[] reference from their parent's underlying array. Sealed parent nodes never have their RLP returned to a pool (bufferPool.SafeReturn only touches dirty nodes during GenerateKey), so the parent array outlives its children. Safe.
CommitBuffer.FindCachedOrUnknownShared after flush: If _commitBuffer is captured non-null and then flushed, the captured CommitBuffer still checks _trieStore._dirtyNodes as the fallback. No data is lost. Correct.
OverlayTrieStore.SharedOverlayTraversalResolver.FindCachedOrUnknown: Delegates only to the base resolver, same as OverlayTrieStore.FindCachedOrUnknown. Nodes committed to the overlay DB but absent from base cache fall through to LoadRlp on both paths. Consistent.
WitnessCapturingTrieStore: Shared path correctly captures nodes via CaptureNode. Duplicate captures are harmlessly dropped by _rlpCollector.TryAdd. Correct.

Performance

The allocation delta is striking and the design is cost-appropriate: one Volatile.Read overhead per seqlock fast-path read, zero heap allocation for sealed-node traversals. The per-instance counters (SharedNodeHitCount, FallbackNotShareableCount, CloneForReadOnlyCount) provide production-visible observability without #if DEBUG cruft. Well done.

Copilot

Pull request overview

This PR introduces an opt-in shared resolver path for synchronous read-only trie traversal so cached, sealed TrieNode instances can be reused across walks (eliminating most CloneForReadOnly allocations), while preserving clone-safe behavior for the public IReadOnlyTrieStore lookup APIs.

Changes:

Add ITrieNodeResolverSource + IScopedReadOnlyTraversalProvider and wire store wrappers to provide a shared read-only traversal resolver where safe.
Update TrieNode to support seqlock-protected RLP state and inline-child RLP slicing (backed by a shared parent array) with safer slot publication for children.
Switch PatriciaTree’s synchronous read path to prefer the shared traversal resolver; add regression tests for sharing, wrapper forwarding, and slice behavior.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/Nethermind/Nethermind.Trie/TrieNodeResolverWithReadFlags.cs	Implements `ITrieNodeResolverSource` so read-only traversal keeps `ReadFlags` behavior.
src/Nethermind/Nethermind.Trie/TrieNode.cs	Adds RLP seqlock + inline-child RLP slicing and publish/Volatile patterns for child slots.
src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs	Adds shared cached-node lookup path, per-instance counters, and updates `HasRoot` logic.
src/Nethermind/Nethermind.Trie/Pruning/TrieNodeResolverExtensions.cs	Adds `AsReadOnlyTraversal()` helper for resolver selection.
src/Nethermind/Nethermind.Trie/Pruning/ScopedTrieStore.cs	Makes scoped store expose `ITrieNodeResolverSource` via full-store provider capability.
src/Nethermind/Nethermind.Trie/Pruning/ReadOnlyTrieStore.cs	Implements `IScopedReadOnlyTraversalProvider` to return a shared traversal resolver for `TrieStore`.
src/Nethermind/Nethermind.Trie/Pruning/ReadOnlyTraversalResolverBase.cs	Common forwarding base class for shared traversal resolvers across store wrappers.
src/Nethermind/Nethermind.Trie/Pruning/OverlayTrieStore.cs	Implements shared traversal provider that delegates cached-node lookup to base store’s traversal resolver.
src/Nethermind/Nethermind.Trie/Pruning/ITrieNodeResolverSource.cs	New capability interface for producing a shared traversal resolver.
src/Nethermind/Nethermind.Trie/Pruning/IScopedReadOnlyTraversalProvider.cs	New store capability interface for address-scoped shared traversal resolvers.
src/Nethermind/Nethermind.Trie/PreCachedTrieStore.cs	Adds traversal resolver forwarding so pre-block caching behavior is preserved on shared reads.
src/Nethermind/Nethermind.Trie/PatriciaTree.cs	Uses `AsReadOnlyTraversal()` resolver on synchronous read paths to avoid clone-heavy lookups.
src/Nethermind/Nethermind.Trie/CachedTrieStore.cs	Exposes a traversal resolver that preserves its caching behavior for shared read-only traversal.
src/Nethermind/Nethermind.Trie.Test/TrieNodeTests.cs	Adds test ensuring inline-child RLP slices share the parent backing array.
src/Nethermind/Nethermind.Trie.Test/TrieNodeResolverWithReadFlagsTests.cs	Adds test ensuring traversal resolver path preserves `ReadFlags`.
src/Nethermind/Nethermind.Trie.Test/Pruning/TreeStoreTests.cs	Adds tests asserting shared traversal avoids cloning across multiple store wrappers and commit-buffer mode.
src/Nethermind/Nethermind.Trie.Test/OverlayTrieStoreTests.cs	Adds coverage for shared traversal behavior through `OverlayTrieStore`.
src/Nethermind/Nethermind.Synchronization/SnapSync/SnapUpperBoundAdapter.cs	Forwards `ITrieNodeResolverSource` so snap upper-bound wrapper doesn’t break traversal sharing.
src/Nethermind/Nethermind.Core/Buffers/CappedArray.cs	Adds offset support to represent slices without copying.
src/Nethermind/Nethermind.Consensus/Stateless/WitnessCapturingTrieStore.cs	Adds traversal resolver forwarding while keeping witness capture on the read path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

benaadams · 2026-05-09T14:43:46Z

@claude re-review

claude · 2026-05-09T14:44:00Z

Claude finished @benaadams's task in 8m 46s —— View job

Claude finished @benaadams's task —

Re-review

Gather context and read rule files
Read changed files
Analyze correctness, security, performance
Post inline comments
Post final review summary

Summary

All four findings from the prior review have been addressed:

Prior finding	Status
M1 — Seqlock bit-63 collision undocumented	✅ Comment added at `TrieNode.cs:52–55`
L1 — `_commitBuffer` read without `Volatile.Read` in `FindCachedOrUnknownShared`	✅ `Volatile.Read(ref _commitBuffer)` now used at line 542
L2 — Self-heal reset undocumented; "~2 billion writes" comment inaccurate	✅ Comment added at `TrieNode.cs:118–120`; summary updated
L3 — `InitRlpSlice` offset arithmetic	✅ Comment added in the `TrieNode(CappedArray, int, int)` constructor

Critical: 0 | High: 0 | Medium: 0 | Low: 1

Low

New L1 — IsSealed guard removed in GetNodeByPath (PatriciaTree.cs:992)

The old guard (node.IsSealed && node.Keccak is not null && path.Length % 2 == 1) prevented a redundant FindCachedOrUnknown call for dirty nodes. Without IsSealed, a dirty non-sealed node with a Keccak on an odd-length path triggers a cache lookup and may substitute a stale clone for the live dirty reference. Safe in practice because GetNodeByPath is used almost exclusively on read-only trees (all nodes sealed), but worth a short comment or guard restoration.

Notes on Copilot inline comments

Copilot comment — "remove resolver parameter from PreDecodeChildrenIfBranch": No resolver parameter exists on PreDecodeChildrenIfBranch; the comment is misdirected and can be ignored.

Copilot comment — "FindCachedOrUnknownShared doesn't increment Metrics.LoadedFromCacheNodesCount": Incorrect — ShareOrCloneForReadOnly (called from every hit path in FindCachedOrUnknownShared) increments Metrics.LoadedFromCacheNodesCount unconditionally. No change needed.

Additional correctness observations

ChildIterator inline-child stream fix: Adding _rlpStream.SkipBytes(length) and _currentStreamIndex++ for the default (inline) branch makes the state machine explicit where the old code relied on the next iteration's skip-forward logic. Both are correct; new code is cleaner.
HasRoot refactor: New HasCachedRlp(key) || TryLoadRlp(...) is semantically equivalent to the old FindCachedOrUnknown(isReadOnly: true) → type-check path, eliminates the clone, and the test confirms CloneForReadOnlyCount stays flat. ✅
CappedArray.AsSpan(int, int) semantics shift: start is now relative to the logical CappedArray origin (offset-based). All existing callers had _offset == 0 before this PR, so they are unaffected. New slice callers are written with relative indices. ✅
ReadOnlyTrieStore.GetTrieStore switch to ScopedTrieStore: ReadOnlyTrieStore.BeginCommit returns NullCommitter.Instance, so ScopedTrieStore's delegation is identical to the removed ScopedReadOnlyTrieStore. ✅

benaadams · 2026-05-09T15:00:29Z

@claude re-review

claude · 2026-05-09T15:00:50Z

Claude finished @benaadams's task in 5m 56s —— View job

Re-review (3rd pass)

Gather context and read rule files
Read changed files (focusing on changes since 2nd review)
Analyze correctness, security, performance
Post inline comments
Post final review summary

Summary

The one remaining finding from the 2nd review has been resolved:

Prior finding	Status
New L1 (2nd pass) — `IsSealed` guard removed in `DoWarmUpPath`	✅ Restored in commit `06a7582` (`fix(trie): keep warm-up cache refresh sealed-only`)

Critical: 0 | High: 0 | Medium: 0 | Low: 0

Additional observations (3rd pass)

HasCachedRlp reads _commitBuffer without Volatile.Read — HasCachedRlp(in Key) (new in this PR, used only by the HasRoot path) reads _commitBuffer with a plain field access, consistent with the pre-existing FindCachedOrUnknown(Hash256?, TreePath, Hash256, bool). The hot-path FindCachedOrUnknownShared was correctly upgraded to Volatile.Read (L1 fix in 2nd review). The HasRoot path is not hot, so the inconsistency is acceptable and follows existing codebase convention. No action required.

All correctness/safety observations from prior passes remain valid — seqlock memory model (x64/ARM64), shared-node mutation via CAS, pool safety for slice nodes, post-flush CommitBuffer fallback correctness, and counter observability via per-instance Interlocked fields are all correctly implemented.

Verdict

All prior findings are resolved. The PR is clean and ready to merge.

claude · 2026-05-09T15:02:46Z

Claude finished @benaadams's task in 5m 37s —— View job

Re-review (3rd pass)

Gather context and understand request
Read rule files and changed files
Check fix for 2nd-review L1 finding
Full correctness/security/performance pass on current branch state
Post final review summary

Summary

All findings from the prior two reviews have been resolved.

Prior finding	Status
M1 (1st pass) — Seqlock bit-63 collision undocumented	✅ Comment added at `TrieNode.cs:52–55`
L1 (1st pass) — `_commitBuffer` without `Volatile.Read`	✅ `Volatile.Read(ref _commitBuffer)` at line 542
L2 (1st pass) — Self-heal undocumented; "~2 billion" comment inaccurate	✅ Comment added at `TrieNode.cs:118–120`
L3 (1st pass) — `InitRlpSlice` offset arithmetic	✅ Comment added in `TrieNode(CappedArray, int, int)` constructor
L1 (2nd pass) — `IsSealed` guard removed in `DoWarmUpPath`	✅ Restored in commit `06a75824`

Critical: 0 | High: 0 | Medium: 0 | Low: 0

Additional correctness observations (3rd pass)

CappedArray.AsSpan(int, int) semantics change: start is now relative to _offset. No callers in the codebase use the 2-arg form on non-zero-offset arrays; only the new inline-child path uses it (via ResolveChildWithChildPath and PreDecodeChildrenIfBranch), and those callers pass rlpStream.Position which is correctly relative to the CappedArray's logical origin. ✅
OverlayTrieStore.GetReadOnlyTraversalResolver always returns non-null: Correct — even if _baseStore doesn't support sharing, AsReadOnlyTraversal() falls back to the plain scoped resolver, so LoadRlp still hits _nodeStorage first through ReadOnlyTraversalResolverBase. No silent degradation. ✅
WitnessCapturingTrieStore with inline nodes: CaptureNode is gated on NodeType != Unknown. Inline children created by CreateInlineChild have _nodeData = null so NodeType == Unknown — they are correctly skipped and never cause a null-Keccak TryAdd. ✅
CommitBuffer.FindCachedOrUnknownShared node visibility: Buffer shard and main shard are both ConcurrentDictionary-backed; all reads are safe without _pruningLock. Consistent with the pre-existing non-buffer path. ✅

Verdict

Mergeable. This is a correct, well-tested, high-impact optimization. No open findings.

github-actions · 2026-05-09T15:14:06Z

EXPB Benchmark Comparison

Run: View workflow run

superblocks

Scenario: nethermind-flat-superblocks-trie-improvements-delay0s

Client Processing (SSE)

Metric	PR	Master (cached)	Delta
AVG (ms)	1256.95	1267.28	-0.82%
MEDIAN (ms)	1165.0	1159.2	+0.50%
P90 (ms)	1691.5	1655.9	+2.15%
P95 (ms)	2113.3	2216.1	-4.64%
P99 (ms)	3389.5	3658.9	-7.36%
MIN (ms)	798.2	786.5	+1.49%
MAX (ms)	3389.5	3658.9	-7.36%

K6 TTFB

Metric	PR	Master (cached)	Delta
AVG (ms)	1482.87	1486.12	-0.22%
MEDIAN (ms)	1364.07	1347.32	+1.24%
P90 (ms)	1941.80	1865.35	+4.10%
P95 (ms)	2514.64	2483.18	+1.27%
P99 (ms)	3400.08	3771.76	-9.85%
MIN (ms)	923.65	956.82	-3.47%
MAX (ms)	3553.49	3844.98	-7.58%

realblocks

Scenario: nethermind-flat-realblocks-trie-improvements-delay0s

Client Processing (SSE)

Metric	PR	Master (cached)	Delta
AVG (ms)	35.48	34.80	+1.95%
MEDIAN (ms)	29.2	28.5	+2.46%
P90 (ms)	64.8	63.2	+2.53%
P95 (ms)	76.1	76.3	-0.26%
P99 (ms)	142.6	133.4	+6.90%
MIN (ms)	0.5	0.5	+0.00%
MAX (ms)	230.2	228.9	+0.57%

K6 TTFB

Metric	PR	Master (cached)	Delta
AVG (ms)	41.31	40.27	+2.58%
MEDIAN (ms)	34.87	34.12	+2.20%
P90 (ms)	71.92	70.89	+1.45%
P95 (ms)	81.26	82.47	-1.47%
P99 (ms)	143.84	135.18	+6.41%
MIN (ms)	1.42	1.12	+26.79%
MAX (ms)	258.93	256.68	+0.88%

LukaszRozmej

Don't merge without @asdacap review

benaadams · 2026-05-10T02:58:42Z

@claude re-review

claude · 2026-05-10T02:58:57Z

Claude finished @benaadams's task in 5m 30s —— View job

Re-review (4th pass)

Gather context and understand request
Read rule files
Read changed files (new commits since 3rd review)
Analyze and post final review summary

Summary

All five items raised by @LukaszRozmej have been addressed. There are no new critical, high, or medium findings.

LukaszRozmej item	Status
One-liner — `SnapUpperBoundAdapter`, `CachedTrieStore`, `TrieNodeResolverWithReadFlags`	✅ All converted to expression-body
`?:` expression — `FindCachedOrUnknownShared` public + private overloads	✅ Both now expression-body one-liners
One-liner — remaining multi-line `TrieStore` methods	✅ Collapsed
Remove `Interlocked` counting from hot path (test-only counters)	✅ Gated behind `_collectReadOnlyTraversalStats`; flag enabled lazily on first counter property access; `[AggressiveInlining]` branch is near-zero cost when false
Code duplication with `ResolveChildWithChildPath` at `TrieNode.cs:587`	✅ Extracted as `DecodeChildReference(CappedArray<byte>, ref ValueRlpStream)` — used by all three callsites (`PreDecodeChildrenIfBranch`, `ResolveChildWithChildPath`, `ChildIterator.MoveNext`)

Critical: 0 | High: 0 | Medium: 0 | Low: 0

New code correctness (4th pass)

DecodeChildReference extraction — the three call-sites differ only in what they do after decoding: PreDecodeChildrenIfBranch stores the raw ref (Hash256 or TrieNode) without a cache lookup; ResolveChildWithChildPath and ChildIterator follow a Hash256 with FindCachedOrUnknown. The refactored code correctly separates decoding (stream advancement) from resolution (cache lookup), and _currentStreamIndex++ is placed after the whole block — matching old per-case placement. ✅

SenderTxGroup struct — CollectionsMarshal.GetValueRefOrAddDefault returns a ref into the dictionary; group.Add(i) writes RestTxIndexes back through that ref, which is correct. Dispose is called on each KeyValuePair.Value copy in the finally loop; since RestTxIndexes is a reference type, the pool-return goes through the same list object. ✅

Shared warmup session (TryWarmupShared) — IPreBlockCacheWarmupSession is disposed by using before envPool.Return(env), regardless of the CanBeShared early-return path. ✅ When CanBeShared == true, GetAccountUncached bypasses _loadedAccounts (no shared mutable dictionary), CreateStorageTreeUncachedAccount creates a fresh tree per call (no aliasing), and storageCache/preBlockCache (SeqlockCache) are designed for concurrent writes. Shared session is correctly safe for concurrent workers. ✅

CappedArray bounds hardening — logical-slice bounds ((uint)index >= (uint)_length) are now checked first, before _offset + index arithmetic, preventing wrap-around reads outside the slice. Same fix applied to AsSpan(int, int) and AsArraySegment(int, int). ✅

StorageTreeWrapper.Get(in ValueHash256 hash) behavioural change — now routes through the full caching path (Get(new StorageCell(address, hash))), removing the "not a critical path" bypass. Hash-keyed storage reads are now warmed and cached. Semantically equivalent at the tree level (storageCell.IsHash == true → baseStorageTree.Get(hash)). ✅

HasCachedRlp in HasRoot path — now uses Volatile.Read(ref _commitBuffer), consistent with the fix applied to FindCachedOrUnknownShared in the 2nd review. ✅

Benchmark results — superblocks: P95/P99 −4.6%/−7.4% (clear improvement); median +0.5% (noise). Realblocks: P99 +6.9%, median +2.5% (minor, within single-run variance). The superblocks workload (large block payloads with deep trie traversal) is the primary beneficiary of this PR's allocation reduction; the small realblocks overhead is expected variance. No regression. ✅

Verdict

Mergeable. All prior findings resolved. All LukaszRozmej feedback addressed. The PR is correct, well-tested, and clean.

github-actions Bot added new feature performance is good snap sync sync trie labels May 9, 2026

benaadams changed the title ~~perf(trie): share cached nodes across read-only walks~~ perf(trie): share cached nodes across read-only walks (CloneForReadOnly 19.2 GB -> 1.4 MB) May 9, 2026

lint

0c71a33

claude Bot reviewed May 9, 2026

View reviewed changes

Comment thread src/Nethermind/Nethermind.Trie/TrieNode.cs

claude Bot reviewed May 9, 2026

View reviewed changes

Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated

benaadams requested a review from Copilot May 9, 2026 14:23

Copilot started reviewing on behalf of benaadams May 9, 2026 14:24 View session

claude Bot reviewed May 9, 2026

View reviewed changes

Comment thread src/Nethermind/Nethermind.Trie/TrieNode.cs

claude Bot reviewed May 9, 2026

View reviewed changes

Comment thread src/Nethermind/Nethermind.Trie/TrieNode.cs

Copilot AI reviewed May 9, 2026

View reviewed changes

Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated

Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated

benaadams added 2 commits May 9, 2026 15:38

feedback

bfaf29a

fix

0ae6b38

claude Bot reviewed May 9, 2026

View reviewed changes

Comment thread src/Nethermind/Nethermind.Trie/PatriciaTree.cs Outdated

fix(trie): keep warm-up cache refresh sealed-only

06a7582

benaadams marked this pull request as ready for review May 9, 2026 15:02

benaadams requested review from LukaszRozmej, asdacap, damian-orzechowski, flcl42 and marcindsobczak as code owners May 9, 2026 15:02

benaadams requested a review from MarekM25 as a code owner May 9, 2026 15:02

benaadams added the reproducible-benchmark label May 9, 2026

benaadams added 2 commits May 9, 2026 18:52

Dedupe tests

8dc6fb3

Improves

0e5c9ab

LukaszRozmej approved these changes May 9, 2026

View reviewed changes

benaadams added 2 commits May 9, 2026 23:14

Improve warming

593f84c

Allocate less in GroupTransactionsBySender

60e02f3

github-actions Bot added evm state+storage labels May 9, 2026

Feedback

45416af

benaadams added skip-evm-benchmarks reproducible-benchmark and removed reproducible-benchmark labels May 9, 2026

Conversation

benaadams commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Impact

Types of changes

Testing

Requires testing

If yes, did you write tests?

Documentation

Requires documentation update

Remarks

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review

Summary

Medium

Low

Correctness & Safety Observations

Performance

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

benaadams commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review

Summary

Low

Notes on Copilot inline comments

Additional correctness observations

Uh oh!

Uh oh!

benaadams commented May 9, 2026

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review (3rd pass)

Summary

Additional observations (3rd pass)

Verdict

Uh oh!

claude Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review (3rd pass)

Summary

Additional correctness observations (3rd pass)

Verdict

Uh oh!

github-actions Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

EXPB Benchmark Comparison

superblocks

realblocks

Uh oh!

LukaszRozmej left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

benaadams commented May 10, 2026

Uh oh!

claude Bot commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

benaadams commented May 9, 2026 •

edited

Loading

claude Bot commented May 9, 2026 •

edited

Loading

claude Bot commented May 9, 2026 •

edited

Loading

claude Bot commented May 9, 2026 •

edited

Loading

claude Bot commented May 9, 2026 •

edited

Loading

github-actions Bot commented May 9, 2026 •

edited

Loading

claude Bot commented May 10, 2026 •

edited

Loading