Skip to content

perf(trie): share cached nodes across read-only walks (CloneForReadOnly 19.2 GB -> 1.4 MB)#11539

Open
benaadams wants to merge 10 commits intomasterfrom
trie-improvements
Open

perf(trie): share cached nodes across read-only walks (CloneForReadOnly 19.2 GB -> 1.4 MB)#11539
benaadams wants to merge 10 commits intomasterfrom
trie-improvements

Conversation

@benaadams
Copy link
Copy Markdown
Member

@benaadams benaadams commented May 9, 2026

Changes

  • Add ITrieNodeResolverSource capability for sharing immutable cached trie nodes on synchronous read-only traversal, plus IScopedReadOnlyTraversalProvider for full stores to opt in via the standard ScopedTrieStore.
  • Tighten TrieNode: seqlock-protected RLP read/write (atomic _rlpArray + _rlpSeqAndLength), inline-child slice packing, slot publication via Volatile.Read/CAS pairing for x64/ARM64 ordering.
  • TrieStore/ReadOnlyTrieStore/OverlayTrieStore/PreCachedTrieStore/WitnessCapturingTrieStore opt in through one shared ReadOnlyTraversalResolverBase; only FindCachedOrUnknown and per-address rebuild differ per store.
  • PatriciaTree resolves through the shared traversal resolver on synchronous reads; public IReadOnlyTrieStore lookup stays clone-preserving.
  • Per-instance TrieStore counters (CloneForReadOnlyCount, FallbackNotShareableCount, SharedNodeHitCount) replace global debug metrics and provide test-only observability without ceremony.
  • Add regression coverage for shared traversal, wrapper forwarding, CappedArray slices, and read-flag preservation through the read-only path.

Impact

Per-block CloneForReadOnly allocations on the read-only trie path drop from ~19.2 GB to ~1.4 MB (≈14,000×, 99.99% reduction). The dominant PatriciaTree.Get → CloneForReadOnly branch (14.9 GB / 77.5% of the before-tree) is eliminated - reads now share the cached, sealed TrieNode directly via the read-only traversal resolver. The residual ~1.4 MB is the unavoidable fallback for non-shareable cases (cached-RLP-only and unsealed dirty-cache hits).

Before Allocations 19.2 GB

image

After Allocations 1.4 MB

image

Types of changes

  • New feature (a non-breaking change that adds functionality)
  • Optimization

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes

Documentation

Requires documentation update

  • No

Remarks

  • Public read-only trie lookup remains clone-safe; shared traversal is opt-in through PatriciaTree's synchronous resolver path.
  • Test observability uses per-TrieStore-instance counters read off the specific store under test - no global mutable state, no reset ceremony, no #if DEBUG machinery in the production binary.

## Changes

- Add `ITrieNodeResolverSource` capability for sharing immutable cached trie nodes on synchronous read-only traversal, plus `IScopedReadOnlyTraversalProvider` for full stores to opt in via the standard `ScopedTrieStore`.
- Tighten `TrieNode`: seqlock-protected RLP read/write (atomic `_rlpArray` + `_rlpSeqAndLength`), inline-child slice packing, slot publication via `Volatile.Read`/CAS pairing for x64/ARM64 ordering.
- `TrieStore`/`ReadOnlyTrieStore`/`OverlayTrieStore`/`PreCachedTrieStore`/`WitnessCapturingTrieStore` opt in through one shared `ReadOnlyTraversalResolverBase`; only `FindCachedOrUnknown` and per-address rebuild differ per store.
- `PatriciaTree` resolves through the shared traversal resolver on synchronous reads; public `IReadOnlyTrieStore` lookup stays clone-preserving.
- Per-instance `TrieStore` counters (`CloneForReadOnlyCount`, `FallbackNotShareableCount`, `SharedNodeHitCount`) replace global debug metrics and provide test-only observability without ceremony.
- Add regression coverage for shared traversal, wrapper forwarding, `CappedArray` slices, and read-flag preservation through the read-only path.

## Types of changes

- [ ] Bugfix (a non-breaking change that fixes an issue)
- [x] New feature (a non-breaking change that adds functionality)
- [ ] Breaking change (a change that causes existing functionality not to work as expected)
- [x] Optimization
- [ ] Refactoring
- [ ] Documentation update
- [ ] Build-related changes
- [ ] Other: _Description_

## Testing

#### Requires testing

- [x] Yes

#### If yes, did you write tests?

- [x] Yes

#### Notes on testing

- dotnet build src/Nethermind/Nethermind.Trie/Nethermind.Trie.csproj -c Debug --no-restore -m:1
- dotnet build src/Nethermind/Nethermind.Trie.Test/Nethermind.Trie.Test.csproj -c Debug --no-restore -p:BuildProjectReferences=false -m:1
- dotnet build src/Nethermind/Nethermind.Consensus/Nethermind.Consensus.csproj -c Debug --no-restore -p:BuildProjectReferences=false -m:1
- dotnet build src/Nethermind/Nethermind.Synchronization/Nethermind.Synchronization.csproj -c Debug --no-restore -m:1
- dotnet build src/Nethermind/Nethermind.Trie/Nethermind.Trie.csproj -c Release --no-restore -m:1
- dotnet build src/Nethermind/Nethermind.Trie.Test/Nethermind.Trie.Test.csproj -c Release --no-restore -p:BuildProjectReferences=false -m:1
- Full Nethermind.Trie.Test suite (446 passed, 4 pre-existing skips) in Debug.
- Focused shared-traversal and wrapper-forwarding tests in Debug and Release.

## Documentation

#### Requires documentation update

- [ ] Yes
- [x] No

#### Requires explanation in Release Notes

- [ ] Yes
- [x] No

## Remarks

- Public read-only trie lookup remains clone-safe; shared traversal is opt-in through PatriciaTree's synchronous resolver path.
- Test observability uses per-`TrieStore`-instance counters read off the specific store under test - no global mutable state, no reset ceremony, no `#if DEBUG` machinery in the production binary.
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 9, 2026

Claude finished @benaadams's task in 12m 41s —— View job


Review

  • Gather context and read rule files
  • Read changed files
  • Analyze correctness, security, performance
  • Post inline comments
  • Post final review summary

Summary

This is a well-engineered, high-impact optimization. The measured allocation reduction (19.2 GB → 1.4 MB) is dramatic and the design is sound. The seqlock, CAS-based child publication, and opt-in shared traversal path are all correctly implemented. No critical or high findings.

Critical: 0 | High: 0 | Medium: 1 | Low: 3


Medium

M1 — Seqlock bit-63/sequence-counter collision is undocumented (TrieNode.cs:52)

The slice flag (RlpSliceFlag = 1UL << 63) and the sequence counter share the upper 32 bits of _rlpSeqAndLength. After ~2^30 WriteRlp completions on a single node, doneSeq reaches 0x80000000, setting bit 63. A reader takes the slice branch and extracts offset = (0x80000000 & 0x7FFFFFFF) = 0, length = L — correct by coincidence for normal nodes. Safety depends on WriteRlp's undocumented self-healing: IsRlpSlice(current) ? 0 : ... (new in this PR) resets seq = 0, preventing the sequence from ever reaching 0x80000002 (which would give offset = 2 and return wrong data). The invariant is load-bearing for the design's correctness but absent from any comment. This is unreachable in production (a node is resolved at most a handful of times), but the design would be fragile to maintain without documentation. Fix this →


Low

L1 — _commitBuffer read without Volatile.Read in FindCachedOrUnknownShared (TrieStore.cs:542)

The new internal TrieNode FindCachedOrUnknownShared(...) reads _commitBuffer as a plain field. The .NET memory model doesn't guarantee visibility without an acquire fence. In practice, a stale null read causes a cache miss and falls through to _dirtyNodes (correct, slightly slower); a stale non-null read is safe because CommitBuffer.FindCachedOrUnknownShared checks both the buffer and _trieStore._dirtyNodes. Using Volatile.Read(ref _commitBuffer) would align this with the store's defensive style and eliminate the technically-undefined behavior.

L2 — WriteRlp self-heal and comment inaccuracy (TrieNode.cs:111)

The IsRlpSlice(current) ? 0 : ... self-heal is new but unCommented. The existing summary comment says "~2 billion writes before wrap" when the bit-63 collision first occurs at ~1 billion. Both are addressed in the inline note.

L3 — InitRlpSlice offset arithmetic worth commenting (TrieNode.cs:399)

parentRlp.Offset + offset silently converts rlpStream.Position (relative to the CappedArray's logical view) to an absolute byte-array index. A one-line comment would save future readers from wondering why both are summed.


Correctness & Safety Observations

  • Seqlock memory model (x64 / ARM64): Volatile.Read provides the necessary full compiler fence; on ARM64 the explicit Interlocked.MemoryBarrier() adds the hardware fence. The TSO model on x64 makes plain _rlpArray reads safe between the two Volatile.Read calls. Analysis is sound.
  • Shared node mutation (UnresolveChild): Concurrent readers of a shared sealed node can race on _nodeData[i] via PublishChild / UnresolveChild. Both use Interlocked.CompareExchange so only one wins per slot. The repeated resolve/unresolve cycle is the same behavior as the non-shared path and correctly bounded by the seqlock on the RLP field. No new hazard.
  • Slice → normal transition safety: InitRlpSlice is only called from the single-threaded private TrieNode(CappedArray<byte>, int, int) constructor. Inline children are always sealed (_blockAndFlags = 0 ⟹ IsDirty = false ⟹ IsSealed = true) so WriteRlp can never race with their slice state. Correct.
  • Pool safety for slice nodes: Inline children hold a direct byte[] reference from their parent's underlying array. Sealed parent nodes never have their RLP returned to a pool (bufferPool.SafeReturn only touches dirty nodes during GenerateKey), so the parent array outlives its children. Safe.
  • CommitBuffer.FindCachedOrUnknownShared after flush: If _commitBuffer is captured non-null and then flushed, the captured CommitBuffer still checks _trieStore._dirtyNodes as the fallback. No data is lost. Correct.
  • OverlayTrieStore.SharedOverlayTraversalResolver.FindCachedOrUnknown: Delegates only to the base resolver, same as OverlayTrieStore.FindCachedOrUnknown. Nodes committed to the overlay DB but absent from base cache fall through to LoadRlp on both paths. Consistent.
  • WitnessCapturingTrieStore: Shared path correctly captures nodes via CaptureNode. Duplicate captures are harmlessly dropped by _rlpCollector.TryAdd. Correct.

Performance

The allocation delta is striking and the design is cost-appropriate: one Volatile.Read overhead per seqlock fast-path read, zero heap allocation for sealed-node traversals. The per-instance counters (SharedNodeHitCount, FallbackNotShareableCount, CloneForReadOnlyCount) provide production-visible observability without #if DEBUG cruft. Well done.

@benaadams benaadams changed the title perf(trie): share cached nodes across read-only walks perf(trie): share cached nodes across read-only walks (CloneForReadOnly 19.2 GB -> 1.4 MB) May 9, 2026
Comment thread src/Nethermind/Nethermind.Trie/TrieNode.cs
Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/TrieNode.cs
Comment thread src/Nethermind/Nethermind.Trie/TrieNode.cs
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an opt-in shared resolver path for synchronous read-only trie traversal so cached, sealed TrieNode instances can be reused across walks (eliminating most CloneForReadOnly allocations), while preserving clone-safe behavior for the public IReadOnlyTrieStore lookup APIs.

Changes:

  • Add ITrieNodeResolverSource + IScopedReadOnlyTraversalProvider and wire store wrappers to provide a shared read-only traversal resolver where safe.
  • Update TrieNode to support seqlock-protected RLP state and inline-child RLP slicing (backed by a shared parent array) with safer slot publication for children.
  • Switch PatriciaTree’s synchronous read path to prefer the shared traversal resolver; add regression tests for sharing, wrapper forwarding, and slice behavior.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/Nethermind/Nethermind.Trie/TrieNodeResolverWithReadFlags.cs Implements ITrieNodeResolverSource so read-only traversal keeps ReadFlags behavior.
src/Nethermind/Nethermind.Trie/TrieNode.cs Adds RLP seqlock + inline-child RLP slicing and publish/Volatile patterns for child slots.
src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Adds shared cached-node lookup path, per-instance counters, and updates HasRoot logic.
src/Nethermind/Nethermind.Trie/Pruning/TrieNodeResolverExtensions.cs Adds AsReadOnlyTraversal() helper for resolver selection.
src/Nethermind/Nethermind.Trie/Pruning/ScopedTrieStore.cs Makes scoped store expose ITrieNodeResolverSource via full-store provider capability.
src/Nethermind/Nethermind.Trie/Pruning/ReadOnlyTrieStore.cs Implements IScopedReadOnlyTraversalProvider to return a shared traversal resolver for TrieStore.
src/Nethermind/Nethermind.Trie/Pruning/ReadOnlyTraversalResolverBase.cs Common forwarding base class for shared traversal resolvers across store wrappers.
src/Nethermind/Nethermind.Trie/Pruning/OverlayTrieStore.cs Implements shared traversal provider that delegates cached-node lookup to base store’s traversal resolver.
src/Nethermind/Nethermind.Trie/Pruning/ITrieNodeResolverSource.cs New capability interface for producing a shared traversal resolver.
src/Nethermind/Nethermind.Trie/Pruning/IScopedReadOnlyTraversalProvider.cs New store capability interface for address-scoped shared traversal resolvers.
src/Nethermind/Nethermind.Trie/PreCachedTrieStore.cs Adds traversal resolver forwarding so pre-block caching behavior is preserved on shared reads.
src/Nethermind/Nethermind.Trie/PatriciaTree.cs Uses AsReadOnlyTraversal() resolver on synchronous read paths to avoid clone-heavy lookups.
src/Nethermind/Nethermind.Trie/CachedTrieStore.cs Exposes a traversal resolver that preserves its caching behavior for shared read-only traversal.
src/Nethermind/Nethermind.Trie.Test/TrieNodeTests.cs Adds test ensuring inline-child RLP slices share the parent backing array.
src/Nethermind/Nethermind.Trie.Test/TrieNodeResolverWithReadFlagsTests.cs Adds test ensuring traversal resolver path preserves ReadFlags.
src/Nethermind/Nethermind.Trie.Test/Pruning/TreeStoreTests.cs Adds tests asserting shared traversal avoids cloning across multiple store wrappers and commit-buffer mode.
src/Nethermind/Nethermind.Trie.Test/OverlayTrieStoreTests.cs Adds coverage for shared traversal behavior through OverlayTrieStore.
src/Nethermind/Nethermind.Synchronization/SnapSync/SnapUpperBoundAdapter.cs Forwards ITrieNodeResolverSource so snap upper-bound wrapper doesn’t break traversal sharing.
src/Nethermind/Nethermind.Core/Buffers/CappedArray.cs Adds offset support to represent slices without copying.
src/Nethermind/Nethermind.Consensus/Stateless/WitnessCapturingTrieStore.cs Adds traversal resolver forwarding while keeping witness capture on the read path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated
@benaadams
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 9, 2026

Claude finished @benaadams's task in 8m 46s —— View job


Claude finished @benaadams's task


Re-review

  • Gather context and read rule files
  • Read changed files
  • Analyze correctness, security, performance
  • Post inline comments
  • Post final review summary

Summary

All four findings from the prior review have been addressed:

Prior finding Status
M1 — Seqlock bit-63 collision undocumented ✅ Comment added at TrieNode.cs:52–55
L1_commitBuffer read without Volatile.Read in FindCachedOrUnknownShared Volatile.Read(ref _commitBuffer) now used at line 542
L2 — Self-heal reset undocumented; "~2 billion writes" comment inaccurate ✅ Comment added at TrieNode.cs:118–120; summary updated
L3InitRlpSlice offset arithmetic ✅ Comment added in the TrieNode(CappedArray, int, int) constructor

Critical: 0 | High: 0 | Medium: 0 | Low: 1


Low

New L1 — IsSealed guard removed in GetNodeByPath (PatriciaTree.cs:992)

The old guard (node.IsSealed && node.Keccak is not null && path.Length % 2 == 1) prevented a redundant FindCachedOrUnknown call for dirty nodes. Without IsSealed, a dirty non-sealed node with a Keccak on an odd-length path triggers a cache lookup and may substitute a stale clone for the live dirty reference. Safe in practice because GetNodeByPath is used almost exclusively on read-only trees (all nodes sealed), but worth a short comment or guard restoration.


Notes on Copilot inline comments

Copilot comment — "remove resolver parameter from PreDecodeChildrenIfBranch": No resolver parameter exists on PreDecodeChildrenIfBranch; the comment is misdirected and can be ignored.

Copilot comment — "FindCachedOrUnknownShared doesn't increment Metrics.LoadedFromCacheNodesCount": Incorrect — ShareOrCloneForReadOnly (called from every hit path in FindCachedOrUnknownShared) increments Metrics.LoadedFromCacheNodesCount unconditionally. No change needed.


Additional correctness observations

  • ChildIterator inline-child stream fix: Adding _rlpStream.SkipBytes(length) and _currentStreamIndex++ for the default (inline) branch makes the state machine explicit where the old code relied on the next iteration's skip-forward logic. Both are correct; new code is cleaner.
  • HasRoot refactor: New HasCachedRlp(key) || TryLoadRlp(...) is semantically equivalent to the old FindCachedOrUnknown(isReadOnly: true) → type-check path, eliminates the clone, and the test confirms CloneForReadOnlyCount stays flat. ✅
  • CappedArray.AsSpan(int, int) semantics shift: start is now relative to the logical CappedArray origin (offset-based). All existing callers had _offset == 0 before this PR, so they are unaffected. New slice callers are written with relative indices. ✅
  • ReadOnlyTrieStore.GetTrieStore switch to ScopedTrieStore: ReadOnlyTrieStore.BeginCommit returns NullCommitter.Instance, so ScopedTrieStore's delegation is identical to the removed ScopedReadOnlyTrieStore. ✅

Comment thread src/Nethermind/Nethermind.Trie/PatriciaTree.cs Outdated
@benaadams
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 9, 2026

Claude finished @benaadams's task in 5m 56s —— View job


Re-review (3rd pass)

  • Gather context and read rule files
  • Read changed files (focusing on changes since 2nd review)
  • Analyze correctness, security, performance
  • Post inline comments
  • Post final review summary

Summary

The one remaining finding from the 2nd review has been resolved:

Prior finding Status
New L1 (2nd pass)IsSealed guard removed in DoWarmUpPath ✅ Restored in commit 06a7582 (fix(trie): keep warm-up cache refresh sealed-only)

Critical: 0 | High: 0 | Medium: 0 | Low: 0


Additional observations (3rd pass)

HasCachedRlp reads _commitBuffer without Volatile.ReadHasCachedRlp(in Key) (new in this PR, used only by the HasRoot path) reads _commitBuffer with a plain field access, consistent with the pre-existing FindCachedOrUnknown(Hash256?, TreePath, Hash256, bool). The hot-path FindCachedOrUnknownShared was correctly upgraded to Volatile.Read (L1 fix in 2nd review). The HasRoot path is not hot, so the inconsistency is acceptable and follows existing codebase convention. No action required.

All correctness/safety observations from prior passes remain valid — seqlock memory model (x64/ARM64), shared-node mutation via CAS, pool safety for slice nodes, post-flush CommitBuffer fallback correctness, and counter observability via per-instance Interlocked fields are all correctly implemented.

Verdict

All prior findings are resolved. The PR is clean and ready to merge.

@benaadams benaadams marked this pull request as ready for review May 9, 2026 15:02
@benaadams benaadams requested a review from MarekM25 as a code owner May 9, 2026 15:02
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 9, 2026

Claude finished @benaadams's task in 5m 37s —— View job


Re-review (3rd pass)

  • Gather context and understand request
  • Read rule files and changed files
  • Check fix for 2nd-review L1 finding
  • Full correctness/security/performance pass on current branch state
  • Post final review summary

Summary

All findings from the prior two reviews have been resolved.

Prior finding Status
M1 (1st pass) — Seqlock bit-63 collision undocumented ✅ Comment added at TrieNode.cs:52–55
L1 (1st pass) — _commitBuffer without Volatile.Read Volatile.Read(ref _commitBuffer) at line 542
L2 (1st pass) — Self-heal undocumented; "~2 billion" comment inaccurate ✅ Comment added at TrieNode.cs:118–120
L3 (1st pass) — InitRlpSlice offset arithmetic ✅ Comment added in TrieNode(CappedArray, int, int) constructor
L1 (2nd pass) — IsSealed guard removed in DoWarmUpPath ✅ Restored in commit 06a75824

Critical: 0 | High: 0 | Medium: 0 | Low: 0


Additional correctness observations (3rd pass)

  • CappedArray.AsSpan(int, int) semantics change: start is now relative to _offset. No callers in the codebase use the 2-arg form on non-zero-offset arrays; only the new inline-child path uses it (via ResolveChildWithChildPath and PreDecodeChildrenIfBranch), and those callers pass rlpStream.Position which is correctly relative to the CappedArray's logical origin. ✅
  • OverlayTrieStore.GetReadOnlyTraversalResolver always returns non-null: Correct — even if _baseStore doesn't support sharing, AsReadOnlyTraversal() falls back to the plain scoped resolver, so LoadRlp still hits _nodeStorage first through ReadOnlyTraversalResolverBase. No silent degradation. ✅
  • WitnessCapturingTrieStore with inline nodes: CaptureNode is gated on NodeType != Unknown. Inline children created by CreateInlineChild have _nodeData = null so NodeType == Unknown — they are correctly skipped and never cause a null-Keccak TryAdd. ✅
  • CommitBuffer.FindCachedOrUnknownShared node visibility: Buffer shard and main shard are both ConcurrentDictionary-backed; all reads are safe without _pruningLock. Consistent with the pre-existing non-buffer path. ✅

Verdict

Mergeable. This is a correct, well-tested, high-impact optimization. No open findings.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

EXPB Benchmark Comparison

Run: View workflow run

superblocks

Scenario: nethermind-flat-superblocks-trie-improvements-delay0s

Client Processing (SSE)

Metric PR Master (cached) Delta
AVG (ms) 1256.95 1267.28 -0.82%
MEDIAN (ms) 1165.0 1159.2 +0.50%
P90 (ms) 1691.5 1655.9 +2.15%
P95 (ms) 2113.3 2216.1 -4.64%
P99 (ms) 3389.5 3658.9 -7.36%
MIN (ms) 798.2 786.5 +1.49%
MAX (ms) 3389.5 3658.9 -7.36%
K6 TTFB
Metric PR Master (cached) Delta
AVG (ms) 1482.87 1486.12 -0.22%
MEDIAN (ms) 1364.07 1347.32 +1.24%
P90 (ms) 1941.80 1865.35 +4.10%
P95 (ms) 2514.64 2483.18 +1.27%
P99 (ms) 3400.08 3771.76 -9.85%
MIN (ms) 923.65 956.82 -3.47%
MAX (ms) 3553.49 3844.98 -7.58%

realblocks

Scenario: nethermind-flat-realblocks-trie-improvements-delay0s

Client Processing (SSE)

Metric PR Master (cached) Delta
AVG (ms) 35.48 34.80 +1.95%
MEDIAN (ms) 29.2 28.5 +2.46%
P90 (ms) 64.8 63.2 +2.53%
P95 (ms) 76.1 76.3 -0.26%
P99 (ms) 142.6 133.4 +6.90%
MIN (ms) 0.5 0.5 +0.00%
MAX (ms) 230.2 228.9 +0.57%
K6 TTFB
Metric PR Master (cached) Delta
AVG (ms) 41.31 40.27 +2.58%
MEDIAN (ms) 34.87 34.12 +2.20%
P90 (ms) 71.92 70.89 +1.45%
P95 (ms) 81.26 82.47 -1.47%
P99 (ms) 143.84 135.18 +6.41%
MIN (ms) 1.42 1.12 +26.79%
MAX (ms) 258.93 256.68 +0.88%

Copy link
Copy Markdown
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't merge without @asdacap review

Comment thread src/Nethermind/Nethermind.Synchronization/SnapSync/SnapUpperBoundAdapter.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/CachedTrieStore.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/TrieNodeResolverWithReadFlags.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/Pruning/TrieStore.cs Outdated
Comment thread src/Nethermind/Nethermind.Trie/TrieNode.cs
@benaadams
Copy link
Copy Markdown
Member Author

@claude re-review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 10, 2026

Claude finished @benaadams's task in 5m 30s —— View job


Re-review (4th pass)

  • Gather context and understand request
  • Read rule files
  • Read changed files (new commits since 3rd review)
  • Analyze and post final review summary

Summary

All five items raised by @LukaszRozmej have been addressed. There are no new critical, high, or medium findings.

LukaszRozmej item Status
One-liner — SnapUpperBoundAdapter, CachedTrieStore, TrieNodeResolverWithReadFlags ✅ All converted to expression-body
?: expression — FindCachedOrUnknownShared public + private overloads ✅ Both now expression-body one-liners
One-liner — remaining multi-line TrieStore methods ✅ Collapsed
Remove Interlocked counting from hot path (test-only counters) ✅ Gated behind _collectReadOnlyTraversalStats; flag enabled lazily on first counter property access; [AggressiveInlining] branch is near-zero cost when false
Code duplication with ResolveChildWithChildPath at TrieNode.cs:587 ✅ Extracted as DecodeChildReference(CappedArray<byte>, ref ValueRlpStream) — used by all three callsites (PreDecodeChildrenIfBranch, ResolveChildWithChildPath, ChildIterator.MoveNext)

Critical: 0 | High: 0 | Medium: 0 | Low: 0


New code correctness (4th pass)

DecodeChildReference extraction — the three call-sites differ only in what they do after decoding: PreDecodeChildrenIfBranch stores the raw ref (Hash256 or TrieNode) without a cache lookup; ResolveChildWithChildPath and ChildIterator follow a Hash256 with FindCachedOrUnknown. The refactored code correctly separates decoding (stream advancement) from resolution (cache lookup), and _currentStreamIndex++ is placed after the whole block — matching old per-case placement. ✅

SenderTxGroup structCollectionsMarshal.GetValueRefOrAddDefault returns a ref into the dictionary; group.Add(i) writes RestTxIndexes back through that ref, which is correct. Dispose is called on each KeyValuePair.Value copy in the finally loop; since RestTxIndexes is a reference type, the pool-return goes through the same list object. ✅

Shared warmup session (TryWarmupShared)IPreBlockCacheWarmupSession is disposed by using before envPool.Return(env), regardless of the CanBeShared early-return path. ✅ When CanBeShared == true, GetAccountUncached bypasses _loadedAccounts (no shared mutable dictionary), CreateStorageTreeUncachedAccount creates a fresh tree per call (no aliasing), and storageCache/preBlockCache (SeqlockCache) are designed for concurrent writes. Shared session is correctly safe for concurrent workers. ✅

CappedArray bounds hardening — logical-slice bounds ((uint)index >= (uint)_length) are now checked first, before _offset + index arithmetic, preventing wrap-around reads outside the slice. Same fix applied to AsSpan(int, int) and AsArraySegment(int, int). ✅

StorageTreeWrapper.Get(in ValueHash256 hash) behavioural change — now routes through the full caching path (Get(new StorageCell(address, hash))), removing the "not a critical path" bypass. Hash-keyed storage reads are now warmed and cached. Semantically equivalent at the tree level (storageCell.IsHash == truebaseStorageTree.Get(hash)). ✅

HasCachedRlp in HasRoot path — now uses Volatile.Read(ref _commitBuffer), consistent with the fix applied to FindCachedOrUnknownShared in the 2nd review. ✅

Benchmark results — superblocks: P95/P99 −4.6%/−7.4% (clear improvement); median +0.5% (noise). Realblocks: P99 +6.9%, median +2.5% (minor, within single-run variance). The superblocks workload (large block payloads with deep trie traversal) is the primary beneficiary of this PR's allocation reduction; the small realblocks overhead is expected variance. No regression. ✅


Verdict

Mergeable. All prior findings resolved. All LukaszRozmej feedback addressed. The PR is correct, well-tested, and clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants