perf(gnovm): avoid heap-boxed byte access in copy, range and index reads #5812
Draft
thehowl wants to merge 1 commit into
Profiling the gnovm test suites (dominated by interpreted Gno) showed ~50% of CPU in Go GC/malloc, with 67% of all heap allocations (694M objects in the bytes stdlib suite alone) coming from ArrayValue.GetPointerAtIndexInt2 materializing a *TypedValue + DataByteValue box per byte accessed:
- the copy() builtin allocated two boxes per byte copied;
- range over a byte slice allocated three objects per iteration (the index TypedValue escaping through GetPointerAtIndex, plus the view box) that Deref immediately discarded;
- b[i] reads in doOpIndex1 did the same box-then-Deref dance.

Add TypedValue.GetValueAtIntIndex, a read-only fast path mirroring GetPointerAtIndex's checks and panics for strings and Data-backed arrays/slices, and use it in doOpIndex1 and the range loop.

Give the copy() builtin direct byte copies when both sides are Data-backed (or the source is a string). Bounds and readonly checks, DidUpdate and CPU gas are unchanged (gas is charged before the loop, as before), and since Go's copy is overlap-safe, the backward-copy setup remains only for the List fallback.

Gas is unchanged: the view boxes were raw Go allocations, never charged to the VM allocator. All 2344 filetest goldens (including Gas: and MAXALLOC-sensitive alloc tests) pass unmodified, as do the gno.land vm Gas tests and the txtar integration suite.

- bytes stdlib suite: 151.5s -> 105.2s
- full pkg/gnolang long mode: 245.0s -> 184.6s
- allocated objects in the bytes suite: 1.03G -> 0.30G
- BenchmarkOpIndex1_ByteArray: 185.7ns -> 130.0ns
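A minimal Go sketch of the two ideas, under heavily simplified assumptions: TypedValue, ArrayValue and SliceValue below are toy stand-ins rather than gnovm's real types (which also carry type information, DataByteValue views and allocator accounting), and setAtIndex and copySlices are hypothetical helpers. In this model GetPointerAtIndex hands back a fresh cell for every Data-backed byte, while GetValueAtIntIndex applies the same checks but returns the element by value; copySlices takes Go's builtin copy when both sides are Data-backed and keeps the backward loop only for the element-wise fallback.

```go
package main

import "fmt"

// Toy stand-ins (hypothetical, not gnovm's real types): a value cell that
// only carries one byte, an array backed by Data (raw bytes) or List (cells),
// and a slice window [Offset, Offset+Length) over an array.
type TypedValue struct{ B byte }

type ArrayValue struct {
	Data []byte
	List []TypedValue
}

type SliceValue struct {
	Base   *ArrayValue
	Offset int
	Length int
}

func (sv *SliceValue) checkIndex(i int) {
	if i < 0 || i >= sv.Length {
		panic(fmt.Sprintf("index out of range [%d] with length %d", i, sv.Length))
	}
}

// GetPointerAtIndex models the old read path: a Data-backed byte is
// materialized as a fresh cell that the caller dereferences and discards.
func (sv *SliceValue) GetPointerAtIndex(i int) *TypedValue {
	sv.checkIndex(i)
	if sv.Base.Data != nil {
		return &TypedValue{B: sv.Base.Data[sv.Offset+i]}
	}
	return &sv.Base.List[sv.Offset+i]
}

// GetValueAtIntIndex models the read-only fast path: same checks and
// panics, but the element is returned by value, so no cell is created.
func (sv *SliceValue) GetValueAtIntIndex(i int) TypedValue {
	sv.checkIndex(i)
	if sv.Base.Data != nil {
		return TypedValue{B: sv.Base.Data[sv.Offset+i]}
	}
	return sv.Base.List[sv.Offset+i]
}

// setAtIndex is a hypothetical write helper used by the fallback loop.
func (sv *SliceValue) setAtIndex(i int, v TypedValue) {
	sv.checkIndex(i)
	if sv.Base.Data != nil {
		sv.Base.Data[sv.Offset+i] = v.B
	} else {
		sv.Base.List[sv.Offset+i] = v
	}
}

// copySlices models copy(dst, src) and returns the number of elements copied.
func copySlices(dst, src *SliceValue) int {
	n := dst.Length
	if src.Length < n {
		n = src.Length
	}
	// Per the description, the real builtin does its bounds/readonly checks and
	// charges CPU gas at this point, before either path; this toy omits them.
	if dst.Base.Data != nil && src.Base.Data != nil {
		// Go's builtin copy is safe for overlapping ranges.
		copy(dst.Base.Data[dst.Offset:dst.Offset+n], src.Base.Data[src.Offset:src.Offset+n])
		return n
	}
	// Element-wise fallback: if dst starts after src in the same backing
	// array, copy backwards so no source element is overwritten before it is read.
	if dst.Base == src.Base && dst.Offset > src.Offset {
		for i := n - 1; i >= 0; i-- {
			dst.setAtIndex(i, src.GetValueAtIntIndex(i))
		}
	} else {
		for i := 0; i < n; i++ {
			dst.setAtIndex(i, src.GetValueAtIntIndex(i))
		}
	}
	return n
}

func main() {
	buf := &ArrayValue{Data: []byte("abcdef")}
	// Overlapping Data-backed copy: shift "abcde" one position to the right.
	n := copySlices(&SliceValue{buf, 1, 5}, &SliceValue{buf, 0, 5})
	fmt.Println(n, string(buf.Data)) // 5 aabcde

	// Both read paths see the same byte; only the old one needs a cell.
	s := &SliceValue{buf, 0, 6}
	fmt.Println(s.GetPointerAtIndex(2).B == s.GetValueAtIntIndex(2).B) // true
}
```

The string-source case is left out of the sketch; Go's copy also accepts a string source for a []byte destination, so it would fit into the same fast path.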
This was referenced Jun 11, 2026
🛠 PR Checks Summary: All automated checks passed. ✅
Note
Part 2 of 6 of the gnovm performance stack (split from #5800), to be merged in order:
Each PR is based on the previous one's branch; this one diffs against part 1. Together:
- ci / gnovm: ~14m → 6m18s
- pkg/gnolang test time: −64%
- VM heap allocations: −84% on the heaviest suite

Summary
Profiling the gnovm test suites (dominated by interpreted Gno) showed ~50% of CPU in Go GC/malloc, with 67% of all heap allocations (694M objects in the bytes stdlib suite alone) coming from ArrayValue.GetPointerAtIndexInt2 materializing a heap *TypedValue plus a boxed DataByteValue per byte accessed:
- the copy() builtin allocated two boxes per byte copied (554M objects; the code carried a "TODO: consider an optimization if dstv.Data != nil");
- a range loop (for i, c := range byteslice) allocated three objects per iteration (the index TypedValue escaping through GetPointerAtIndex(&iv), plus the view box) that Deref immediately discarded;
- b[i] reads in doOpIndex1 did the same box-then-Deref dance.

This PR adds TypedValue.GetValueAtIntIndex, a read-only fast path that mirrors GetPointerAtIndex's checks and panics for strings and Data-backed arrays/slices, and uses it in doOpIndex1 and the range loop. It also gives copy() direct byte copies when both sides are Data-backed (or the source is a string).

Gas is unchanged: the view boxes were raw Go allocations never charged to the VM allocator, and CPU gas for copy() was already charged before the per-element loops. Verified empirically against all 2344 filetest goldens (byte-identical, including Gas: and MAXALLOC-sensitive alloc tests), the gno.land/pkg/sdk/vm gas tests, the txtar suite, examples and cmd/gno.

Measurements
- TestStdlibs/bytes (solo): 151.5s → 105.2s
- BenchmarkOpIndex1_ByteArray: 185.7ns → 130.0ns
- pkg/gnolang long mode, 16 cores: 245.0s → 184.6s

Untouched paths benchmark flat (OpRangeIter_1000: 30.9µs → 30.2µs; OpIndex1_MapHit_100: 187.0ns → 187.7ns). Byte writes (b[i] = x) still box, because the pointer protocol spans multiple ops; that is left as a follow-up.
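As a quick cross-check, the relative changes behind the before/after figures in the commit message can be recomputed with plain arithmetic: reduction = (before - after) / before, and speedup = before / after. This sketch only re-derives numbers already stated; the measurement struct and helper names are illustrative, not part of the PR.

```go
package main

import "fmt"

// measurement is one before/after pair taken from the commit message.
type measurement struct {
	name          string
	before, after float64
	unit          string
}

// reduction returns the relative decrease from before to after, in percent.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

// speedup returns the before/after ratio (e.g. 1.5 means 1.5x faster or fewer).
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	ms := []measurement{
		{"bytes stdlib suite", 151.5, 105.2, "s"},
		{"pkg/gnolang long mode", 245.0, 184.6, "s"},
		{"allocated objects (bytes suite)", 1.03, 0.30, "G"},
		{"BenchmarkOpIndex1_ByteArray", 185.7, 130.0, "ns"},
	}
	for _, m := range ms {
		fmt.Printf("%-32s %7.2f%s -> %7.2f%s  -%.1f%%  %.2fx\n",
			m.name, m.before, m.unit, m.after, m.unit,
			reduction(m.before, m.after), speedup(m.before, m.after))
	}
}
```

That works out to about 30.6% (1.44x) for the bytes suite, 24.7% (1.33x) for pkg/gnolang long mode, 30.0% (1.43x) for BenchmarkOpIndex1_ByteArray, and 70.9% fewer allocated objects.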