perf(gnovm): reduce per-call and per-op allocations#5816
Draft
thehowl wants to merge 3 commits into
popCopyArgs allocated an args slice per function call (30M objects in the bytes stdlib suite, the top allocation site after block pooling). doOpCall consumes the args immediately — they are copied into the call block before any further ops — so it now hands popCopyArgs a reusable per-machine scratch buffer, cleared after each use. doOpDefer's args escape into the Defer and keep allocating fresh slices.
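The scratch-buffer idea can be sketched in isolation. This is a minimal illustration, not the gnovm code: `Value`, the `Machine` fields, and the `call`/`deferCall` helpers are simplified, hypothetical stand-ins, and only the shape of the technique (reuse a per-machine slice when the args are consumed immediately, allocate fresh when they escape) follows the description above.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for the VM's typed value.
type Value struct{ N int }

// Machine is a stripped-down, hypothetical interpreter state.
type Machine struct {
	stack       []Value
	argsScratch []Value // reused across calls; never handed to code that retains it
}

// popCopyArgs pops the top n values and copies them into buf, allocating
// only when buf's capacity is too small. Passing nil forces a fresh slice.
func (m *Machine) popCopyArgs(n int, buf []Value) []Value {
	if cap(buf) < n {
		buf = make([]Value, n)
	}
	buf = buf[:n]
	copy(buf, m.stack[len(m.stack)-n:])
	m.stack = m.stack[:len(m.stack)-n]
	return buf
}

// call copies the args into a callee-owned block right away, so the
// scratch buffer can be cleared and reused by the next call.
func (m *Machine) call(n int) []Value {
	args := m.popCopyArgs(n, m.argsScratch)
	block := make([]Value, n) // stands in for the call block
	copy(block, args)
	for i := range args {
		args[i] = Value{} // clear so the scratch does not keep old values alive
	}
	m.argsScratch = args[:0] // keep the capacity for the next call
	return block
}

// deferCall keeps its args alive past call setup, so it must not use the
// scratch buffer and allocates a fresh slice instead.
func (m *Machine) deferCall(n int) []Value {
	return m.popCopyArgs(n, nil)
}

func main() {
	m := &Machine{stack: []Value{{N: 1}, {N: 2}, {N: 3}}}
	block := m.call(2)
	fmt.Println(block, len(m.stack), cap(m.argsScratch))
}
```

The clearing step matters when values hold pointers: without it, the scratch buffer would keep the previous call's arguments reachable until it is overwritten.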
BigintValue and BigdecValue are immutable at runtime: all arithmetic writes into fresh receivers and conversions only read. Copying them allocated a fresh big.Int/apd.Decimal per copy — 24M allocations in the bytes stdlib suite, mostly from untyped-const operands copied at declaration sites. Share the underlying value instead. Neither Copy ever charged the allocator, so this is gas-neutral.
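A small sketch of why sharing is safe under the immutability rule stated above. The `BigintValue` here is a simplified stand-in with a single field, and `copyDeep`, `copyShared`, `add`, and `newBigint` are hypothetical helper names chosen for illustration; the real `Copy` signature may differ.

```go
package main

import (
	"fmt"
	"math/big"
)

// BigintValue is a simplified stand-in for the VM type of the same name.
type BigintValue struct{ V *big.Int }

// newBigint is a hypothetical convenience constructor.
func newBigint(n int64) BigintValue { return BigintValue{V: big.NewInt(n)} }

// copyDeep allocates a fresh big.Int per copy (the behavior being replaced).
func (bv BigintValue) copyDeep() BigintValue {
	return BigintValue{V: new(big.Int).Set(bv.V)}
}

// copyShared reuses the pointer. This is only safe while nothing mutates
// V in place after the value has been created.
func (bv BigintValue) copyShared() BigintValue {
	return BigintValue{V: bv.V}
}

// add follows the "fresh receiver" rule: the result is written into a new
// big.Int, so neither operand is modified.
func add(a, b BigintValue) BigintValue {
	return BigintValue{V: new(big.Int).Add(a.V, b.V)}
}

func main() {
	x := newBigint(40)
	y := x.copyShared() // no allocation: x and y point at the same big.Int
	sum := add(y, newBigint(2))
	fmt.Println(x.V, y.V, sum.V, x.V == y.V)
}
```

The invariant carries the whole optimization: a single operation that wrote into an operand's `big.Int` in place would silently change every value sharing it.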
Three more allocation sites found by profiling the bytes stdlib suite, together ~42M objects (of 135M total):

- doOpValueDecl let its working TypedValue escape to the heap once per declaration executed, because its address went into ConvertUntypedTo. At runtime only untyped bools (from comparisons) reach that path, so retype directly; the preprocess-stage conversion moves to a by-value helper. (16.6M)
- doOpConvert's working value escaped the same way via ConvertTo and IsReadonly. Use a machine-owned scratch slot: a field's address is free, and the op is single-threaded and self-contained. (11.7M)
- Evaluating a constTypeExpr re-boxed the type into a TypeValue interface per evaluation (every conversion evaluates one). Cache the boxed form on the node at preprocess time; nodes loaded from the store fall back to boxing per eval (the cache is not persisted and is never lazily filled at runtime, since nodes can be shared across machines). (13.5M)

Also documents on Machine.Release why blockPool/callArgsScratch are deliberately not carried through the machine pool: measured to hurt parallel workloads via extra live heap and lost cache locality, without helping machine-churn workloads (sync.Pool eviction discards them).

Full verification battery: filetest suite canonical (all goldens byte-identical), vm Gas tests, txtar, 220 example packages, cmd/gno.

Same-session A/B: full pkg/gnolang long mode 173.9s -> 157.1s (-10%); bytes suite solo 94.9s -> 83.5s.
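The constTypeExpr item can be illustrated with a rough sketch of caching a boxed interface value on a node. All types below (`Type`, `IntType`, `Value`, `TypeValue`, and the `constTypeExpr` layout with its `boxed` field) are simplified, hypothetical stand-ins rather than the gnovm definitions; the point is only the fill-at-preprocess, read-only-at-eval split.

```go
package main

import "fmt"

// Type and IntType are simplified stand-ins for the VM's type system.
type Type interface{ String() string }

type IntType struct{}

func (IntType) String() string { return "int" }

// Value is the interface that evaluation results are boxed into.
type Value interface{ isValue() }

// TypeValue wraps a Type so it can be used as a Value.
type TypeValue struct{ Type Type }

func (TypeValue) isValue() {}

// constTypeExpr is a hypothetical node layout; boxed is the cache.
type constTypeExpr struct {
	Type  Type
	boxed Value // filled once at preprocess; nil for nodes loaded from a store
}

// preprocess fills the cache before the node is shared with any machine.
func preprocess(x *constTypeExpr) {
	x.boxed = TypeValue{Type: x.Type}
}

// eval returns the cached box when present. Otherwise it boxes per call and
// does NOT write to the node, so machines sharing the node never race on it.
func eval(x *constTypeExpr) Value {
	if x.boxed != nil {
		return x.boxed
	}
	return TypeValue{Type: x.Type}
}

func main() {
	fresh := &constTypeExpr{Type: IntType{}}
	preprocess(fresh)
	loaded := &constTypeExpr{Type: IntType{}} // as if loaded from a store

	fmt.Println(eval(fresh).(TypeValue).Type, eval(loaded).(TypeValue).Type)
	fmt.Println(loaded.boxed == nil) // the fallback path leaves the cache empty
}
```

Filling the cache lazily inside `eval` would save the fallback allocations too, but it would be an unsynchronized write to a node that several machines may read concurrently, which is the reason given above for not doing it.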
This was referenced Jun 11, 2026
Note
Part 6 of 6 of the gnovm performance stack (split from #5800), to be merged in order:
Each PR is based on the previous one's branch; this one diffs against part 5. Together:
ci / gnovm ~14m → 6m18s; pkg/gnolang test time −64%; VM heap allocations −84% on the heaviest suite.

Summary
The next allocation tier after block pooling, found by re-profiling the bytes stdlib suite (~135M heap objects at that point). Three commits, all gas-neutral (no allocator charges added or removed; all filetest goldens byte-identical):
- popCopyArgs reuses a per-machine scratch buffer on the doOpCall path: the args are copied into the call block immediately, so the slice never outlives call setup. doOpDefer's args escape into the Defer and keep allocating fresh slices. (30M objects)
- BigintValue/BigdecValue.Copy share the underlying value: they are immutable at runtime (all arithmetic writes into fresh receivers, conversions only read), and neither Copy ever charged the allocator. Mostly untyped-const operands copied at declaration sites. (24M)
- doOpValueDecl's working TypedValue escaped once per declaration via ConvertUntypedTo(&tv); at runtime only untyped bools (from comparisons) reach that path, now retyped directly (16.6M). doOpConvert escaped the same way via ConvertTo/IsReadonly and now uses a machine-owned scratch slot (11.7M). Evaluating a constTypeExpr re-boxed the type into a TypeValue interface per evaluation (every conversion evaluates one), and the boxed form is now cached on the node at preprocess time, with a read-only fallback for store-loaded nodes (no lazy fill: nodes can be shared across machines). (13.5M)

Measurements
Same-session A/B (back-to-back runs; cross-session laptop numbers drift thermally): full pkg/gnolang long mode 173.9s → 157.1s (−10%), bytes suite solo 94.9s → 83.5s.

With the whole stack applied, CI lands at:

- pkg/gnolang 256.1s (708–722s on master, −64%)
- main / test 6m00s (13m43s)
- stdlibs / test 3m54s (8m33s)
- whole ci / gnovm workflow 6m18s (~14m on master); single good-runner sample, expect ~6–8m across runner quality

Verification: filetest suite canonical (all goldens byte-identical), vm gas tests, txtar, 220 example packages, cmd/gno.