perf(gnovm): reduce per-call and per-op allocations#5816

Draft
thehowl wants to merge 3 commits into dev/morgan/gnovm-ci-no-coverage from dev/morgan/gnovm-alloc-fixes
thehowl commented Jun 11, 2026

Member

Note

Part 6 of 6 of the gnovm performance stack (split from #5800), to be merged in order:

  1. perf(gnovm): parallelize test suites and add gno test -jobs (#5811)
  2. perf(gnovm): avoid heap-boxed byte access in copy, range and index reads (#5812)
  3. perf(gnovm): recycle runtime blocks through a per-machine pool (#5813)
  4. perf(gnovm): share interface-held values when copying arrays (#5814, gas-visible)
  5. ci(gnovm): skip print-only coverage instrumentation (#5815)
  6. perf(gnovm): reduce per-call and per-op allocations (#5816)

Each PR is based on the previous one's branch; this one diffs against part 5. Together: ci / gnovm ~14m → 6m18s; pkg/gnolang test time −64%; VM heap allocations −84% on the heaviest suite.

Summary

The next allocation tier after block pooling, found by re-profiling the bytes stdlib suite (~135M heap objects at that point). Three commits, all gas-neutral (no allocator charges added or removed; all filetest goldens byte-identical):

  • popCopyArgs reuses a per-machine scratch buffer on the doOpCall path — the args are copied into the call block immediately, so the slice never outlives call setup. doOpDefer's args escape into the Defer and keep allocating fresh slices. (30M objects)
  • BigintValue/BigdecValue.Copy share the underlying value — they are immutable at runtime (all arithmetic writes into fresh receivers, conversions only read), and neither Copy ever charged the allocator. Mostly untyped-const operands copied at declaration sites. (24M)
  • Per-op heap escapes and type re-boxing:
    • doOpValueDecl's working TypedValue escaped once per declaration via ConvertUntypedTo(&tv). At runtime only untyped bools (from comparisons) reach that path, so they are now retyped directly. (16.6M)
    • doOpConvert escaped the same way via ConvertTo/IsReadonly and now uses a machine-owned scratch slot. (11.7M)
    • Evaluating a constTypeExpr re-boxed the type into a TypeValue interface per evaluation, and every conversion evaluates one. The boxed form is now cached on the node at preprocess time, with a read-only fallback for store-loaded nodes (no lazy fill: nodes can be shared across machines). (13.5M)
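The scratch-slot idea behind the doOpConvert change can be sketched in isolation. This is a minimal illustration, not the gnovm code: `typedValue`, `converter`, `machine`, and both `convert*` methods are hypothetical stand-ins. It assumes the pointer is passed through an interface call, where the Go compiler usually cannot prove the pointee stays local and therefore moves it to the heap (`go build -gcflags=-m` reports the actual decision).

```go
package main

import "fmt"

// typedValue is a hypothetical stand-in for the VM's TypedValue.
type typedValue struct {
	kind string
	n    int64
}

// converter is invoked through an interface, so a local whose address is
// passed to it will typically be heap-allocated by escape analysis.
type converter interface {
	convertTo(tv *typedValue, kind string)
}

type simpleConverter struct{}

func (simpleConverter) convertTo(tv *typedValue, kind string) { tv.kind = kind }

type machine struct {
	conv    converter
	scratch typedValue // machine-owned slot: taking &m.scratch allocates nothing new
}

// convertLocal takes the address of a local; tv likely escapes on every call.
func (m *machine) convertLocal(in typedValue, kind string) typedValue {
	tv := in
	m.conv.convertTo(&tv, kind)
	return tv
}

// convertScratch routes the same work through the machine's field. This is
// only safe because the op is single-threaded and does not re-enter itself.
func (m *machine) convertScratch(in typedValue, kind string) typedValue {
	m.scratch = in
	m.conv.convertTo(&m.scratch, kind)
	out := m.scratch
	m.scratch = typedValue{} // reset so nothing lingers between ops
	return out
}

func main() {
	m := &machine{conv: simpleConverter{}}
	in := typedValue{kind: "untyped", n: 7}
	fmt.Println(m.convertLocal(in, "int64"), m.convertScratch(in, "int64"))
}
```

Both paths return the same value; the difference is only where the working copy lives while the conversion runs.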

Measurements

Same-session A/B (back-to-back runs; cross-session laptop numbers drift thermally): full pkg/gnolang long mode 173.9s → 157.1s (−10%), bytes suite solo 94.9s → 83.5s.

With the whole stack applied, CI lands at: pkg/gnolang 256.1s (708–722s on master, −64%), main / test 6m00s (13m43s), stdlibs / test 3m54s (8m33s), whole ci / gnovm workflow 6m18s (~14m on master) — single good-runner sample; expect ~6–8m across runner quality.

Verification: filetest suite canonical (all goldens byte-identical), vm gas tests, txtar, 220 example packages, cmd/gno.

thehowl added 3 commits June 11, 2026 18:31
popCopyArgs allocated an args slice per function call (30M objects in
the bytes stdlib suite, the top allocation site after block pooling).
doOpCall consumes the args immediately — they are copied into the call
block before any further ops — so it now hands popCopyArgs a reusable
per-machine scratch buffer, cleared after each use. doOpDefer's args
escape into the Defer and keep allocating fresh slices.
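The split between the call path and the defer path might look roughly like the sketch below. All names here (`value`, `machine`, `popArgs`, `call`, `deferCall`, `argsScratch`) are hypothetical and much simpler than the real VM types; the sketch only assumes what the commit message states, namely that call arguments are copied out before any further op runs, while deferred arguments are retained.

```go
package main

import "fmt"

// value stands in for the VM's TypedValue.
type value struct{ n int }

// machine is a stripped-down, hypothetical interpreter state.
type machine struct {
	stack       []value // operand stack
	argsScratch []value // reusable buffer, valid only during call setup
}

// popArgs pops len(dst) operands from the stack into dst and returns dst.
func (m *machine) popArgs(dst []value) []value {
	n := len(dst)
	copy(dst, m.stack[len(m.stack)-n:])
	m.stack = m.stack[:len(m.stack)-n]
	return dst
}

// call pops n args through the scratch buffer, copies them into the frame
// (standing in for the call block), then clears the scratch.
func (m *machine) call(n int) []value {
	if cap(m.argsScratch) < n {
		m.argsScratch = make([]value, n) // grow once, then reuse
	}
	args := m.popArgs(m.argsScratch[:n])
	frame := make([]value, n)
	copy(frame, args)
	for i := range args {
		args[i] = value{} // drop references held by the shared buffer
	}
	return frame
}

// deferCall keeps its args alive past setup, so it needs a fresh slice.
func (m *machine) deferCall(n int) []value {
	return m.popArgs(make([]value, n))
}

func main() {
	m := &machine{}
	m.stack = append(m.stack, value{1}, value{2}, value{3})
	deferred := m.deferCall(1)
	frame := m.call(2)
	fmt.Println(deferred, frame, len(m.stack))
}
```

Clearing the scratch after each use matters when the elements hold pointers: otherwise the buffer would keep the last call's arguments reachable until the next call overwrites them.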

BigintValue and BigdecValue are immutable at runtime: all arithmetic
writes into fresh receivers and conversions only read. Copying them
allocated a fresh big.Int/apd.Decimal per copy — 24M allocations in the
bytes stdlib suite, mostly from untyped-const operands copied at
declaration sites. Share the underlying value instead. Neither Copy
ever charged the allocator, so this is gas-neutral.
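Why sharing is safe can be shown with a small sketch. `bigintValue`, `fromInt64`, `deepCopy`, `sharedCopy`, and `add` are hypothetical names, not the gnovm API; the sketch assumes, as the commit message says, that every arithmetic operation writes into a fresh receiver and never into an operand.

```go
package main

import (
	"fmt"
	"math/big"
)

// bigintValue is a hypothetical stand-in for a VM value wrapping a big.Int.
type bigintValue struct{ v *big.Int }

func fromInt64(n int64) bigintValue { return bigintValue{v: big.NewInt(n)} }

// deepCopy allocates a fresh big.Int per copy (the old behaviour).
func (b bigintValue) deepCopy() bigintValue {
	return bigintValue{v: new(big.Int).Set(b.v)}
}

// sharedCopy aliases the same big.Int. This is only correct while every
// operation treats v as read-only.
func (b bigintValue) sharedCopy() bigintValue { return bigintValue{v: b.v} }

// add writes the result into a fresh receiver, leaving both operands intact.
func add(a, b bigintValue) bigintValue {
	return bigintValue{v: new(big.Int).Add(a.v, b.v)}
}

func main() {
	x := fromInt64(40)
	y := x.sharedCopy()
	sum := add(x, fromInt64(2))
	fmt.Println(x.v, y.v, sum.v, x.v == y.v)
}
```

The invariant is fragile in one direction: any future operation that mutates an operand in place (for example calling `a.v.Add(a.v, b.v)`) would silently change every value sharing that pointer.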

Three more allocation sites found by profiling the bytes stdlib suite,
together ~42M objects (of 135M total):

- doOpValueDecl let its working TypedValue escape to the heap once per
  declaration executed, because its address went into ConvertUntypedTo.
  At runtime only untyped bools (from comparisons) reach that path, so
  retype directly; the preprocess-stage conversion moves to a by-value
  helper. (16.6M)

- doOpConvert's working value escaped the same way via ConvertTo and
  IsReadonly. Use a machine-owned scratch slot: a field's address is
  free, and the op is single-threaded and self-contained. (11.7M)

- Evaluating a constTypeExpr re-boxed the type into a TypeValue
  interface per evaluation (every conversion evaluates one). Cache the
  boxed form on the node at preprocess time; nodes loaded from the
  store fall back to boxing per eval (the cache is not persisted and is
  never lazily filled at runtime, since nodes can be shared across
  machines). (13.5M)
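The constTypeExpr caching rule (fill at preprocess time, never at eval time) can be sketched as follows. The types `typ`, `value`, `typeValue`, and `constTypeExpr` here are simplified, hypothetical versions of the VM's; the point is only that `eval` never writes to the node, so a node shared by several machines needs no synchronization.

```go
package main

import "fmt"

// typ and value are hypothetical stand-ins for the VM's Type and Value interfaces.
type typ interface{ name() string }
type value interface{}

type intType struct{}

func (intType) name() string { return "int" }

// typeValue wraps a type so it can travel as a value. Converting this struct
// to the value interface generally costs one heap allocation.
type typeValue struct{ t typ }

// constTypeExpr is a node whose type is fixed after preprocessing.
type constTypeExpr struct {
	t     typ
	boxed value // set once by preprocess; stays nil for nodes built elsewhere
}

// preprocess boxes the type once, before any machine evaluates the node.
func (x *constTypeExpr) preprocess() { x.boxed = typeValue{t: x.t} }

// eval only reads the node. Without a cached box it boxes per evaluation
// instead of filling the cache lazily.
func (x *constTypeExpr) eval() value {
	if x.boxed != nil {
		return x.boxed
	}
	return typeValue{t: x.t}
}

func main() {
	fresh := &constTypeExpr{t: intType{}}
	fresh.preprocess()
	loaded := &constTypeExpr{t: intType{}} // e.g. restored without the cache
	fmt.Println(fresh.eval() == loaded.eval(), loaded.boxed == nil)
}
```

A lazy fill inside `eval` would save the fallback allocation, but it would turn a read into a write on shared data, which is the trade-off the commit message rules out.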

Also documents on Machine.Release why blockPool/callArgsScratch are
deliberately not carried through the machine pool: measured to hurt
parallel workloads via extra live heap and lost cache locality, without
helping machine-churn workloads (sync.Pool eviction discards them).
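As a rough illustration of that Release decision, the sketch below drops the per-machine buffers before returning a machine to a `sync.Pool`. The `machine` fields and the `acquire`, `reset`, and `release` helpers are hypothetical; only `sync.Pool` itself is the real standard-library type, and the real Machine.Release may reset state differently.

```go
package main

import (
	"fmt"
	"sync"
)

// machine is a hypothetical, reduced interpreter state.
type machine struct {
	pc          int
	blockPool   [][]int // recycled blocks, owned by one machine
	argsScratch []int   // call-argument scratch buffer
}

var machinePool = sync.Pool{New: func() any { return new(machine) }}

func acquire() *machine { return machinePool.Get().(*machine) }

// reset deliberately drops the buffers instead of keeping their capacity:
// pooled machines would otherwise pin extra live heap, and sync.Pool may
// discard pooled objects at a GC cycle anyway.
func reset(m *machine) {
	*m = machine{}
}

// release resets the machine and hands it back; the caller must not touch
// m afterwards.
func release(m *machine) {
	reset(m)
	machinePool.Put(m)
}

func main() {
	m := acquire()
	m.argsScratch = make([]int, 0, 16)
	m.blockPool = append(m.blockPool, make([]int, 8))
	release(m)
	fmt.Println("released")
}
```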

Full verification battery: filetest suite canonical (all goldens
byte-identical), vm Gas tests, txtar, 220 example packages, cmd/gno.
Same-session A/B: full pkg/gnolang long mode 173.9s -> 157.1s (-10%);
bytes suite solo 94.9s -> 83.5s.
Gno2D2 commented Jun 11, 2026

Collaborator

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

No automated checks match this pull request.

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

Labels

📦 🤖 gnovm Issues or PRs gnovm related

2 participants