perf(gnovm): recycle runtime blocks through a per-machine pool #5813

Draft
thehowl wants to merge 1 commit into dev/morgan/gnovm-byte-fastpaths from dev/morgan/gnovm-block-pool

Conversation

thehowl commented Jun 11, 2026

Member

Note

Part 3 of 6 of the gnovm performance stack (split from #5800), to be merged in order:

  1. perf(gnovm): parallelize test suites and add gno test -jobs (#5811)
  2. perf(gnovm): avoid heap-boxed byte access in copy, range and index reads (#5812)
  3. perf(gnovm): recycle runtime blocks through a per-machine pool (#5813, this PR)
  4. perf(gnovm): share interface-held values when copying arrays (#5814, gas-visible)
  5. ci(gnovm): skip print-only coverage instrumentation (#5815)
  6. perf(gnovm): reduce per-call and per-op allocations (#5816)

Each PR is based on the previous one's branch; this one diffs against part 2. Together: ci / gnovm ~14m → 6m18s; pkg/gnolang test time −64%; VM heap allocations −84% on the heaviest suite.

Summary

After the byte-access fixes, NewBlock was the dominant allocation site: 45% of remaining heap objects (135M in the bytes suite — 75M per-scope blocks from doOpExec for if/for/range/switch/block, 60M call blocks from doOpCall, each also allocating a Values slice).

The enabler is an invariant the heap-items design already established: closures capture HeapItemValues rather than blocks (doOpFuncLit sets Parent: nil and copies Captures), &local is always heap-promoted by the preprocessor, and frames/stacktraces store only indices and locations — so a runtime block provably dies when discarded from the machine's block stack. acquireBlock/releaseBlock implement a small per-machine pool on that invariant, routed through every block discard site (OpPopBlock, GotoJump, PopFrameAndReset/Return, PeekFrameAndContinueFor/Range). Released blocks are zeroed so they retain no references.
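To make the acquire/release cycle concrete, here is a minimal sketch of a per-machine free list. It is not the PR's code: the `block` and `blockPool` types, the `max` cap, and the method signatures are all hypothetical stand-ins, and the real `acquireBlock`/`releaseBlock` may differ in how they size and reset the Values slice.

```go
package main

import "fmt"

// block is a hypothetical stand-in for the VM's runtime block.
type block struct {
	values []any // slots that may hold references
	parent *block
}

// blockPool is a hypothetical per-machine free list. It belongs to a
// single machine, so it needs no locking.
type blockPool struct {
	free []*block
	max  int // cap on retained blocks (an assumption of this sketch)
}

// acquire reuses a released block when one is available and otherwise
// allocates a fresh one.
func (p *blockPool) acquire(parent *block, n int) *block {
	k := len(p.free)
	if k == 0 {
		return &block{values: make([]any, n), parent: parent}
	}
	b := p.free[k-1]
	p.free[k-1] = nil
	p.free = p.free[:k-1]
	if cap(b.values) >= n {
		b.values = b.values[:n] // already zeroed by release
	} else {
		b.values = make([]any, n)
	}
	b.parent = parent
	return b
}

// release zeroes the block so it retains no references, then keeps it
// for reuse unless the pool is full.
func (p *blockPool) release(b *block) {
	if len(p.free) >= p.max {
		return // pool full: leave the block to the Go garbage collector
	}
	vals := b.values[:cap(b.values)]
	for i := range vals {
		vals[i] = nil
	}
	b.values = vals[:0]
	b.parent = nil
	p.free = append(p.free, b)
}

func main() {
	p := &blockPool{max: 8}
	b := p.acquire(nil, 2)
	b.values[0] = "x"
	p.release(b)
	c := p.acquire(nil, 1)
	fmt.Println(c == b, c.values[0] == nil) // prints: true true
}
```

Zeroing the full capacity on release, rather than only the live length, is what lets `acquire` reslice without clearing again.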

releaseBlock excludes the three block populations that also travel the stack but must never be pooled:

  • node-owned static blocks (Eval/RunStatement flows push them) — identified by static-block identity (b.Source.GetStaticBlock().GetBlock() == b);
  • file/package blocks (referenced by FuncValue.Parent) — identified by Source node type;
  • defer-site blocks: Defer.Parent is visited by the VM's GC until the defer runs, so doOpDefer marks them via a flag stored in bodyStmt's trailing padding, keeping unsafe.Sizeof(Block{}), and with it the _allocBlock gas constant, unchanged (the alloc-constants init assert enforces this).
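The exclusions above can be sketched as a single guard, together with the size-preserving flag. Everything here is illustrative: `node`, `stmtState`, `stmtStateFlagged`, and `poolable` are hypothetical names, the field layout is invented to exhibit trailing padding, and the `unwinding` parameter models the rule that nothing is pooled while a panic unwinds.

```go
package main

import (
	"fmt"
	"unsafe"
)

// node is a hypothetical stand-in for a source AST node.
type node struct {
	fileOrPackage bool   // true for file and package nodes
	static        *block // the node-owned static block, if any
}

// stmtState mimics a struct with trailing padding: a word-sized int
// followed by a single bool.
type stmtState struct {
	next   int
	active bool
}

// stmtStateFlagged adds a second bool, which lands in that padding.
type stmtStateFlagged struct {
	next      int
	active    bool
	deferSite bool // set when a defer statement captures the block
}

// block is a hypothetical runtime block.
type block struct {
	source *node
	stmt   stmtStateFlagged
}

// init mirrors the idea of a size assert: adding the flag must not
// change the struct size that a gas constant is derived from.
func init() {
	if unsafe.Sizeof(stmtStateFlagged{}) != unsafe.Sizeof(stmtState{}) {
		panic("flag changed the struct size")
	}
}

// poolable reports whether a discarded block may be recycled.
func poolable(b *block, unwinding bool) bool {
	switch {
	case unwinding:
		return false // skip pooling while a panic unwinds
	case b.source != nil && b.source.static == b:
		return false // node-owned static block
	case b.source != nil && b.source.fileOrPackage:
		return false // file or package block
	case b.stmt.deferSite:
		return false // still reachable through a pending defer
	}
	return true
}

func main() {
	n := &node{}
	static := &block{source: n}
	n.static = static
	scoped := &block{source: &node{}}
	fmt.Println(poolable(static, false), poolable(scoped, false))
}
```

The size equality holds because the struct is rounded up to the alignment of its `int` field, so a second one-byte field fits in space that was already reserved.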

Panic unwinding skips pooling entirely as cheap conservatism. Gas and VM-GC accounting are unchanged: acquireBlock charges AllocateBlock exactly like Allocator.NewBlock, and pooled blocks are unreachable from GC roots just like dead blocks today.
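A small sketch of why accounting stays the same with or without the pool: the charge happens on every acquisition, whether the block comes from the free list or from a fresh allocation. The `allocator`, `machine`, and `blockCost` names and values are assumptions for illustration only, and the sketch assumes release leaves the accounting untouched, as it would for a block that simply became garbage.

```go
package main

import "fmt"

// blockCost is a hypothetical per-block charge.
const blockCost = 64

// allocator is a hypothetical byte-budget tracker.
type allocator struct {
	used, limit int64
}

func (a *allocator) charge(n int64) {
	if a.used+n > a.limit {
		panic("allocation limit exceeded")
	}
	a.used += n
}

type block struct{ values []any }

type machine struct {
	alloc *allocator
	free  []*block
}

// newBlock is the unpooled path: charge, then allocate.
func (m *machine) newBlock() *block {
	m.alloc.charge(blockCost)
	return &block{}
}

// acquireBlock charges first, on pool hits and misses alike.
func (m *machine) acquireBlock() *block {
	m.alloc.charge(blockCost)
	if k := len(m.free); k > 0 {
		b := m.free[k-1]
		m.free = m.free[:k-1]
		return b
	}
	return &block{}
}

// releaseBlock returns the block to the pool without touching the
// accounting (an assumption of this sketch).
func (m *machine) releaseBlock(b *block) {
	b.values = nil
	m.free = append(m.free, b)
}

func main() {
	plain := &machine{alloc: &allocator{limit: 1 << 20}}
	pooled := &machine{alloc: &allocator{limit: 1 << 20}}
	for i := 0; i < 3; i++ {
		plain.newBlock()
		pooled.releaseBlock(pooled.acquireBlock())
	}
	fmt.Println(plain.alloc.used == pooled.alloc.used) // prints: true
}
```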

One negative result is documented in Machine.Release: carrying the pool through the cross-goroutine machinePool measured 10–25% slower on parallel workloads (extra live heap across GC cycles, lost cache locality) without helping machine-churn workloads, so the pool is deliberately per-machine-lifetime only.

Measurements

| | before | after |
|---|---|---|
| heap objects (bytes suite) | 300M | 165M |
| TestStdlibs/bytes solo | 105.2s | 94.9s |
| full pkg/gnolang long mode, 16 cores | 184.6s | 154.0s |
| 4-core + coverage CI simulation | 509.7s | 424.9s (600.0s on master) |

Verification: all 2344 filetest goldens byte-identical (incl. Gas:/Realm:/Storage:/MAXALLOC tests), vm gas tests, txtar, 220 example packages, cmd/gno, and the closure/defer/recover/heap/goto escape-pattern filetests specifically.

After the byte-access fixes, NewBlock was the dominant allocation site
(45% of remaining heap objects in the bytes suite: 75M per-scope blocks
from doOpExec — if/for/range/switch/block — and 60M call blocks from
doOpCall, each also allocating a Values slice).

With closures capturing heap items rather than blocks (doOpFuncLit sets
Parent=nil and copies Captures), a runtime block provably dies when it
is discarded from the machine's block stack. acquireBlock/releaseBlock
implement a small per-machine pool on top of that invariant; all block
discard sites (OpPopBlock, GotoJump, PopFrameAndReset/Return,
PeekFrameAndContinueFor/Range) route through it. Blocks are zeroed on
release so they retain no references.

Skipped from pooling, in releaseBlock:
- node-owned static blocks and file/package blocks, which also travel
  the block stack (Eval/RunStatement flows push static blocks; file
  blocks are referenced by FuncValue.Parent) — identified by Source
  type and static-block identity;
- defer-site blocks: Defer.Parent is visited by the garbage collector
  until the defer runs, so doOpDefer marks them via a flag stored in
  bodyStmt's trailing padding, keeping unsafe.Sizeof(Block{}) — and the
  _allocBlock gas constant — unchanged;
- anything while a panic is unwinding, as cheap conservatism.

Gas and VM-GC accounting are unchanged: acquireBlock charges
AllocateBlock exactly like Allocator.NewBlock, and pooled blocks are
unreachable from GC roots just like dead blocks today. Verified: all
2344 filetest goldens (Gas:, Realm:, Storage:, MAXALLOC alloc tests)
byte-identical, vm Gas tests, txtar suite, examples (220 packages),
cmd/gno suite.

bytes suite heap objects: 165M (from 300M; 1.03G before the byte-access
fixes); bytes suite solo: 105.2s -> 94.9s; full pkg/gnolang long mode:
184.6s -> 154.0s; 4-core+coverage CI simulation: 509.7s -> 424.9s
(600.0s on master).

Gno2D2 commented Jun 11, 2026

Collaborator

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

No automated checks match this pull request.

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission


Labels

📦 🤖 gnovm Issues or PRs gnovm related
