perf(gnovm): recycle runtime blocks through a per-machine pool #5813
Draft
thehowl wants to merge 1 commit into
After the byte-access fixes, NewBlock was the dominant allocation site
(45% of remaining heap objects in the bytes suite: 75M per-scope blocks
from doOpExec — if/for/range/switch/block — and 60M call blocks from
doOpCall, each also allocating a Values slice).
With closures capturing heap items rather than blocks (doOpFuncLit sets
Parent=nil and copies Captures), a runtime block provably dies when it
is discarded from the machine's block stack. acquireBlock/releaseBlock
implement a small per-machine pool on top of that invariant; all block
discard sites (OpPopBlock, GotoJump, PopFrameAndReset/Return,
PeekFrameAndContinueFor/Range) route through it. Blocks are zeroed on
release so they retain no references.
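As an illustration of the acquire/release pattern described above, here is a minimal sketch. The `Block` and `Machine` types, the `maxPooled` cap, and the `PushBlock`/`PopBlock` helpers are simplified stand-ins invented for this example, not the actual gnovm definitions; only the names `acquireBlock` and `releaseBlock` come from the commit message.

```go
package main

import "fmt"

// Block is a hypothetical, simplified stand-in for the VM's runtime block.
type Block struct {
	Source any
	Values []int
	Parent *Block
}

// maxPooled is an assumed cap; the commit message only says the pool is "small".
const maxPooled = 64

// Machine owns its pool, so acquire/release need no locking.
type Machine struct {
	Blocks []*Block // block stack
	pool   []*Block // free list of zeroed blocks
}

// acquireBlock reuses a pooled block when one is available, else allocates.
func (m *Machine) acquireBlock(source any, nvalues int, parent *Block) *Block {
	var b *Block
	if n := len(m.pool); n > 0 {
		b = m.pool[n-1]
		m.pool[n-1] = nil
		m.pool = m.pool[:n-1]
	} else {
		b = &Block{}
	}
	b.Source = source
	b.Values = make([]int, nvalues)
	b.Parent = parent
	return b
}

// releaseBlock zeroes the block so it retains no references, then pools it.
func (m *Machine) releaseBlock(b *Block) {
	if len(m.pool) >= maxPooled {
		return // pool full: leave the block to Go's garbage collector
	}
	*b = Block{}
	m.pool = append(m.pool, b)
}

func (m *Machine) PushBlock(b *Block) { m.Blocks = append(m.Blocks, b) }

// PopBlock is one discard site; in this sketch every discard goes through releaseBlock.
func (m *Machine) PopBlock() {
	n := len(m.Blocks)
	b := m.Blocks[n-1]
	m.Blocks[n-1] = nil
	m.Blocks = m.Blocks[:n-1]
	m.releaseBlock(b)
}

func main() {
	m := &Machine{}
	first := m.acquireBlock("if", 2, nil)
	m.PushBlock(first)
	m.PopBlock()
	second := m.acquireBlock("for", 1, nil)
	fmt.Println(first == second, len(second.Values), second.Source) // true 1 for
}
```

The sketch is only safe under the invariant stated above: nothing may still hold a pointer to a block once it leaves the block stack.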
Skipped from pooling, in releaseBlock:
- node-owned static blocks and file/package blocks, which also travel
the block stack (Eval/RunStatement flows push static blocks; file
blocks are referenced by FuncValue.Parent) — identified by Source
type and static-block identity;
- defer-site blocks: Defer.Parent is visited by the garbage collector
until the defer runs, so doOpDefer marks them via a flag stored in
bodyStmt's trailing padding, keeping unsafe.Sizeof(Block{}) — and the
_allocBlock gas constant — unchanged;
- anything while a panic is unwinding, as cheap conservatism.
Gas and VM-GC accounting are unchanged: acquireBlock charges
AllocateBlock exactly like Allocator.NewBlock, and pooled blocks are
unreachable from GC roots just like dead blocks today. Verified: all
2344 filetest goldens (Gas:, Realm:, Storage:, MAXALLOC alloc tests)
byte-identical, vm Gas tests, txtar suite, examples (220 packages),
cmd/gno suite.
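One way to see why pooling can leave gas accounting unchanged is to charge the allocator before consulting the pool, so a pool hit and a fresh allocation cost the same. The following is a minimal sketch under that assumption; the `Allocator`, `Machine`, and `Block` types and the constants `allocBlock` and `allocValue` are hypothetical simplifications with made-up values, not gnovm's.

```go
package main

import "fmt"

// Hypothetical charges; not the real gnovm gas constants.
const (
	allocBlock = 200
	allocValue = 40
)

type Allocator struct{ bytes int64 }

// AllocateBlock records the charge for a block with nvalues slots.
func (a *Allocator) AllocateBlock(nvalues int64) {
	a.bytes += allocBlock + nvalues*allocValue
}

type Block struct{ Values []int }

type Machine struct {
	alloc   Allocator
	pool    []*Block
	pooling bool
}

// newBlock charges first, then decides whether to reuse or allocate.
func (m *Machine) newBlock(nvalues int) *Block {
	m.alloc.AllocateBlock(int64(nvalues)) // same charge on hit or miss
	if n := len(m.pool); m.pooling && n > 0 {
		b := m.pool[n-1]
		m.pool = m.pool[:n-1]
		b.Values = make([]int, nvalues)
		return b
	}
	return &Block{Values: make([]int, nvalues)}
}

func (m *Machine) discard(b *Block) {
	if m.pooling {
		*b = Block{} // zero before pooling
		m.pool = append(m.pool, b)
	}
}

// run executes the same workload with or without pooling and returns
// the total charged to the allocator.
func run(pooling bool) int64 {
	m := &Machine{pooling: pooling}
	for i := 0; i < 1000; i++ {
		b := m.newBlock(i % 4)
		m.discard(b)
	}
	return m.alloc.bytes
}

func main() {
	fmt.Println(run(true) == run(false)) // true
}
```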
bytes suite heap objects: 165M (from 300M; 1.03G before the byte-access
fixes); bytes suite solo: 105.2s -> 94.9s; full pkg/gnolang long mode:
184.6s -> 154.0s; 4-core+coverage CI simulation: 509.7s -> 424.9s
(600.0s on master).
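For readers who prefer relative figures, the short program below derives percentage reductions from the numbers quoted above. It adds no new measurements: the inputs are taken from the commit message, with 1.03G written as 1030M. The results come out at roughly 45% and 84% for heap objects, and 9.8%, 16.6%, 16.6% and 29.2% for the four timings.

```go
package main

import "fmt"

// reduction returns the percentage decrease from before to after.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	rows := []struct {
		name          string
		before, after float64
	}{
		{"heap objects vs. previous PR (M)", 300, 165},
		{"heap objects vs. pre-byte-access fixes (M)", 1030, 165},
		{"bytes suite solo (s)", 105.2, 94.9},
		{"pkg/gnolang long mode (s)", 184.6, 154.0},
		{"CI simulation vs. previous PR (s)", 509.7, 424.9},
		{"CI simulation vs. master (s)", 600.0, 424.9},
	}
	for _, r := range rows {
		fmt.Printf("%-44s %5.1f%%\n", r.name, reduction(r.before, r.after))
	}
}
```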
This was referenced Jun 11, 2026
Note
Part 3 of 6 of the gnovm performance stack (split from #5800), to be merged in order:
Each PR is based on the previous one's branch; this one diffs against part 2. Together:
ci / gnovm: ~14m → 6m18s; pkg/gnolang test time −64%; VM heap allocations −84% on the heaviest suite.
Summary
After the byte-access fixes, NewBlock was the dominant allocation site: 45% of remaining heap objects (135M in the bytes suite: 75M per-scope blocks from doOpExec for if/for/range/switch/block, and 60M call blocks from doOpCall, each also allocating a Values slice).
The enabler is an invariant the heap-items design already established: closures capture HeapItemValues rather than blocks (doOpFuncLit sets Parent: nil and copies Captures), &local is always heap-promoted by the preprocessor, and frames/stacktraces store only indices and locations. A runtime block therefore provably dies when it is discarded from the machine's block stack. acquireBlock/releaseBlock implement a small per-machine pool on that invariant, routed through every block discard site (OpPopBlock, GotoJump, PopFrameAndReset/Return, PeekFrameAndContinueFor/Range). Released blocks are zeroed so they retain no references.
releaseBlock excludes the three block populations that also travel the stack but must never be pooled:
- node-owned static blocks (b.Source.GetStaticBlock().GetBlock() == b);
- file/package blocks (referenced by FuncValue.Parent), identified by Source node type;
- defer-site blocks: Defer.Parent is visited by the VM's GC until the defer runs, so doOpDefer marks them via a flag stored in bodyStmt's trailing padding, keeping unsafe.Sizeof(Block{}), and with it the _allocBlock gas constant, unchanged (the alloc-constants init assert enforces this).
Panic unwinding skips pooling entirely as cheap conservatism. Gas and VM-GC accounting are unchanged: acquireBlock charges AllocateBlock exactly like Allocator.NewBlock, and pooled blocks are unreachable from GC roots just like dead blocks today.
One negative result is documented in Machine.Release: carrying the pool through the cross-goroutine machinePool measured 10–25% slower on parallel workloads (extra live heap across GC cycles, lost cache locality) without helping machine-churn workloads, so the pool is deliberately per-machine-lifetime only.
Measurements
Benchmarks: TestStdlibs/bytes solo; pkg/gnolang long mode, 16 cores.
Verification: all 2344 filetest goldens byte-identical (incl. Gas:/Realm:/Storage:/MAXALLOC tests), vm gas tests, txtar, 220 example packages, cmd/gno, and the closure/defer/recover/heap/goto escape-pattern filetests specifically.
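The negative result documented in Machine.Release implies that a machine returning to the shared machinePool does not take its block pool with it. The sketch below shows one way that could look. It assumes machinePool is a sync.Pool and uses a hypothetical `blockPool` field and `NewMachine` constructor; the real Machine type and its Release logic are not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

type Block struct{ Values []int }

// Machine is a simplified stand-in; blockPool is a hypothetical field name.
type Machine struct {
	blockPool []*Block
}

// machinePool recycles Machine structs across goroutines (assumed sync.Pool).
var machinePool = sync.Pool{New: func() any { return &Machine{} }}

func NewMachine() *Machine { return machinePool.Get().(*Machine) }

// Release resets the machine, dropping its pooled blocks, before handing
// the struct back. The block pool thus lives only for one machine lifetime.
func (m *Machine) Release() {
	*m = Machine{} // pooled blocks become garbage for the Go GC
	machinePool.Put(m)
}

func main() {
	m := NewMachine()
	m.blockPool = append(m.blockPool, &Block{}, &Block{})
	m.Release()

	// Whether or not sync.Pool hands back the same struct, it starts empty.
	next := NewMachine()
	fmt.Println(len(next.blockPool)) // 0
}
```

Dropping the blocks on release trades some reuse for a smaller live heap between machine lifetimes, which matches the reasoning given above for keeping the pool per-machine.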