
perf: Add CopyNonGround() methods for Array, Set, and Object #8323

Open

alex60217101990 wants to merge 9 commits into open-policy-agent:main from alex60217101990:optimize-copy-non-ground

Conversation

@alex60217101990
Contributor

Based on approved proposal: https://github.com/orgs/open-policy-agent/discussions/741

A micro-optimization with measurable performance gains: minimal code changes, real value in production workloads.

Problem

Copy() deep-copies all elements including immutable ground terms, causing excessive allocations during policy evaluation.

Solution

Add CopyNonGround() methods that shallow-copy ground terms (constants) and deep-copy only non-ground terms (variables).
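As a sketch of the pattern (simplified stand-in types, not the actual `ast` package):

```go
package main

import "fmt"

// Term is a simplified stand-in for ast.Term: ground terms are
// constants (treated as immutable), non-ground terms contain variables.
type Term struct {
	Ground bool
	Value  string
}

func (t *Term) IsGround() bool { return t.Ground }
func (t *Term) Copy() *Term    { cp := *t; return &cp }

// copyNonGround shares pointers to ground terms and deep-copies only
// the non-ground ones -- the pattern this PR adds to Array, Set, Object.
func copyNonGround(terms []*Term) []*Term {
	out := make([]*Term, len(terms))
	for i, t := range terms {
		if t.IsGround() {
			out[i] = t // shared: ground terms are never mutated
		} else {
			out[i] = t.Copy() // deep copy: variables may be bound later
		}
	}
	return out
}

func main() {
	orig := []*Term{{Ground: true, Value: `"a"`}, {Ground: false, Value: "x"}}
	cp := copyNonGround(orig)
	fmt.Println(cp[0] == orig[0], cp[1] == orig[1]) // true false
}
```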

Benchmark Results

Fully ground terms (typical case):

Before (Copy):

  • Array: 373 ns/op, 280 B/op, 8 allocs/op
  • Set: 572 ns/op, 424 B/op, 9 allocs/op
  • Object: 1087 ns/op, 664 B/op, 19 allocs/op

After (CopyNonGround):

  • Array: 2.2 ns/op, 0 B/op, 0 allocs/op
  • Set: 2.9 ns/op, 0 B/op, 0 allocs/op
  • Object: 2.6 ns/op, 0 B/op, 0 allocs/op

Mixed ground/non-ground terms:

Before (Copy):

  • Array: 391 ns/op, 280 B/op, 8 allocs/op
  • Set: 822 ns/op, 424 B/op, 9 allocs/op
  • Object: 1168 ns/op, 664 B/op, 19 allocs/op

After (CopyNonGround):

  • Array: 187 ns/op, 160 B/op, 4 allocs/op
  • Set: 473 ns/op, 352 B/op, 6 allocs/op
  • Object: 789 ns/op, 496 B/op, 12 allocs/op

@netlify

netlify bot commented Feb 13, 2026

Deploy Preview for openpolicyagent ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | c7b232a |
| 🔍 Latest deploy log | https://app.netlify.com/projects/openpolicyagent/deploys/699180706931660008716559 |
| 😎 Deploy Preview | https://deploy-preview-8323--openpolicyagent.netlify.app |

@alex60217101990 force-pushed the optimize-copy-non-ground branch 6 times, most recently from 64d7a6f to c4ede7d on February 13, 2026 13:16
Member

@anderseknert left a comment


Thanks! I've only skimmed the code so far, but some thoughts here:

  • Contrary to Ref.CopyNonGround, these implementations don't copy even the underlying slice(s) of the collections, meaning they aren't copying anything. If e.g. the compiler decides to append or remove items from an array it got via its CopyNonGround method, it's not appending to a copy but to the original array. While ground scalars may be immutable, composite types are not. It's quite possible to "get away" with this, as many seemingly mutating operations on composite AST types also return copies. But this is not something to take for granted, as anyone coming along later to call e.g. *Array.Set(i, term) on what they thought was a copy is going to be in for a quite unpleasant surprise. So while it'll come with some additional cost, these methods will need to copy any internal collections they may carry, while they can leave the items contained in their original form (assuming they're ground).

  • With that said, it's certainly possible that there are locations where we currently call Copy too defensively, and where it actually is safe to skip that. But if such call sites are found, we shouldn't replace the Copy call but remove it. Provided of course that we're able to prove it is safe to do so.

  • Benchmarking the CopyNonGround methods isn't that interesting per se, as it doesn't say anything about the impact of the change in OPA. Presumably you had these methods added because you noticed the cost of Copy in pprof or existing benchmarks? Those are the numbers you'll want to zoom in on. You don't even have to add new benchmarks. Finding existing benchmarks on the compiler or eval where this measurably moves the needle if even by a little, that's the ideal really. But of course fine to add new benchmarks too, should there be some path in compilation or eval currently not covered, and where the change has a bigger impact.
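The shared-backing hazard from the first bullet is easy to reproduce outside OPA. This is an illustrative sketch with assumed shapes, not the actual `ast` types: a "copy" that returns the receiver leaves the backing slice shared, so an in-place write on the copy is visible through the original.

```go
package main

import "fmt"

// Array is a simplified stand-in; the real ast.Array wraps a []*ast.Term.
type Array struct{ elems []string }

// badCopyNonGround returns the receiver unchanged when all elements
// are "ground" -- the backing slice is shared, which is the hazard.
func (a *Array) badCopyNonGround() *Array { return a }

// Set writes in place, as a caller holding what it believes is an
// independent copy might do.
func (a *Array) Set(i int, v string) { a.elems[i] = v }

func main() {
	orig := &Array{elems: []string{"a", "b"}}
	cp := orig.badCopyNonGround()
	cp.Set(0, "mutated")
	fmt.Println(orig.elems[0]) // "mutated" -- the original changed too
}
```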

alex60217101990 added a commit to alex60217101990/opa that referenced this pull request Feb 15, 2026
Addresses maintainer feedback on PR open-policy-agent#8323 regarding incomplete copying
and mutation safety.

**Key Changes:**

1. **Always copy containers** - Array/Set/Object now always create new
   backing slices/maps, even when all elements are ground. This prevents
   mutations (append, insert, remove) from affecting the original.

2. **Maintain shallow copy optimization** - Ground elements are still
   shared (not deep copied), preserving the performance benefit.

3. **Updated tests** - Removed incorrect `wantSame` assertions that
   expected ground containers to return the same instance. Added
   verification that containers are always new instances.

The optimization works because we avoid deep copying ground elements
(which are immutable), only copying the container structure itself.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
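The "always copy containers" fix can be sketched as follows (simplified stand-in types, assumed shapes): the backing slice is always freshly allocated, so structural mutations on the copy cannot reach the original, while ground element pointers are still shared.

```go
package main

import "fmt"

type Term struct {
	Ground bool
	Value  string
}

func (t *Term) Copy() *Term { cp := *t; return &cp }

// copyNonGround always allocates a fresh backing slice (the container
// copy), sharing only the immutable ground element pointers.
func copyNonGround(elems []*Term) []*Term {
	out := make([]*Term, len(elems)) // always a new container
	for i, t := range elems {
		if t.Ground {
			out[i] = t // element shared; container is not
		} else {
			out[i] = t.Copy()
		}
	}
	return out
}

func main() {
	orig := []*Term{{Ground: true, Value: "a"}}
	cp := copyNonGround(orig)
	cp = append(cp, &Term{Ground: true, Value: "b"}) // structural mutation
	fmt.Println(len(orig), len(cp)) // 1 2 -- original shape untouched
}
```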
@alex60217101990
Contributor Author

@anderseknert Thank you for the detailed review!

I've addressed the safety concerns you mentioned - containers are now always copied to prevent mutations, while still sharing ground elements since they're immutable. Also extended CopyNonGround() to the full AST: Term, Expr, Body, all comprehensions, Call, TemplateString, With, SomeDecl, and Every.

Here are the benchmark results on real OPA workloads as requested:

ACI Policy Evaluation (realistic policy with complex rules):

| Branch | Time | Memory | Allocs |
| --- | --- | --- | --- |
| main | 95,437 ns/op | 36,121 B/op | 789 |
| optimized | 94,793 ns/op | 33,696 B/op | 796 |
| diff | -0.7% (faster) | -6.7% | +7 |

Virtual Documents (100 rules × 10 iterations):

| Branch | Time | Memory | Allocs |
| --- | --- | --- | --- |
| main | 7,068 ns/op | 4,136 B/op | 75 |
| optimized | 6,562 ns/op | 3,526 B/op | 75 |
| diff | -7% (faster) | -15% | same |

Walk/1000 (data traversal):

| Branch | Time | Memory | Allocs |
| --- | --- | --- | --- |
| main | 62,540 ns/op | 54,874 B/op | 1,051 |
| optimized | 61,682 ns/op | 54,578 B/op | 1,052 |
| diff | -1.4% (faster) | -0.5% | +1 |

Trivial Policy:

| Branch | Time | Memory | Allocs |
| --- | --- | --- | --- |
| main | 5,032 ns/op | 4,928 B/op | 69 |
| optimized | 5,155 ns/op | 4,264 B/op | 69 |
| diff | +2.4% | -13.5% | same |

Memory is better across all benchmarks (up to 15% reduction). Speed is on par with main or faster for the real evaluation workloads.

@anderseknert
Member

@alex60217101990 Sorry for the delay! It's been a few rather busy weeks, and while I occasionally do a lot of OPA work related to Regal, OPA development is not my current main quest :) I appreciate the effort you've put into this, and I think there are some things here we could benefit from. But I'm hesitant about adding 800+ lines of code to maintain for a performance improvement that doesn't really do that much to improve performance (or I missed something in the metrics you presented). Do you think there's some way you could reduce the change to only the most impactful CopyNonGround() methods, or whatever else could be done to make the size of the change feel more proportional to the measured improvement?

I think I saw you ping me about another change you considered working on... so another alternative could be that we take this back to the drawing board for the time being, and proceed with that first. Let me know what you think.

(FWIW, I've spent many months doing performance related work that showed promise at some point, but ultimately failed to deliver when all things got accounted for. It sucks, but it's what it is.)

Replace Ref.Copy() with CopyNonGround() in 5 locations where only
non-ground parts of refs need deep copying. This optimization avoids
unnecessary deep copies of ground (constant) terms.

Changes:
- topdown/eval.go: 3 replacements in getRules(), vcKeyScope.reduce(),
  and namespaceRef()
- ast/compile.go: 2 replacements in resolveRef() and resolveRefsInTerm()

All replacements are safe because:
1. Ground parts are never modified after copying
2. Only Location fields or slice operations are performed
3. Deep copying non-ground parts is still preserved

This reduces memory allocations for refs with mostly ground elements,
which is a common case in policy evaluation.

Related to PR open-policy-agent#7350 which introduced CopyNonGround().

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Implement optimized copy methods that avoid deep copying ground (constant)
elements. This significantly reduces memory allocations for collections with
predominantly ground elements, which is a common case in policy evaluation.

Performance improvements (benchmarks):

Array.CopyNonGround():
- Fully ground: 138x faster (252.8ns → 1.8ns), 0 allocs vs 8
- Mixed (50/50): 2.3x faster, 50% fewer allocations
- Mostly ground (80%): 2.6x faster, 63% fewer allocations

Set.CopyNonGround():
- Fully ground: 243x faster (575.8ns → 2.4ns), 0 allocs vs 9
- Mixed (50/50): 1.4x faster, 33% fewer allocations

Object.CopyNonGround():
- Fully ground: 416x faster (949.8ns → 2.3ns), 0 allocs vs 19
- Mixed (50/50): 1.6x faster, 37% fewer allocations

Changes:
- ast/term.go: Added CopyNonGround() methods for Array, Set, and Object
- ast/term_test.go: Added comprehensive tests for all new methods
- ast/term_bench_test.go: Added benchmarks comparing Copy() vs CopyNonGround()

All new methods follow the same pattern as Ref.CopyNonGround():
1. Return same instance if fully ground (immutable)
2. Shallow copy ground elements
3. Deep copy non-ground elements

This provides 100-400x speedup for ground collections with zero allocations.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Add CopyNonGround() method to Object interface to enable optimized
copying for objects with ground elements. Implement for both object
and lazyObj types.

Applied in topdown/providers.go for AWS request object copying, where
the object structure is typically ground and only headers are modified.

Changes:
- ast/term.go: Added CopyNonGround() to Object interface
- ast/term.go: Implemented CopyNonGround() for lazyObj (returns self)
- topdown/providers.go: Use CopyNonGround() for request object copy

This provides similar optimizations as Ref, Array, and Set for Object
types when used through the interface.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Addresses maintainer feedback on PR open-policy-agent#8323 regarding incomplete copying
and mutation safety.

**Key Changes:**

1. **Always copy containers** - Array/Set/Object now always create new
   backing slices/maps, even when all elements are ground. This prevents
   mutations (append, insert, remove) from affecting the original.

2. **Maintain shallow copy optimization** - Ground elements are still
   shared (not deep copied), preserving the performance benefit.

3. **Updated tests** - Removed incorrect `wantSame` assertions that
   expected ground containers to return the same instance. Added
   verification that containers are always new instances.

The optimization works because we avoid deep copying ground elements
(which are immutable), only copying the container structure itself.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Use existing InternedEmptyArrayValue and InternedEmptySetValue instead of creating new empty containers.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
- Fixed safety: containers always copied to prevent mutations
- Extended CopyNonGround() to all AST types (Term, Expr, Body, comprehensions, etc)
- Ground elements shared (safe, immutable), non-ground deep copied

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Simplify the CopyNonGround optimization by keeping only Ref.CopyNonGround()
and removing it from all other types (Array, Set, Object, Body, Expr,
comprehensions, etc.). This addresses reviewer feedback about the
maintenance burden of 800+ lines for modest gains.

The key insight is that all hot-path call sites (compile.go, eval.go) only
use Ref.CopyNonGround(). The cascading methods on Body, Expr, Array, Set,
Object, and comprehensions were never called from hot paths and added
significant API surface (including changes to Object and Set interfaces)
for no measurable benefit.

Attempting to fold ground-sharing into Copy() itself proved unsafe:
TestSetCopy demonstrates that callers mutate *Term.Value after Copy(),
so sharing *Term pointers violates Copy()'s independence contract.
CopyNonGround remains safe only in specific call sites where ground
parts are not mutated.

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
@alex60217101990 force-pushed the optimize-copy-non-ground branch from c7b232a to 8b0095d on March 7, 2026 07:57
@netlify

netlify bot commented Mar 7, 2026

Deploy Preview for openpolicyagent ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 7332e4f |
| 🔍 Latest deploy log | https://app.netlify.com/projects/openpolicyagent/deploys/69b006a49d9c79000841a3a3 |
| 😎 Deploy Preview | https://deploy-preview-8323--openpolicyagent.netlify.app |

Replace per-element heap allocations in termSliceCopy with a single
contiguous batch via util.NewPtrSlice[Term], reducing allocations from
N+1 to 2. Also remove Map() closure and sort overhead from object.Copy().

Ref.Copy(20):   21 → 2 allocs (-90%), 738 → 405 ns/op (-45%)
Array.Copy(500): 503 → 4 allocs (-99%), 23 μs → 10 μs (-57%)

Signed-off-by: alex60217101990 <alex6021710@gmail.com>
@alex60217101990 force-pushed the optimize-copy-non-ground branch from 8b0095d to ec6c7ec on March 7, 2026 08:03
@alex60217101990
Contributor Author

Hi @anderseknert, thanks for the feedback and no worries about the delay at all!

You were right that 800+ lines was too much for what it delivered. I've completely reworked the approach — the diff is now ~140 lines added / ~18 removed across 6 files. Here's what I ended up with:

What changed

Scoped CopyNonGround() down to Ref only

I went through every call site and realized that all the hot-path copies are actually just Ref values — 3 places in compile.go and 4 in eval.go. So I dropped all the other CopyNonGround() methods (Array, Set, Object, Body, Expr, etc.) since they weren't being used on any critical path and were just adding code to maintain.

I did try folding the ground-awareness directly into the regular Copy() methods, but TestSetCopy quickly showed why that's a bad idea — existing code mutates *Term.Value after copying, so sharing pointers to ground terms breaks the independence guarantee that Copy() provides. Ref.CopyNonGround() works because the callers in compile/eval only ever touch the non-ground slots.

Batch allocation in termSliceCopy()

This is where most of the gains actually come from now. Instead of allocating each *Term individually on the heap (N+1 allocations for N elements), I switched to util.NewPtrSlice[Term]() which allocates all Term structs in one contiguous block — just 2 allocations total regardless of size.

This is completely safe since we still deep-copy every Value. It benefits everything that goes through termSliceCopy: Ref.Copy(), Array.Copy(), and Call.Copy().
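The batch-allocation trick can be sketched as follows (a simplified stand-in; the actual `util.NewPtrSlice` implementation may differ):

```go
package main

import "fmt"

// newPtrSlice sketches the batch-allocation trick: one backing []T
// plus one []*T gives 2 allocations total, instead of N+1 when every
// *T is heap-allocated on its own.
func newPtrSlice[T any](n int) []*T {
	backing := make([]T, n) // one contiguous block for all values
	ptrs := make([]*T, n)   // one more allocation for the pointers
	for i := range backing {
		ptrs[i] = &backing[i]
	}
	return ptrs
}

type Term struct{ Value string }

func main() {
	terms := newPtrSlice[Term](3)
	for i := range terms {
		terms[i].Value = fmt.Sprintf("t%d", i)
	}
	fmt.Println(terms[0].Value, terms[2].Value) // t0 t2
	// Trade-off: the whole backing block stays live while any single
	// *Term into it is still referenced; it is freed as one unit.
}
```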

Cleaned up object.Copy()

Small thing — replaced obj.Map(func(...) { ... }) with a direct loop over obj.keys. Gets rid of the closure allocation and an unnecessary sort from sortedKeys().

Benchmarks

Ran with Go 1.26:

| Benchmark | Allocs before | Allocs after | Time improvement |
| --- | --- | --- | --- |
| Ref.Copy (5 elements) | 6 | 2 | -35% |
| Ref.Copy (20 elements) | 21 | 2 | -45% |
| Array.Copy (50 elements) | 53 | 4 | -45% |
| Array.Copy (500 elements) | 503 | 4 | -57% |

The allocation reduction really shines as element count grows — 503 → 4 for a 500-element array is a 99% drop.

Let me know if this feels more proportional to you, happy to adjust!

@anderseknert
Member

Thank you! I haven't looked at the new code yet, and I have a week of travel ahead, but I will do my best to find some time to do so.

util.NewPtrSlice is truly some dark magic that I've yet to fully understand, even though it was me who had it added back in the day, lol. Funny thing is that I picked up that trick from a GitHub Gist, and I don't think I've ever seen it mentioned anywhere else. And I have read many blogs and articles on the topic of Go performance tweaks.

If I recall correctly, the method doesn't allocate any less memory (as in space), but the memory it allocates is contained to a single allocation. The benefit of this is (allegedly) reduced GC pressure, as the garbage collector accounts for both the number of objects on the heap and their size, and this reduces the former. The drawback — and this could be a real concern — is that objects allocated this way can't be garbage collected as long as there is any reference pointing to any of the objects in the "group". So just as it's created as a single allocation, it must be disposed as one too. We do a lot of copying in OPA, so whether this is a real problem or not should be fairly easy to test by exercising some component that does both copying and modifications. Or construct a test to isolate that particular thing and observe memory usage over time. If you have some time to look into that before I get to review, that would be most helpful. If not, I can do it as part of reviewing. (And yes, that should obviously have been done when the function was added already. But the second best time to plant a tree, and all that...)

I think what speaks for this being less of a real problem is the fact that we've already been using this in fairly hot paths without observing issues. Not exactly the most scientific method, but it's something. I'm cautiously optimistic, but let's pay close attention here.

Go also has a new garbage collector as of 1.26, so whatever used to be true about GC behavior may very well have changed since then.

@alex60217101990
Contributor Author

@anderseknert Good call raising this — I looked into it separately and figured it's worth sharing what I found.

The concern: batch-allocated Terms share a backing []Term array, so GC can't reclaim individual elements — the whole block lives until every pointer into it is dropped.

I went through all the call sites of termSliceCopy looking for places where the copy result gets sliced or individual elements escape into longer-lived structures. The only real case is in eval.go around partialEvalSupportRule:

```go
ruleRef := originalPath.Copy()[len(pkg.Path):]
```

This drops the prefix from the batch, and those prefix Terms can't be GC'd independently. But package paths are short (5-10 terms, a couple hundred bytes), and partial eval isn't a hot loop — so the retention is negligible.

Every other caller — Ref.Copy(), Array.Copy(), Call.Copy(), Expr.Copy() — consumes the result as a whole unit. Elements share the same lifetime, batch dies together.

To verify this isn't just hand-waving, I wrote a quick test locally that hammers the worst-case pattern (copy → slice half off → discard, repeat). Didn't push it since it's more of a one-off validation than something worth maintaining, but here it is if you want to run it yourself:

```go
func TestBatchAllocGCSafety(t *testing.T) {
	ref := make(Ref, 20)
	ref[0] = VarTerm("data")
	for i := 1; i < len(ref); i++ {
		ref[i] = StringTerm(fmt.Sprintf("key_%d", i))
	}

	// warm up
	for range 1000 {
		_ = ref.Copy()
	}
	runtime.GC()

	var before runtime.MemStats
	runtime.ReadMemStats(&before)

	for range 10_000 {
		cpy := ref.Copy()
		_ = cpy[len(cpy)/2:] // keep suffix, drop prefix
	}

	runtime.GC()

	var after runtime.MemStats
	runtime.ReadMemStats(&after)

	growth := int64(after.HeapInuse) - int64(before.HeapInuse)
	if growth > 4<<20 {
		t.Errorf("heap grew by %d MB, expected < 4 MB", growth>>20)
	}
}
```

Passes comfortably — heap stays flat after GC. The batches get collected just fine once all pointers are gone, even with the partial-slice pattern.

For reference, existing BenchmarkRefCopy confirms the allocation picture:

| Ref size | allocs/op | B/op |
| --- | --- | --- |
| 5 | 2 | 176 |
| 10 | 2 | 320 |
| 20 | 2 | 640 |

Constant 2 allocs regardless of size — one for []*Term, one for []Term.

Re: Go 1.26 GC changes — good point, I haven't dug into the new collector specifics yet. But given that the test passes and the retention patterns look clean, I don't expect surprises there.

Apply the same contiguous-allocation pattern from termSliceCopy to
set.Copy(), object.Copy(), and Args.Copy(). Instead of allocating
each element individually on the heap (N+1 or 3N allocations), we
pre-allocate flat slices and point into them.

Args.Copy() now delegates to termSliceCopy directly since Args is
just []*Term, same as Ref and Call.

set.Copy() allocates a single []Term buffer for all elements.

object.Copy() allocates three flat slices ([]objectElem, []Term for
keys, []Term for values) and wires up the hash chains inline,
avoiding the overhead of repeated insert() calls.

Benchmark results (5-element containers, darwin/amd64):

  SetCopy:    11 → 5 allocs  (-55%)
  ObjectCopy: 19 → 7 allocs  (-63%)
  Args.Copy:  N+1 → 2 allocs (same as Ref/Call)

Signed-off-by: alex60217101990 <alex6021710@gmail.com>