perf: Add CopyNonGround() methods for Array, Set, and Object#8323
perf: Add CopyNonGround() methods for Array, Set, and Object#8323alex60217101990 wants to merge 9 commits intoopen-policy-agent:mainfrom
Conversation
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
64d7a6f to
c4ede7d
Compare
anderseknert
left a comment
There was a problem hiding this comment.
Thanks! I've only skimmed the code so far, but some thoughts here:
-
Contrary to
Ref.CopyNonGround, these implementations don't copy even the underlying slice(s) of the collections, meaning they aren't copying anything, and if e.g. the compiler decides to append or remove items from an array it got via itsCopyNonGroundmethod, it's not appending to a copy but the original array. While ground scalars may be immutable, composite types are not. It's quite possible, to "get away" with this, as many seemingly mutating operations on composite AST types also return copies. But this is not something to take for granted, as anyone coming along later to e.g.*Array.Set(i, term)on what they thought was a copy is going to be in for a quite unpleasant surprise. So while it'll come with some additional cost, these methods will need to copy any internal collections they may carry, while the can leave the items contained in their original form (assuming they're ground). -
With that said, it's certainly possible that there are locations where we currently call
Copytoo defensively, and where it actually is safe to skip that. But if such call sites are found, we shouldn't replace theCopycall but remove it. Provided of course that we're able to prove it is safe to do so. -
Benchmarking the
CopyNonGroundmethods isn't that interesting per se, as it doesn't say anything about the impact of the change in OPA. Presumably you had these methods added because you noticed the cost ofCopyin pprof or existing benchmarks? Those are the numbers you'll want to zoom in on. You don't even have to add new benchmarks. Finding existing benchmarks on the compiler or eval where this measurably moves the needle if even by a little, that's the ideal really. But of course fine to add new benchmarks too, should there be some path in compilation or eval currently not covered, and where the change has a bigger impact.
Addresses maintainer feedback on PR open-policy-agent#8323 regarding incomplete copying and mutation safety. **Key Changes:** 1. **Always copy containers** - Array/Set/Object now always create new backing slices/maps, even when all elements are ground. This prevents mutations (append, insert, remove) from affecting the original. 2. **Maintain shallow copy optimization** - Ground elements are still shared (not deep copied), preserving the performance benefit. 3. **Updated tests** - Removed incorrect `wantSame` assertions that expected ground containers to return the same instance. Added verification that containers are always new instances. The optimization works because we avoid deep copying ground elements (which are immutable), only copying the container structure itself. Signed-off-by: alex60217101990 <alex6021710@gmail.com>
|
@anderseknert Thank you for the detailed review! I've addressed the safety concerns you mentioned - containers are now always copied to prevent mutations, while still sharing ground elements since they're immutable. Also extended CopyNonGround() to the full AST: Term, Expr, Body, all comprehensions, Call, TemplateString, With, SomeDecl, and Every. Here are the benchmark results on real OPA workloads as requested: ACI Policy Evaluation (realistic policy with complex rules):
Virtual Documents (100 rules × 10 iterations):
Walk/1000 (data traversal):
Trivial Policy:
Memory is consistently better (7-15% reduction) across all benchmarks. Speed is on par with main or faster for the real evaluation workloads. |
|
@alex60217101990 Sorry for the delay! It's been a few rather busy weeks, and while I occasionally do a lot of OPA work related to Regal, OPA development is not my current main quest :) I appreciate the effort you've put into this, and I think there are some things here we could benefit from. But I'm hesitant about adding 800+ lines of code to maintain for a performance improvement that doesn't really do that much to improve performance (or I missed something in the metrics you presented). Do you think there's some way you could reduce the change to only the most impactful I think I saw you ping me about another change you considered working on.. so another alternative could be that we take this back to the drawing board for the time being, and proceed with that first. Let me know what you think. (FWIW, I've spent many months doing performance related work that showed promise at some point, but ultimately failed to deliver when all things got accounted for. It sucks, but it's what it is.) |
Replace Ref.Copy() with CopyNonGround() in 5 locations where only non-ground parts of refs need deep copying. This optimization avoids unnecessary deep copies of ground (constant) terms. Changes: - topdown/eval.go: 3 replacements in getRules(), vcKeyScope.reduce(), and namespaceRef() - ast/compile.go: 2 replacements in resolveRef() and resolveRefsInTerm() All replacements are safe because: 1. Ground parts are never modified after copying 2. Only Location fields or slice operations are performed 3. Deep copying non-ground parts is still preserved This reduces memory allocations for refs with mostly ground elements, which is a common case in policy evaluation. Related to PR open-policy-agent#7350 which introduced CopyNonGround(). Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Implement optimized copy methods that avoid deep copying ground (constant) elements. This significantly reduces memory allocations for collections with predominantly ground elements, which is a common case in policy evaluation. Performance improvements (benchmarks): Array.CopyNonGround(): - Fully ground: 138x faster (252.8ns → 1.8ns), 0 allocs vs 8 - Mixed (50/50): 2.3x faster, 50% fewer allocations - Mostly ground (80%): 2.6x faster, 63% fewer allocations Set.CopyNonGround(): - Fully ground: 243x faster (575.8ns → 2.4ns), 0 allocs vs 9 - Mixed (50/50): 1.4x faster, 33% fewer allocations Object.CopyNonGround(): - Fully ground: 416x faster (949.8ns → 2.3ns), 0 allocs vs 19 - Mixed (50/50): 1.6x faster, 37% fewer allocations Changes: - ast/term.go: Added CopyNonGround() methods for Array, Set, and Object - ast/term_test.go: Added comprehensive tests for all new methods - ast/term_bench_test.go: Added benchmarks comparing Copy() vs CopyNonGround() All new methods follow the same pattern as Ref.CopyNonGround(): 1. Return same instance if fully ground (immutable) 2. Shallow copy ground elements 3. Deep copy non-ground elements This provides 100-400x speedup for ground collections with zero allocations. Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Add CopyNonGround() method to Object interface to enable optimized copying for objects with ground elements. Implement for both object and lazyObj types. Applied in topdown/providers.go for AWS request object copying, where the object structure is typically ground and only headers are modified. Changes: - ast/term.go: Added CopyNonGround() to Object interface - ast/term.go: Implemented CopyNonGround() for lazyObj (returns self) - topdown/providers.go: Use CopyNonGround() for request object copy This provides similar optimizations as Ref, Array, and Set for Object types when used through the interface. Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Addresses maintainer feedback on PR open-policy-agent#8323 regarding incomplete copying and mutation safety. **Key Changes:** 1. **Always copy containers** - Array/Set/Object now always create new backing slices/maps, even when all elements are ground. This prevents mutations (append, insert, remove) from affecting the original. 2. **Maintain shallow copy optimization** - Ground elements are still shared (not deep copied), preserving the performance benefit. 3. **Updated tests** - Removed incorrect `wantSame` assertions that expected ground containers to return the same instance. Added verification that containers are always new instances. The optimization works because we avoid deep copying ground elements (which are immutable), only copying the container structure itself. Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Use existing InternedEmptyArrayValue and InternedEmptySetValue instead of creating new empty containers. Signed-off-by: alex60217101990 <alex6021710@gmail.com>
- Fixed safety: containers always copied to prevent mutations - Extended CopyNonGround() to all AST types (Term, Expr, Body, comprehensions, etc) - Ground elements shared (safe, immutable), non-ground deep copied Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Simplify the CopyNonGround optimization by keeping only Ref.CopyNonGround() and removing it from all other types (Array, Set, Object, Body, Expr, comprehensions, etc.). This addresses reviewer feedback about the maintenance burden of 800+ lines for modest gains. The key insight is that all hot-path call sites (compile.go, eval.go) only use Ref.CopyNonGround(). The cascading methods on Body, Expr, Array, Set, Object, and comprehensions were never called from hot paths and added significant API surface (including changes to Object and Set interfaces) for no measurable benefit. Attempting to fold ground-sharing into Copy() itself proved unsafe: TestSetCopy demonstrates that callers mutate *Term.Value after Copy(), so sharing *Term pointers violates Copy()'s independence contract. CopyNonGround remains safe only in specific call sites where ground parts are not mutated. Signed-off-by: alex60217101990 <alex6021710@gmail.com>
c7b232a to
8b0095d
Compare
✅ Deploy Preview for openpolicyagent ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Replace per-element heap allocations in termSliceCopy with a single contiguous batch via util.NewPtrSlice[Term], reducing allocations from N+1 to 2. Also remove Map() closure and sort overhead from object.Copy(). Ref.Copy(20): 21 → 2 allocs (-90%), 738 → 405 ns/op (-45%) Array.Copy(500): 503 → 4 allocs (-99%), 23 μs → 10 μs (-57%) Signed-off-by: alex60217101990 <alex6021710@gmail.com>
8b0095d to
ec6c7ec
Compare
|
Hi @anderseknert, thanks for the feedback and no worries about the delay at all! You were right that 800+ lines was too much for what it delivered. I've completely reworked the approach — the diff is now ~140 lines added / ~18 removed across 6 files. Here's what I ended up with: What changedScoped
|
| Benchmark | Allocs before | Allocs after | Time improvement |
|---|---|---|---|
Ref.Copy (5 elements) |
6 | 2 | -35% |
Ref.Copy (20 elements) |
21 | 2 | -45% |
Array.Copy (50 elements) |
53 | 4 | -45% |
Array.Copy (500 elements) |
503 | 4 | -57% |
The allocation reduction really shines as element count grows — 503 → 4 for a 500-element array is a 99% drop.
Let me know if this feels more proportional to you, happy to adjust!
|
Thank you! I haven't looked at the new code yet, and I have a week of travel ahead, but I will do my best to find some time to do so.
If I recall correctly, the method doesn't allocate any less memory (as in space), but the memory it allocates is contained to a single allocation. The benefit of this is (allegedly) reduced GC pressure, as the garbage collector accounts for both number of objects on the heap and its size, and this reduces the former. The drawback — and this could be a real concern — is that objects allocated this way can't be garbage collected as long as there is any reference pointing to any of the objects in the "group". So just as it's created as a single allocation, it must be disposed as one too. We do a lot of copying in OPA, so whether this is a real problem or not should be fairly easy to test by exercising some component that does both copying and modifications. Or construct a test to isolate that particular thing and observe memory usage over time. If you have some time to look into that before I get to review, that would be most helpful. If not, I can do it as part of reviewing. (And yes, that should obviously have been done when the function was added already. But the second best time to plant a tree, and all that...) I think what speaks for this being less of a real problem is the fact that we've already been using this in fairly hot paths without observing issues. Not exactly the most scientific method, but it's something. I'm cautiously optimistic, but let's pay close attention here. Go also has a new garbage collector as of 1.26, so whatever used to be true about GC behavior may very well have changed since then. |
|
@anderseknert Good call raising this — I looked into it separately and figured it's worth sharing what I found. The concern: batch-allocated Terms share a backing I went through all the call sites of ruleRef := originalPath.Copy()[len(pkg.Path):]This drops the prefix from the batch, and those prefix Terms can't be GC'd independently. But package paths are short (5-10 terms, couple hundred bytes), and partial eval isn't a hot loop — so the retention is negligible. Every other caller — To verify this isn't just hand-waving, I wrote a quick test locally that hammers the worst-case pattern (copy → slice half off → discard, repeat). Didn't push it since it's more of a one-off validation than something worth maintaining, but here it is if you want to run it yourself: func TestBatchAllocGCSafety(t *testing.T) {
ref := make(Ref, 20)
ref[0] = VarTerm("data")
for i := 1; i < len(ref); i++ {
ref[i] = StringTerm(fmt.Sprintf("key_%d", i))
}
// warm up
for range 1000 {
_ = ref.Copy()
}
runtime.GC()
var before runtime.MemStats
runtime.ReadMemStats(&before)
for range 10_000 {
cpy := ref.Copy()
_ = cpy[len(cpy)/2:] // keep suffix, drop prefix
}
runtime.GC()
var after runtime.MemStats
runtime.ReadMemStats(&after)
growth := int64(after.HeapInuse) - int64(before.HeapInuse)
if growth > 4<<20 {
t.Errorf("heap grew by %d MB, expected < 4 MB", growth>>20)
}
}Passes comfortably — heap stays flat after GC. The batches get collected just fine once all pointers are gone, even with the partial-slice pattern. For reference, existing
Constant 2 allocs regardless of size — one for Re: Go 1.26 GC changes — good point, I haven't dug into the new collector specifics yet. But given that the test passes and the retention patterns look clean, I don't expect surprises there. |
Apply the same contiguous-allocation pattern from termSliceCopy to set.Copy(), object.Copy(), and Args.Copy(). Instead of allocating each element individually on the heap (N+1 or 3N allocations), we pre-allocate flat slices and point into them. Args.Copy() now delegates to termSliceCopy directly since Args is just []*Term, same as Ref and Call. set.Copy() allocates a single []Term buffer for all elements. object.Copy() allocates three flat slices ([]objectElem, []Term for keys, []Term for values) and wires up the hash chains inline, avoiding the overhead of repeated insert() calls. Benchmark results (5-element containers, darwin/amd64): SetCopy: 11 → 5 allocs (-55%) ObjectCopy: 19 → 7 allocs (-63%) Args.Copy: N+1 → 2 allocs (same as Ref/Call) Signed-off-by: alex60217101990 <alex6021710@gmail.com>
Based on approved proposal: https://github.com/orgs/open-policy-agent/discussions/741
Micro-optimization with measurable performance gains. Minimal code changes, measurable value in production workloads.
Problem
Copy()deep-copies all elements including immutable ground terms, causing excessive allocations during policy evaluation.Solution
Add
CopyNonGround()methods that shallow-copy ground terms (constants), deep-copy only non-ground terms (variables).Benchmark Results
Fully ground terms (typical case):
Before (Copy):
After (CopyNonGround):
Mixed ground/non-ground terms:
Before (Copy):
After (CopyNonGround):