fix: use server-side apply for RBAC resources in management controllers#6157
fix: use server-side apply for RBAC resources in management controllers#6157alexmt wants to merge 1 commit intoakuity:mainfrom
Conversation
✅ Deploy Preview for docs-kargo-io ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Replace the Create→AlreadyExists→Update pattern with server-side apply (SSA) in the project and serviceaccount management controllers. This eliminates ~800 HTTP 409 errors per minute observed in clusters with active Kargo usage, where every reconcile cycle attempted to POST-create RBAC resources that already existed. Fixes akuity#6155
54c213e to
f42dca1
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #6157 +/- ##
==========================================
+ Coverage 57.37% 57.54% +0.16%
==========================================
Files 474 474
Lines 49666 40439 -9227
==========================================
- Hits 28497 23270 -5227
+ Misses 19785 15785 -4000
Partials 1384 1384 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I have a question. If my understanding is correct, it seems we are continually attempting to update resources (perhaps on every reconciliation) even when it is not necessary (the object is the same). So even if we switch to SSA (which would be a good improvement and reduce API calls), nothing would stop us from continually performing the apply. I wonder if there is a second improvement we can make that prevents the apply if we know it's not needed. |
|
@jessesuen I didn't look at this in detail yet, but have a recollection of there having been a guard to avoid unnecessary updates over and over, though it's possible I'm confusing it with another similar spot elsewhere in the code. If it's absent, I agree it should be there. |
| return fmt.Errorf("could not determine GVK for object: %w", err) | ||
| } | ||
| obj.GetObjectKind().SetGroupVersionKind(gvks[0]) | ||
| return r.client.Patch(ctx, obj, client.Apply, client.ForceOwnership, client.FieldOwner("kargo")) |
There was a problem hiding this comment.
One implication of switching to Apply is that the controller will NOT overwrite third-party contributions (of annotations, the RBAC subjects of a given CRB, etc). Looking at the current code, the Update call would've replaced everything. I think the SSA behavior is a good thing, just wanted to call out that the remediation behavior is slightly different.
Also, is the field manager name the same before-after here? It should be, to ensure that a newer version of Kargo replaces any earlier's contribution to a given object.
There was a problem hiding this comment.
Good difference in behavior to call out.
The purpose of this code, as is reflected in many of the comments on methods in this file, is to ensure the existence of RBAC resources required for the system to operate correctly upon each Project, so...
Overwriting "third-party contributions" was a feature; not a bug. If these are tampered with, things can break.
And I was. |
|
I can confirm through code inspection that we are ensuring RBAC with every Project reconciliation. So while SSA would help prevent the object's |
Summary
Fixes #6155
Replaces the
Create→AlreadyExists→Updatepattern with server-side apply (SSA) in the project and serviceaccount management controllers.newReconciler, allcreateXFnfunction pointers now use anapplySSAclosure that callsclient.Patch(..., client.Apply, client.ForceOwnership, client.FieldOwner("kargo")). The GVK is populated automatically from the client's scheme so no TypeMeta boilerplate is needed at each call site.Updatecalls are removed.serviceaccounts.gowhich calledr.client.Create/r.client.Updatedirectly.Before: every reconcile cycle generates N × 409s (one per project/resource), then a redundant
Updateeven when nothing changed.After: a single
PATCHper resource — no 409s, and no etcd write when the desired state already matches.