Skip to content

fix: use server-side apply for RBAC resources in management controllers#6157

Open
alexmt wants to merge 1 commit intoakuity:mainfrom
alexmt:fix/rbac-ssa-apply
Open

fix: use server-side apply for RBAC resources in management controllers#6157
alexmt wants to merge 1 commit intoakuity:mainfrom
alexmt:fix/rbac-ssa-apply

Conversation

@alexmt
Copy link
Copy Markdown
Contributor

@alexmt alexmt commented Apr 23, 2026

Summary

Fixes #6155

Replaces the CreateAlreadyExistsUpdate pattern with server-side apply (SSA) in the project and serviceaccount management controllers.

  • In newReconciler, all createXFn function pointers now use an applySSA closure that calls client.Patch(..., client.Apply, client.ForceOwnership, client.FieldOwner("kargo")). The GVK is populated automatically from the client's scheme so no TypeMeta boilerplate is needed at each call site.
  • All call sites are simplified: AlreadyExists checks and fallback Update calls are removed.
  • Same fix applied to serviceaccounts.go which called r.client.Create/r.client.Update directly.

Before: every reconcile cycle generates N × 409s (one per project/resource), then a redundant Update even when nothing changed.

After: a single PATCH per resource — no 409s, and no etcd write when the desired state already matches.

@alexmt alexmt requested a review from a team as a code owner April 23, 2026 17:44
@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 23, 2026

Deploy Preview for docs-kargo-io ready!

Name Link
🔨 Latest commit f42dca1
🔍 Latest deploy log https://app.netlify.com/projects/docs-kargo-io/deploys/69ea5b040d7859000823a192
😎 Deploy Preview https://deploy-preview-6157.docs.kargo.io
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Replace the Create→AlreadyExists→Update pattern with server-side apply
(SSA) in the project and serviceaccount management controllers. This
eliminates ~800 HTTP 409 errors per minute observed in clusters with
active Kargo usage, where every reconcile cycle attempted to POST-create
RBAC resources that already existed.

Fixes akuity#6155
@alexmt alexmt force-pushed the fix/rbac-ssa-apply branch from 54c213e to f42dca1 Compare April 23, 2026 17:46
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 92.15686% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.54%. Comparing base (a25f290) to head (f42dca1).

Files with missing lines Patch % Lines
pkg/controller/management/projects/projects.go 94.73% 1 Missing and 1 partial ⚠️
...ller/management/serviceaccounts/serviceaccounts.go 84.61% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6157      +/-   ##
==========================================
+ Coverage   57.37%   57.54%   +0.16%     
==========================================
  Files         474      474              
  Lines       49666    40439    -9227     
==========================================
- Hits        28497    23270    -5227     
+ Misses      19785    15785    -4000     
  Partials     1384     1384              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jessesuen
Copy link
Copy Markdown
Member

jessesuen commented Apr 23, 2026

I have a question. If my understanding is correct, it seems we are continually attempting to update resources (perhaps on every reconciliation) even when it is not necessary (the object is the same). So even if we switch to SSA (which would be a good improvement and reduce API calls), nothing would stop us from continually performing the apply.

I wonder if there is a second improvement we can make that prevents the apply if we know it's not needed.

@krancour krancour self-requested a review April 23, 2026 20:22
@krancour krancour added this to the v1.10.3 milestone Apr 23, 2026
@krancour krancour added kind/bug Something isn't working as intended; If unsure that something IS a bug, start a discussion instead kind/refactor Non-functional changes to implementation details priority/normal This is the priority for most work area/management-controller Affects the controller that manages the Kargo control plane itself labels Apr 23, 2026
@krancour
Copy link
Copy Markdown
Member

@jessesuen I didn't look at this in detail yet, but have a recollection of there having been a guard to avoid unnecessary updates over and over, though it's possible I'm confusing it with another similar spot elsewhere in the code. If it's absent, I agree it should be there.

return fmt.Errorf("could not determine GVK for object: %w", err)
}
obj.GetObjectKind().SetGroupVersionKind(gvks[0])
return r.client.Patch(ctx, obj, client.Apply, client.ForceOwnership, client.FieldOwner("kargo"))
Copy link
Copy Markdown
Contributor

@EronWright EronWright Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One implication of switching to Apply is that the controller will NOT overwrite third-party contributions (of annotations, the RBAC subjects of a given CRB, etc). Looking at the current code, the Update call would've replaced everything. I think the SSA behavior is a good thing, just wanted to call out that the remediation behavior is slightly different.

Also, is the field manager name the same before-after here? It should be, to ensure that a newer version of Kargo replaces any earlier's contribution to a given object.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good difference in behavior to call out.

The purpose of this code, as is reflected in many of the comments on methods in this file, is to ensure the existence of RBAC resources required for the system to operate correctly upon each Project, so...

Overwriting "third-party contributions" was a feature; not a bug. If these are tampered with, things can break.

@krancour
Copy link
Copy Markdown
Member

it's possible I'm confusing it with another similar spot elsewhere in the code

And I was.

@jessesuen
Copy link
Copy Markdown
Member

jessesuen commented Apr 24, 2026

I can confirm through code inspection that we are ensuring RBAC with every Project reconciliation. So while SSA would help prevent the object's resourceVersion from constantly being updated, it would still result in an API call. To eliminate unnecessary API calls entirely, we could check what we have in an informer cache is already what we want and skip the call(s) completely. Even something like a DeepEquals check would help prevent this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/management-controller Affects the controller that manages the Kargo control plane itself kind/bug Something isn't working as intended; If unsure that something IS a bug, start a discussion instead kind/refactor Non-functional changes to implementation details priority/normal This is the priority for most work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Management controllers generate excessive 409s by using Create→AlreadyExists→Update instead of server-side apply

4 participants