resourcemanager: make controller config updates atomic#10504
resourcemanager: make controller config updates atomic#10504okJiang wants to merge 1 commit intotikv:masterfrom
Conversation
📝 WalkthroughWalkthroughThis pull request refactors controller configuration updates to be atomic at the API level. Previously, individual config items were updated one-at-a-time through separate storage operations. The changes introduce a batch update method that validates all items before persisting, ensuring that mixed valid/invalid payloads either fully succeed or fully fail without partial persistence. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: okjiang <819421878@qq.com>
ae1bc34 to
f7de7cc
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integrations/mcs/resourcemanager/api_test.go`:
- Around line 318-323: The test sends "true" as a string for the
"enable-controller-trace-log" field which makes the request's failure
non-deterministic; update the call to tryToSetControllerConfig so the map value
for "enable-controller-trace-log" is a boolean true (not the string "true")
while keeping "ltb-max-wait-duration" as the invalid string "not-a-duration" so
the request deterministically exercises the valid+invalid mix; locate the call
to tryToSetControllerConfig in the test and change that map entry accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f0d9242a-516b-4128-81df-8fb780263953
📒 Files selected for processing (4)
pkg/mcs/resourcemanager/server/apis/v1/api.gopkg/mcs/resourcemanager/server/manager.gopkg/mcs/resourcemanager/server/manager_test.gotests/integrations/mcs/resourcemanager/api_test.go
| resp, statusCode := tryToSetControllerConfig(re, suite.cluster.GetLeaderServer().GetAddr(), map[string]any{ | ||
| "enable-controller-trace-log": "true", | ||
| "ltb-max-wait-duration": "not-a-duration", | ||
| }) | ||
| re.Equal(http.StatusBadRequest, statusCode) | ||
| re.Contains(resp, "time:") |
There was a problem hiding this comment.
Use a real boolean here to keep the regression deterministic.
Line 319 sends "true" as a string, so this request can fail on either field instead of exercising the intended valid+invalid mix. That makes the "time:" assertion order-dependent and weakens the atomicity regression.
Suggested fix
resp, statusCode := tryToSetControllerConfig(re, suite.cluster.GetLeaderServer().GetAddr(), map[string]any{
- "enable-controller-trace-log": "true",
+ "enable-controller-trace-log": true,
"ltb-max-wait-duration": "not-a-duration",
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resp, statusCode := tryToSetControllerConfig(re, suite.cluster.GetLeaderServer().GetAddr(), map[string]any{ | |
| "enable-controller-trace-log": "true", | |
| "ltb-max-wait-duration": "not-a-duration", | |
| }) | |
| re.Equal(http.StatusBadRequest, statusCode) | |
| re.Contains(resp, "time:") | |
| resp, statusCode := tryToSetControllerConfig(re, suite.cluster.GetLeaderServer().GetAddr(), map[string]any{ | |
| "enable-controller-trace-log": true, | |
| "ltb-max-wait-duration": "not-a-duration", | |
| }) | |
| re.Equal(http.StatusBadRequest, statusCode) | |
| re.Contains(resp, "time:") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrations/mcs/resourcemanager/api_test.go` around lines 318 - 323,
The test sends "true" as a string for the "enable-controller-trace-log" field
which makes the request's failure non-deterministic; update the call to
tryToSetControllerConfig so the map value for "enable-controller-trace-log" is a
boolean true (not the string "true") while keeping "ltb-max-wait-duration" as
the invalid string "not-a-duration" so the request deterministically exercises
the valid+invalid mix; locate the call to tryToSetControllerConfig in the test
and change that map entry accordingly.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/integrations/mcs/resourcemanager/api_test.go (1)
343-348:⚠️ Potential issue | 🟠 MajorUse a real boolean in the mixed-payload regression.
Line 344 sends
"true"as a string, so this request contains two invalid values instead of one valid field plus one invalid field. That weakens the all-or-nothing regression and can make the"time:"assertion fail for the wrong reason.Suggested fix
resp, statusCode := tryToSetControllerConfig(re, suite.cluster.GetLeaderServer().GetAddr(), map[string]any{ - "enable-controller-trace-log": "true", + "enable-controller-trace-log": true, "ltb-max-wait-duration": "not-a-duration", })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integrations/mcs/resourcemanager/api_test.go` around lines 343 - 348, The test is sending the boolean as a string which creates two invalid fields; update the payload in the call to tryToSetControllerConfig so "enable-controller-trace-log" is sent as a real boolean true (not the string "true") while leaving "ltb-max-wait-duration": "not-a-duration" unchanged, so the request has one valid field and one invalid duration field and the existing assertion against tryToSetControllerConfig's response containing "time:" remains meaningful.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/mcs/resourcemanager/metadataapi/config_service.go`:
- Around line 251-257: The current handler collapses all non-permission errors
from s.configStore.UpdateControllerConfigItems(resolvedConf) into 400 Bad
Request; change the error handling to distinguish validation errors from
persistence/storage failures (e.g., errors returned by SaveControllerConfig in
the store). Specifically, detect validation-related errors (the same error
type/value returned by your validation code) and continue to return 400 for
those, but treat store persistence/etcd/write errors (wrap/inspect errors coming
from UpdateControllerConfigItems/SaveControllerConfig or provide a
store.IsPersistenceError helper) as server-side failures and return an
appropriate 5xx (e.g., 500 or 503) with a clear log message; keep the existing
IsMetadataWriteDisabledError check for forbidden. Ensure the store layer
wraps/save failures so the handler can reliably distinguish the error kinds.
---
Duplicate comments:
In `@tests/integrations/mcs/resourcemanager/api_test.go`:
- Around line 343-348: The test is sending the boolean as a string which creates
two invalid fields; update the payload in the call to tryToSetControllerConfig
so "enable-controller-trace-log" is sent as a real boolean true (not the string
"true") while leaving "ltb-max-wait-duration": "not-a-duration" unchanged, so
the request has one valid field and one invalid duration field and the existing
assertion against tryToSetControllerConfig's response containing "time:" remains
meaningful.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 01871f43-b5bb-429e-80b5-dbc8ca11a1a6
📒 Files selected for processing (5)
pkg/mcs/resourcemanager/metadataapi/config_service.gopkg/mcs/resourcemanager/metadataapi/config_service_test.gopkg/mcs/resourcemanager/server/manager.gopkg/mcs/resourcemanager/server/manager_test.gotests/integrations/mcs/resourcemanager/api_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/mcs/resourcemanager/server/manager_test.go
| if err := s.configStore.UpdateControllerConfigItems(resolvedConf); err != nil { | ||
| if rmserver.IsMetadataWriteDisabledError(err) { | ||
| c.String(http.StatusForbidden, err.Error()) | ||
| return | ||
| } | ||
| c.String(http.StatusBadRequest, err.Error()) | ||
| return |
There was a problem hiding this comment.
Don't collapse persistence failures into 400 Bad Request.
Line 251 now calls the batch path, and that path can fail after validation when SaveControllerConfig hits storage. Returning 400 for every non-permission error will mislabel etcd/write failures as client input problems; please distinguish validation errors from persistence errors here, even if that means wrapping save failures from the store layer. As per coding guidelines, "HTTP handlers must validate payloads and return proper status codes; avoid panics".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/mcs/resourcemanager/metadataapi/config_service.go` around lines 251 -
257, The current handler collapses all non-permission errors from
s.configStore.UpdateControllerConfigItems(resolvedConf) into 400 Bad Request;
change the error handling to distinguish validation errors from
persistence/storage failures (e.g., errors returned by SaveControllerConfig in
the store). Specifically, detect validation-related errors (the same error
type/value returned by your validation code) and continue to return 400 for
those, but treat store persistence/etcd/write errors (wrap/inspect errors coming
from UpdateControllerConfigItems/SaveControllerConfig or provide a
store.IsPersistenceError helper) as server-side failures and return an
appropriate 5xx (e.g., 500 or 503) with a clear log message; keep the existing
IsMetadataWriteDisabledError check for forbidden. Ensure the store layer
wraps/save failures so the handler can reliably distinguish the error kinds.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lhy1024 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
/retest |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10504 +/- ##
==========================================
+ Coverage 78.88% 78.95% +0.07%
==========================================
Files 530 531 +1
Lines 71548 71660 +112
==========================================
+ Hits 56439 56578 +139
+ Misses 11092 11058 -34
- Partials 4017 4024 +7
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
@okJiang: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: Close #10335
POST /resource-manager/api/v1/config/controllervalidates request keys beforeapplying updates, but it still persists each field one at a time through
UpdateControllerConfigItem. A mixed valid/invalid payload can therefore writean earlier field before a later invalid value returns
400.What is changed and how does it work?
UpdateControllerConfigItemsto clone the current controller config,apply every requested field to the clone, and persist once on success
the same entry point
Check List
Tests
Release note
Summary by CodeRabbit
Bug Fixes
Tests