Conversation
Signed-off-by: bufferflies <1045931706@qq.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files; the full list of commands accepted by this bot can be found here.
Note: Reviews paused. This branch appears to be under active development, so CodeRabbit has automatically paused this review to avoid overwhelming it with comments on an influx of new commits.
📝 Walkthrough

Removed startup loading and per-event revision tracking from the resource-group meta watcher; watcher creation (initial and retry) no longer uses a specific revision.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
```diff
  // Use WithPrevKV() to get the previous key-value pair when get Delete Event.
  prefix := pd.GroupSettingsPathPrefixBytes(c.keyspaceID)
- watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithRev(metaRevision), opt.WithPrefix(), opt.WithPrevKV())
+ watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())
```
This changes the reconnect semantics of the resource-group meta watch. Before this PR, the controller resumed from the last processed metaRevision, so updates that happened while the watch stream was broken could still be replayed. After removing WithRev(metaRevision), a re-created watch starts from "now", which can silently skip PUT/DELETE events that landed during the disconnect window. That means the local controller cache can diverge from RM state after a transient watch failure. Please keep a resume revision (or reload a fresh snapshot before recreating the watch) so reconnects do not lose intermediate resource-group changes.
Good catch. I updated the retry path to reload a fresh resource-group snapshot, resync the cached controllers, and recreate the watch from snapshot revision + 1 so reconnects do not skip the disconnect window. I also added TestReloadResourceGroupMetaWatch to cover the retry behavior.
Re-reviewed the latest follow-up commit. The previous reconnect/snapshot finding is now addressed; I do not have a new finding on this follow-up delta.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/resource_group/controller/global_controller.go`:
- Around line 334-335: Replace the direct initial call to c.provider.Watch(ctx,
...) with a call to reloadResourceGroupMetaWatch(c.loopCtx) so the first
meta-watch is bootstrapped with the controller's loop context (c.loopCtx) and
performs the snapshot/ready barrier before Start() returns; keep retries using
c.loopCtx when re-opening watches, ensure watchMetaChannel is derived from
reloadResourceGroupMetaWatch, and make Stop() cancel/close via c.loopCtx (or an
errgroup tied to it) to prevent goroutine leaks and ensure the initial watch is
established before request traffic can create controllers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 049cc121-4f58-4ef6-ac80-5be97c35319b
📒 Files selected for processing (2)

- client/resource_group/controller/global_controller.go
- client/resource_group/controller/global_controller_test.go
```diff
  // Use WithPrevKV() to get the previous key-value pair when get Delete Event.
  prefix := pd.GroupSettingsPathPrefixBytes(c.keyspaceID)
- watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithRev(metaRevision), opt.WithPrefix(), opt.WithPrevKV())
+ watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())
```
The initial watch here no longer uses a revision, which leaves a startup window. If a resource group changes after the controller starts but before the watch is actually established, the local cache can miss that update. The reconnect path already rebuilds from a snapshot/revision, but the initial startup path still needs the same barrier.
Yes, but we just need the latest event and can ignore all the MVCC events.
I agree we do not need the whole MVCC history. The issue is that without a startup barrier, we may miss the latest update itself if it happens before the watch is established and no later event arrives. In that case the local cache can stay stale indefinitely.
@coderabbitai full review
✅ Actions performed: full review triggered.
Force-pushed from 2cfa22f to 1228f9d (Compare)
♻️ Duplicate comments (1)
client/resource_group/controller/global_controller.go (1)
334-335: ⚠️ Potential issue | 🟠 Major — Use `c.loopCtx` (not parent `ctx`) for the meta watch lifecycle.

Both the initial and retry meta-watch calls are still bound to `ctx`, while `Stop()` only cancels `c.loopCtx`. That can leave watch streams running after controller stop if the parent context is still alive.

Suggested fix:

```diff
- watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())
+ watchMetaChannel, err = c.provider.Watch(c.loopCtx, prefix, opt.WithPrefix(), opt.WithPrevKV())
  ...
- watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())
+ watchMetaChannel, err = c.provider.Watch(c.loopCtx, prefix, opt.WithPrefix(), opt.WithPrevKV())
```

As per coding guidelines, "Prevent goroutine leaks: pair with cancellation; consider errgroup".
Also applies to: 362-363
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/resource_group/controller/global_controller.go` around lines 334 - 335, The meta-watch calls are using the parent ctx instead of the controller lifecycle context, so change the Watch invocations to use c.loopCtx (not ctx) so the watch stream is cancelled when Stop() cancels c.loopCtx; specifically update the c.provider.Watch(...) calls (both the initial call near the prefix/opt.WithPrevKV() and the retry/loop call around the same logic) to pass c.loopCtx and ensure any goroutine handling the watch is tied to c.loopCtx for proper cancellation and no-leak pairing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc8ad108-2f5e-44e3-b3d9-0e4a354ae8d7
📒 Files selected for processing (1)
client/resource_group/controller/global_controller.go
@bufferflies: The following tests failed.
Codecov Report — ❌ Patch coverage is 50.00%. Your patch check has failed because the patch coverage (50.00%) is below the target coverage (74.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #10543      +/-   ##
==========================================
+ Coverage   78.88%   78.89%   +0.01%
==========================================
  Files         530      532       +2
  Lines       71548    71858     +310
==========================================
+ Hits        56439    56694     +255
- Misses      11092    11133      +41
- Partials     4017     4031      +14
```

Flags with carried forward coverage won't be shown.
Issue Number: ref #10516, close #10542
author: @disksing
cp c3cd07c6

What problem does this PR solve?
This removes the resource-group meta revision cursor from the client watch path.
What is changed and how does it work?
- Drop `WithRev(metaRevision)` when creating or recreating the watch
- Stop tracking `metaRevision` from each watched event

Check List
Tests
- `go test . -run "Test.*ResourceManager.*" -count=1`
- `go test ./resource_group/controller -run TestDoesNotExist -count=1`
- `make check`

Side effects
Release note