client: remove meta revision by bufferflies · Pull Request #10543 · tikv/pd

bufferflies · 2026-04-01T09:02:18Z

Issue Number: ref #10516, close #10542

author: @disksing

cp c3cd07c6

What problem does this PR solve?

This removes the resource-group meta revision cursor from the client watch path.

What is changed and how does it work?

- Stop loading the initial resource-group revision, which was needed only for watch startup.
- Stop passing WithRev(metaRevision) when creating or recreating the watch.
- Stop updating metaRevision from each watched event.

Check List

Tests

go test . -run "Test.*ResourceManager.*" -count=1
go test ./resource_group/controller -run TestDoesNotExist -count=1
make check

Side effects

Possible performance regression

Release note

None

Summary by CodeRabbit

Refactor
- Simplified resource-group metadata watcher and retry behavior by removing startup pre-load and local revision tracking. The watcher is now created without anchoring to a specific revision, streamlining initialization and retry flows.

Signed-off-by: bufferflies <1045931706@qq.com>

ti-chi-bot · 2026-04-01T09:02:24Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign overvenus for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-04-01T09:02:35Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Removed startup loading and per-event revision tracking from the resource-group meta watcher; watcher creation (initial and retry) no longer uses a specific revision.

Changes

Cohort / File(s)	Summary
Meta watch revision removal `client/resource_group/controller/global_controller.go`	Deleted the initial `provider.LoadResourceGroups(ctx)` call and the `metaRevision` variable; removed `opt.WithRev(metaRevision)` from watcher creation (initial and retry) and stopped updating `metaRevision` on received events.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibble through the revision vine,
No more anchors hold my time.
Watches wake without a chain,
Lighter code, a cleaner brain. 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: removing the meta revision tracking from the client watch path.
Description check	✅ Passed	The description covers the problem statement, changes made, test execution, and side effects as per the template requirements.
Linked Issues check	✅ Passed	The PR implements the objective stated in linked issues by removing meta revision cursor from client watch path [`#10542`], addressing the root cause referenced in `#10516`.
Out of Scope Changes check	✅ Passed	All changes are within scope: the modifications target only the global controller's meta revision handling as specified in linked issues.

bufferflies · 2026-04-01T09:05:00Z

client/resource_group/controller/global_controller.go

 					// Use WithPrevKV() to get the previous key-value pair when get Delete Event.
 					prefix := pd.GroupSettingsPathPrefixBytes(c.keyspaceID)
-					watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithRev(metaRevision), opt.WithPrefix(), opt.WithPrevKV())
+					watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())


This changes the reconnect semantics of the resource-group meta watch. Before this PR, the controller resumed from the last processed metaRevision, so updates that happened while the watch stream was broken could still be replayed. After removing WithRev(metaRevision), a re-created watch starts from "now", which can silently skip PUT/DELETE events that landed during the disconnect window. That means the local controller cache can diverge from RM state after a transient watch failure. Please keep a resume revision (or reload a fresh snapshot before recreating the watch) so reconnects do not lose intermediate resource-group changes.
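
To make the failure mode concrete, here is a minimal, self-contained sketch of the disconnect window. It does not use the PD client or etcd: `eventLog`, `replayFrom`, and `simulateReconnect` are hypothetical names invented for this illustration, and the toy log only assumes that every write gets a monotonically increasing revision.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a watch event (not the etcd or PD type).
type event struct {
	Rev   int64
	Key   string
	Value string
}

// eventLog is a toy in-memory log that assigns one revision per write.
type eventLog struct {
	events []event
}

// put appends a write and returns the revision assigned to it.
func (l *eventLog) put(key, value string) int64 {
	rev := int64(len(l.events) + 1)
	l.events = append(l.events, event{Rev: rev, Key: key, Value: value})
	return rev
}

// head returns the latest revision in the log.
func (l *eventLog) head() int64 { return int64(len(l.events)) }

// replayFrom returns every recorded event with revision >= startRev.
func (l *eventLog) replayFrom(startRev int64) []event {
	var out []event
	for _, e := range l.events {
		if e.Rev >= startRev {
			out = append(out, e)
		}
	}
	return out
}

// simulateReconnect returns the local cache after a watch is re-created,
// either resuming from the last processed revision or starting from "now".
func simulateReconnect(resume bool) map[string]string {
	log := &eventLog{}
	cache := map[string]string{}

	// Healthy watch: the client applies one event and remembers its revision.
	lastRev := log.put("rg1", "ru=100")
	cache["rg1"] = "ru=100"

	// The stream breaks; two writes land during the disconnect window.
	log.put("rg1", "ru=200")
	log.put("rg2", "ru=50")

	// Re-create the watch.
	start := log.head() + 1 // from "now": only future events are delivered
	if resume {
		start = lastRev + 1 // resume: replay everything after the last seen event
	}
	for _, e := range log.replayFrom(start) {
		cache[e.Key] = e.Value
	}
	return cache
}

func main() {
	fmt.Println("resume from revision:", simulateReconnect(true))
	fmt.Println("start from now:      ", simulateReconnect(false))
}
```

In this sketch the resumed watch ends with both updates applied, while the watch that starts from "now" keeps the stale `rg1` value and never learns about `rg2`.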

Good catch. I updated the retry path to reload a fresh resource-group snapshot, resync the cached controllers, and recreate the watch from snapshot revision + 1 so reconnects do not skip the disconnect window. I also added TestReloadResourceGroupMetaWatch to cover the retry behavior.

bufferflies · 2026-04-01T09:19:23Z

Re-reviewed the latest follow-up on top of commit 64409c56374adbb6092ceb6e0e16052da1093706.

The previous reconnect/snapshot finding is now addressed:

- snapshot sync still updates existing groups
- snapshot sync still tombstones groups that disappeared from the fresh snapshot
- snapshot-only groups are now also created in the local controller cache before the watch resumes from revision + 1
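
The three cases above can be sketched as a single reconciliation pass. This is an illustrative sketch only, not the code from the PR: `groupState` and `resyncFromSnapshot` are hypothetical names, the snapshot is modeled as a plain map from group name to settings, and reviving a tombstoned group that reappears in the snapshot is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// groupState is a hypothetical cached entry; the real controller keeps richer state.
type groupState struct {
	Settings   string
	Tombstoned bool
}

// resyncFromSnapshot reconciles the local cache with a fresh snapshot and
// reports which groups were created, updated, or tombstoned.
func resyncFromSnapshot(cache map[string]*groupState, snapshot map[string]string) (created, updated, tombstoned []string) {
	for name, settings := range snapshot {
		st, ok := cache[name]
		if !ok {
			// Snapshot-only group: create it locally before the watch resumes.
			cache[name] = &groupState{Settings: settings}
			created = append(created, name)
			continue
		}
		// Existing group: refresh it if it changed. Reviving a tombstoned
		// group is an assumption made for this sketch.
		if st.Settings != settings || st.Tombstoned {
			st.Settings = settings
			st.Tombstoned = false
			updated = append(updated, name)
		}
	}
	for name, st := range cache {
		// Group missing from the snapshot: it was deleted while disconnected.
		if _, ok := snapshot[name]; !ok && !st.Tombstoned {
			st.Tombstoned = true
			tombstoned = append(tombstoned, name)
		}
	}
	// Sort for deterministic output; map iteration order is random in Go.
	sort.Strings(created)
	sort.Strings(updated)
	sort.Strings(tombstoned)
	return created, updated, tombstoned
}

func main() {
	cache := map[string]*groupState{
		"rg1": {Settings: "ru=100"},
		"rg2": {Settings: "ru=50"},
	}
	snapshot := map[string]string{"rg1": "ru=200", "rg3": "ru=10"}

	created, updated, tombstoned := resyncFromSnapshot(cache, snapshot)
	fmt.Println("created:", created, "updated:", updated, "tombstoned:", tombstoned)
}
```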

I do not have a new finding on this follow-up delta.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/resource_group/controller/global_controller.go`:
- Around line 334-335: Replace the direct initial call to c.provider.Watch(ctx,
...) with a call to reloadResourceGroupMetaWatch(c.loopCtx) so the first
meta-watch is bootstrapped with the controller's loop context (c.loopCtx) and
performs the snapshot/ready barrier before Start() returns; keep retries using
c.loopCtx when re-opening watches, ensure watchMetaChannel is derived from
reloadResourceGroupMetaWatch, and make Stop() cancel/close via c.loopCtx (or an
errgroup tied to it) to prevent goroutine leaks and ensure the initial watch is
established before request traffic can create controllers.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 049cc121-4f58-4ef6-ac80-5be97c35319b

📥 Commits

Reviewing files that changed from the base of the PR and between 4ad4e84 and daef237.

📒 Files selected for processing (2)

client/resource_group/controller/global_controller.go
client/resource_group/controller/global_controller_test.go

client/resource_group/controller/global_controller.go

lhy1024 · 2026-04-02T03:36:24Z

client/resource_group/controller/global_controller.go

 			// Use WithPrevKV() to get the previous key-value pair when get Delete Event.
 			prefix := pd.GroupSettingsPathPrefixBytes(c.keyspaceID)
-			watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithRev(metaRevision), opt.WithPrefix(), opt.WithPrevKV())
+			watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())


The initial watch here no longer uses a revision, which leaves a startup window. If a resource group changes after the controller starts but before the watch is actually established, the local cache can miss that update. The reconnect path already rebuilds from a snapshot/revision, but the initial startup path still needs the same barrier.

Yes, but we just need the latest event and can ignore all the MVCC events.

I agree we do not need the whole MVCC history. The issue is that without a startup barrier, we may miss the latest update itself if it happens before the watch is established and no later event arrives. In that case the local cache can stay stale indefinitely.
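
A small simulation of that startup window, under stated assumptions: `metaStore`, `watchFrom`, and `startWatch` are hypothetical names for a toy in-memory store, not PD or etcd APIs. The barrier here is "take a fresh snapshot, then watch from snapshot revision + 1"; the variant without it only opens the watch.

```go
package main

import "fmt"

// change is a hypothetical watch event.
type change struct {
	Rev   int64
	Name  string
	Value string
}

// metaStore is a toy stand-in for the resource-group metadata store.
type metaStore struct {
	rev  int64
	data map[string]string
	log  []change
}

func newMetaStore() *metaStore { return &metaStore{data: map[string]string{}} }

// put records a write and bumps the store revision.
func (s *metaStore) put(name, value string) {
	s.rev++
	s.data[name] = value
	s.log = append(s.log, change{Rev: s.rev, Name: name, Value: value})
}

// snapshot returns a copy of the current data and the revision it reflects.
func (s *metaStore) snapshot() (map[string]string, int64) {
	out := make(map[string]string, len(s.data))
	for k, v := range s.data {
		out[k] = v
	}
	return out, s.rev
}

// watchFrom returns the recorded changes with revision >= startRev.
func (s *metaStore) watchFrom(startRev int64) []change {
	var out []change
	for _, c := range s.log {
		if c.Rev >= startRev {
			out = append(out, c)
		}
	}
	return out
}

// startWatch returns the local cache after watch startup, with or without
// a snapshot barrier taken right before the watch is opened.
func startWatch(withBarrier bool) map[string]string {
	s := newMetaStore()
	s.put("rg1", "ru=100")

	// The controller caches rg1 early, before the watch exists.
	cache, _ := s.snapshot()

	// The latest update lands before the watch is established,
	// and no later event ever arrives.
	s.put("rg1", "ru=200")

	startRev := s.rev + 1 // no barrier: the watch only sees future events
	if withBarrier {
		var snapRev int64
		cache, snapRev = s.snapshot() // barrier: reload a fresh snapshot first
		startRev = snapRev + 1        // then watch strictly after it
	}
	for _, c := range s.watchFrom(startRev) {
		cache[c.Name] = c.Value
	}
	return cache
}

func main() {
	fmt.Println("with barrier:   ", startWatch(true))
	fmt.Println("without barrier:", startWatch(false))
}
```

Both variants open the watch at the same revision; only the one that reloads the snapshot first ends up with the latest `rg1` settings, which is the point of the barrier.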

client/resource_group/controller/global_controller.go

bufferflies · 2026-04-08T07:24:39Z

@coderabbitai full review

coderabbitai · 2026-04-08T07:24:46Z

✅ Actions performed

Full review triggered.

This reverts commit 64409c5. Signed-off-by: bufferflies <1045931706@qq.com>

This reverts commit daef237. Signed-off-by: bufferflies <1045931706@qq.com>

coderabbitai

♻️ Duplicate comments (1)

client/resource_group/controller/global_controller.go (1)
334-335: ⚠️ Potential issue | 🟠 Major

Use c.loopCtx (not parent ctx) for meta watch lifecycle.

Both initial and retry meta-watch calls are still bound to ctx, while Stop() only cancels c.loopCtx. That can leave watch streams running after controller stop if the parent context is still alive.
Suggested fix
-			watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())
+			watchMetaChannel, err = c.provider.Watch(c.loopCtx, prefix, opt.WithPrefix(), opt.WithPrevKV())
...
-					watchMetaChannel, err = c.provider.Watch(ctx, prefix, opt.WithPrefix(), opt.WithPrevKV())
+					watchMetaChannel, err = c.provider.Watch(c.loopCtx, prefix, opt.WithPrefix(), opt.WithPrevKV())
As per coding guidelines, "Prevent goroutine leaks: pair with cancellation; consider errgroup".

Also applies to: 362-363
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/resource_group/controller/global_controller.go` around lines 334 -
335, The meta-watch calls are using the parent ctx instead of the controller
lifecycle context, so change the Watch invocations to use c.loopCtx (not ctx) so
the watch stream is cancelled when Stop() cancels c.loopCtx; specifically update
the c.provider.Watch(...) calls (both the initial call near the
prefix/opt.WithPrevKV() and the retry/loop call around the same logic) to pass
c.loopCtx and ensure any goroutine handling the watch is tied to c.loopCtx for
proper cancellation and no-leak pairing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc8ad108-2f5e-44e3-b3d9-0e4a354ae8d7

📥 Commits

Reviewing files that changed from the base of the PR and between dca466b and 1228f9d.

📒 Files selected for processing (1)

client/resource_group/controller/global_controller.go

ti-chi-bot · 2026-04-08T07:36:31Z

@bufferflies: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-unit-test-next-gen-2	`1228f9d`	link	true	`/test pull-unit-test-next-gen-2`
pull-unit-test-next-gen-3	`1228f9d`	link	true	`/test pull-unit-test-next-gen-3`

codecov · 2026-04-08T07:37:48Z

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 78.89%. Comparing base (3eb99ae) to head (1228f9d).
⚠️ Report is 16 commits behind head on master.

❌ Your patch check has failed because the patch coverage (50.00%) is below the target coverage (74.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10543      +/-   ##
==========================================
+ Coverage   78.88%   78.89%   +0.01%     
==========================================
  Files         530      532       +2     
  Lines       71548    71858     +310     
==========================================
+ Hits        56439    56694     +255     
- Misses      11092    11133      +41     
- Partials     4017     4031      +14

Flag	Coverage Δ
unittests	`78.89% <50.00%> (+0.01%)`	⬆️

bufferflies · 2026-04-10T10:18:30Z

/ping @disksing @lhy1024

client: remove meta revision

4ad4e84

Signed-off-by: bufferflies <1045931706@qq.com>

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/needs-triage-completed labels Apr 1, 2026

ti-chi-bot bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 1, 2026

client: reload RM snapshot before watch retry

daef237

Signed-off-by: bufferflies <1045931706@qq.com>

ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 1, 2026

client: sync new RM groups from snapshot reload

64409c5

Signed-off-by: bufferflies <1045931706@qq.com>

coderabbitai bot reviewed Apr 1, 2026

lhy1024 reviewed Apr 2, 2026

bufferflies requested a review from disksing April 3, 2026 02:06

ti-chi-bot bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed do-not-merge/needs-triage-completed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 8, 2026

bufferflies requested a review from lhy1024 April 8, 2026 07:24

bufferflies added 2 commits April 8, 2026 15:25

Revert "client: sync new RM groups from snapshot reload"

daf78dc

This reverts commit 64409c5. Signed-off-by: bufferflies <1045931706@qq.com>

Revert "client: reload RM snapshot before watch retry"

1228f9d

This reverts commit daef237. Signed-off-by: bufferflies <1045931706@qq.com>

bufferflies force-pushed the pr-merge/c3cd07c6-remove-meta-revision branch from 2cfa22f to 1228f9d Compare April 8, 2026 07:26

coderabbitai bot reviewed Apr 8, 2026

bufferflies requested a review from okJiang April 10, 2026 10:18
