Skip to content

resourcegroup: replace blind sleep with signal-aware wait in acquireTokens (#10252)#10507

Open
JmPotato wants to merge 1 commit intotikv:release-8.5from
JmPotato:cherry-pick-10252-to-release-8.5
Open

resourcegroup: replace blind sleep with signal-aware wait in acquireTokens (#10252)#10507
JmPotato wants to merge 1 commit intotikv:release-8.5from
JmPotato:cherry-pick-10252-to-release-8.5

Conversation

@JmPotato
Copy link
Copy Markdown
Member

@JmPotato JmPotato commented Mar 30, 2026

What problem does this PR solve?

Issue Number: close #10251.

What is changed and how does it work?

Cherry-pick of #10252 to release-8.5.

Add a close-and-recreate channel on the `Limiter` that is closed on every `Reconfigure()` call.
The retry loop in `acquireTokens` now selects on this channel instead of blind-sleeping,
waking up immediately when new tokens arrive. The timer serves as a fallback to preserve original
behavior when no `Reconfigure` occurs.

Check List

Tests

  • Unit test

Release note

None.

Summary by CodeRabbit

  • Performance Improvements
    • Token acquisition retry mechanism now responds immediately to configuration changes instead of waiting for full intervals, reducing latency during system reconfigurations.
    • Enhanced recovery from resource throttling with faster adaptation to configuration updates.

…okens

(cherry picked from commit 961bd132b28f6345af0df288a33741f9c7dcfd2e)
Signed-off-by: JmPotato <github@ipotato.me>
@ti-chi-bot ti-chi-bot bot added do-not-merge/cherry-pick-not-approved release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Mar 30, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Mar 30, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must be LGTMed and approved by the reviewers firstly.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d22e997c-f97b-4367-9df4-b0acc046cfd0

📥 Commits

Reviewing files that changed from the base of the PR and between 3aeaf6e and e20abbf.

📒 Files selected for processing (4)
  • client/resource_group/controller/controller.go
  • client/resource_group/controller/controller_test.go
  • client/resource_group/controller/limiter.go
  • client/resource_group/controller/limiter_test.go

📝 Walkthrough

Walkthrough

These changes implement a timer-aware retry mechanism with reconfiguration event signaling for the resource group token limiter. The limiter now exposes a reconfiguration notification channel, and the controller's token acquisition retry loop replaces blind sleep with a select statement that listens for timer expiry, reconfiguration signals, or context cancellation.

Changes

Cohort / File(s) Summary
Limiter Reconfiguration Signaling
client/resource_group/controller/limiter.go, client/resource_group/controller/limiter_test.go
Added reconfiguredCh field and GetReconfiguredCh() method to signal reconfiguration events; Reconfigure() now closes the current channel and allocates a new one to wake waiting goroutines. Includes tests validating channel closure and multi-waiter wake-up behavior.
Controller Retry Loop Integration
client/resource_group/controller/controller.go
Modified acquireTokens retry mechanism to replace unconditional time.Sleep() with a timer-driven select loop that responds to context cancellation, reconfiguration signals, and timer expiry, enabling early wake-up when tokens are refreshed.
Controller Retry Tests
client/resource_group/controller/controller_test.go
Added two test cases: TestAcquireTokensSignalAwareWait validates early wake-up on reconfiguration; TestAcquireTokensFallbackToTimer validates fallback to full timeout when no signal arrives.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • rleungx
  • disksing

Poem

🐰 A rabbit hops through retry loops with grace,
No more blind sleeps in this token race!
When reconfig signals flow through the wire,
We wake right up—no need to tire!
One second spikes? Now gone with a cheer,
Smart timers and channels bring speed so clear! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: replacing blind sleep with signal-aware wait in the acquireTokens method, directly addressing the performance issue in issue #10251.
Description check ✅ Passed The PR description includes the issue number (close #10251), explains the change and how it works with a clear commit message, and provides the required test and release note sections.
Linked Issues check ✅ Passed The code changes fully address issue #10251: a reconfiguration-aware channel replaces blind sleep in the retry loop, allowing immediate wakeup when tokens are refreshed via Reconfigure(), eliminating the ~1s latency spike problem.
Out of Scope Changes check ✅ Passed All changes are directly scoped to addressing issue #10251: the Limiter channel mechanism, controller retry loop refactoring, and corresponding unit tests are all necessary and cohesive to implement the solution.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 30, 2026
@JmPotato
Copy link
Copy Markdown
Member Author

/retest

1 similar comment
@JmPotato
Copy link
Copy Markdown
Member Author

/retest

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 59.37500% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.95%. Comparing base (d76bec1) to head (e20abbf).
⚠️ Report is 5 commits behind head on release-8.5.

❌ Your patch check has failed because the patch coverage (59.37%) is below the target coverage (74.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@               Coverage Diff               @@
##           release-8.5   #10507      +/-   ##
===============================================
+ Coverage        77.93%   77.95%   +0.02%     
===============================================
  Files              484      485       +1     
  Lines            65594    65743     +149     
===============================================
+ Hits             51119    51251     +132     
- Misses           10703    10720      +17     
  Partials          3772     3772              
Flag Coverage Δ
unittests 77.95% <59.37%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 30, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Mar 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: okJiang
Once this PR has been reviewed and has the lgtm label, please assign bufferflies for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Mar 30, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-03-30 10:19:17.982923873 +0000 UTC m=+173963.188283920: ☑️ agreed by okJiang.

case rmpb.GroupMode_RUMode:
res := make([]*Reservation, 0, len(requestUnitLimitTypeList))
for typ, counter := range gc.run.requestUnitTokens {
reconfiguredCh = counter.limiter.GetReconfiguredCh()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move it to line 1365 if no only one type ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although there is technically only one type at the moment, I kept it within a for loop to avoid code conflicts or confusing semantics.

default:
}
}
case <-waitTimer.C:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to stop the timer here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a time.Timer that has already triggered, there is no need to stop it.

@bufferflies
Copy link
Copy Markdown
Contributor

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Full review triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/cherry-pick-not-approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants