feat: [DSM-103] Improve scheduler fairness by alin-at-dfinity · Pull Request #9985 · dfinity/ic

alin-at-dfinity · 2026-04-22T15:44:25Z

This is a collection of scheduler improvements backported from the active canister scheduler dev branch, with the goal of having the scheduler efficiency proptest pass (with reduced success thresholds). Applying the proptest and all other RoundSchedule tests onto master also greatly reduces the size of that dev branch change.

* Charge heap delta rate limited canisters, so that they actually skip execution rounds instead of being merely delayed while accumulating priority.
* Charge immediately for the first round of every long execution (which was scheduled as a new execution).
* Apply an exponential decay to AP outside the [-2000, 500] range (-2000 because of 20 max DTS rounds, the 500 is somewhat arbitrary, but seems to work well). Due to the interaction between long and short executions, runaway priorities are inevitable. This provides a soft bound for runaway AP, while still preserving relative priorities to some extent. It also makes priority resets unnecessary (to be removed later).
* Fully distribute all positive free compute, even if it means exceeding 100 priority per canister (by distributing it equally to all canisters).
* Allow long executions to use all scheduler cores when there are no new executions.
* Fully segregate long and new executions across cores, to prevent inversion of priority when lower priority long executions get a slice executed.
* Track long executions across iterations, not just the ones from the start of the round.
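
To illustrate the soft bound on AP, here is a minimal sketch of a per-round decay applied only to the part of AP that lies outside the `[-2000, 500]` range. The function name, the plain `i64` representation of AP and the 7/8 decay factor are assumptions made for illustration; the PR description does not state the actual factor or types.

```rust
/// Range taken from the PR description.
const AP_MIN: i64 = -2000;
const AP_MAX: i64 = 500;

/// Hypothetical decay factor: each round, keep 7/8 of the excess beyond the range.
const DECAY_NUM: i64 = 7;
const DECAY_DEN: i64 = 8;

/// Shrinks only the portion of `ap` that lies outside `[AP_MIN, AP_MAX]`.
/// Values inside the range are returned unchanged. The mapping is
/// monotonic, so the relative order of two canisters is never inverted.
fn decay_accumulated_priority(ap: i64) -> i64 {
    if ap > AP_MAX {
        AP_MAX + (ap - AP_MAX) * DECAY_NUM / DECAY_DEN
    } else if ap < AP_MIN {
        AP_MIN + (ap - AP_MIN) * DECAY_NUM / DECAY_DEN
    } else {
        ap
    }
}
```

Under these assumed constants, an AP of 1300 becomes 1200 after one round and an AP of -2800 becomes -2700, while an AP of 100 is left untouched.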

Instead of a binary prioritized / opportunistic flag, explicitly (record and) prioritize long executions based on number of slices executed, AP and round when the long execution started. This ensures that we don't starve low priority canisters (which may happen with bounded AP and just the right distribution across execution cores). Also switch from persisting SubnetSchedule spread across individual canister states to persisting it as part of the subnet's SystemMetadata.
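
The explicit ordering of long executions can be sketched as a sort over a composite key. The struct, its field names and the direction of each comparison (more executed rounds first, then higher AP, then earlier start round) are assumptions for illustration only; the commit message names the three criteria but not their exact order or direction.

```rust
use std::cmp::Reverse;

/// Hypothetical per-canister record for an in-progress long execution.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LongExecution {
    canister: u64,
    executed_rounds: u64,
    accumulated_priority: i64,
    start_round: u64,
}

/// Orders long executions so that, under the assumed rules, the ones that
/// have made the most progress come first, ties are broken by higher AP,
/// and remaining ties by the earlier start round.
fn prioritize(mut executions: Vec<LongExecution>) -> Vec<LongExecution> {
    executions.sort_by_key(|e| {
        (
            Reverse(e.executed_rounds),
            Reverse(e.accumulated_priority),
            e.start_round,
        )
    });
    executions
}
```

Because the start round is part of the key, two executions with equal progress and equal AP are still ordered deterministically, which is one way to keep a low priority canister from being passed over indefinitely.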

These are some scheduler improvements backported from the active canister scheduler dev branch, with the goal of having the scheduler efficiency proptest pass (with reduced success thresholds).

* Fully segregate long and new executions across cores, to prevent inversion of priority when lower priority long executions get a slice executed.
* Track long executions across iterations, not just the ones from the start of the round.
* Charge heap delta rate limited canisters, so that they are actually skipped instead of just delayed while accumulating priority.
* Charge immediately for the first round of every long execution (scheduled as a new execution).
* Apply an exponential decay to AP outside the `[-2000, 500]` range (`-2000` because of 20 max DTS rounds, the `500` is somewhat arbitrary, but seems to work well). Due to the interaction between long and short executions, runaway priorities are inevitable. This provides a soft bound for runaway AP, while still preserving relative priorities to some extent. It also makes priority resets unnecessary (to be removed separately).
* Fully distribute all positive free compute, even if it means exceeding 100 priority per canister (by distributing it equally to all canisters).
* Allow long executions to use all scheduler cores if there are no new executions.
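
As a rough illustration of distributing all positive free compute equally, the sketch below splits a free compute amount across every canister with no per-canister cap. The function name, the `i64` priority representation and the handling of the division remainder are assumptions; the actual scheduler code may differ.

```rust
/// Adds an equal share of `free_compute` to every canister's priority.
/// No per-canister cap is applied, so a share may exceed 100.
/// Any remainder from the integer division is handed out one unit at a
/// time to the first canisters in the slice (an assumed tie-break), so
/// the full amount is always distributed.
fn distribute_free_compute(priorities: &mut [i64], free_compute: i64) {
    if free_compute <= 0 || priorities.is_empty() {
        // Only positive free compute is distributed.
        return;
    }
    let n = priorities.len() as i64;
    let share = free_compute / n;
    let remainder = (free_compute % n) as usize;
    for (i, ap) in priorities.iter_mut().enumerate() {
        *ap += share + if i < remainder { 1 } else { 0 };
    }
}
```

With three canisters at priorities 0, 50 and -20 and 400 units of free compute, each canister receives at least 133, so the result is 134, 183 and 113.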


In field names and comments, replace "priority credit" and "executed slices" with "executed rounds". "Priority credit" was the old mechanism, now replaced. And "executed slices" is a misnomer: what we're actually counting is rounds during which a long execution made progress, not the actual number of slices executed (multiple slices might be executed in any given round, but we charge for rounds, not slices).


github-actions Bot added the feat label Apr 22, 2026

alin-at-dfinity added 2 commits April 24, 2026 14:39

alin-at-dfinity force-pushed the alin/DSM-103-scheduler-proptest branch from b86e911 to ac9fba3 Compare April 24, 2026 15:55

alin-at-dfinity changed the base branch from master to alin/DSM-103-long-execution-priority-queue April 24, 2026 15:55

alin-at-dfinity changed the title ~~feat: [DSM-103] Scheduler efficiency proptest~~ feat: [DSM-103] Improve scheduler fairness Apr 24, 2026

alin-at-dfinity marked this pull request as ready for review April 24, 2026 16:01

alin-at-dfinity requested a review from a team as a code owner April 24, 2026 16:01

Base automatically changed from alin/DSM-103-long-execution-priority-queue to master April 27, 2026 09:53

Merge branch 'master' into alin/DSM-103-scheduler-proptest

4943fec

github-actions Bot added the @team-dsm label Apr 27, 2026

alin-at-dfinity added 13 commits April 27, 2026 11:43

Fix (same) comment. Simplify test fixture.

63b0430

Swap around scheduling and subnet available memory calculation, so we bail out before the latter if there are no active canisters.

13225f5

* Also skip and charge canisters rate limited for install code instruction. And only do so once per round.
* Tweak the per_canister_cap calculation so we always end up with sum(AP) >= 0.
* Simplify long execution core calculation.
* Improve some comments.

285c9b2

Add more tests for rate limiting and long_execution_start_round.

7ee8be5

Merge branch 'master' into alin/DSM-103-scheduler-proptest

419fd44

Merge branch 'alin/DSM-103-rename-priority-credit' into alin/DSM-103-scheduler-proptest

374d6ca

Drop RoundScheduleFixture::set_priority(), switch to RoundScheduleFixture::canister_priority_mut().

e7b57b8

Add AP exponential decay test.

f78bcab

Merge branch 'master' into alin/DSM-103-scheduler-proptest

b98db6f

Add finish_round_free_compute_capped test.

c951bab

Add tests for charging for the first round of a long execution; not charging for in-progress executions; and CanisterRoundState ordering.

40d13eb

Make clippy happy.

f21dde5

alin-at-dfinity wants to merge 16 commits into master from alin/DSM-103-scheduler-proptest
