Use TimingWheel on Scheduler when possible #671

Open

Intybyte wants to merge 10 commits into master from vaan/opt/scheduler

Conversation

@Intybyte
Member

No description provided.

@Intybyte Intybyte requested a review from Seggan March 13, 2026 10:59
Member

@Seggan Seggan left a comment


Have you actually profiled your changes?

(Also I have never heard of a timing wheel, gotta look that up now 🤔)

Comment thread rebar/src/main/kotlin/io/github/pylonmc/rebar/collections/tasks/Scheduler.kt Outdated
Comment thread rebar/src/main/kotlin/io/github/pylonmc/rebar/async/BukkitMainThreadDispatcher.kt Outdated
@Seggan
Member

Seggan commented Mar 13, 2026

One more thing: I think moving all these schedulers into async would be fine

@Seggan
Member

Seggan commented Mar 13, 2026

Also what happens if I schedule an insanely long delay, larger than the wheel?

@Intybyte
Member Author

Also what happens if I schedule an insanely long delay, larger than the wheel?

Nothing special; we are doing this:

private val mask = wheelSize - 1L
val slot = (task.executeTick and mask).toInt()

which is equivalent to

val slot = task.executeTick % wheelSize

just optimized for powers of two
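To illustrate the equivalence being claimed here, a small standalone check (plain Java, names and wheel size invented for the example, not taken from the PR) can compare the mask and modulo forms; long delays simply wrap around the wheel rather than falling off it:

```java
public class WheelSlotDemo {
    // For power-of-two wheel sizes, (tick & (size - 1)) equals (tick % size),
    // because size - 1 is a run of low one-bits that keeps exactly the
    // remainder bits of the tick.
    static int slotByMask(long executeTick, int wheelSize) {
        long mask = wheelSize - 1L; // requires wheelSize to be a power of two
        return (int) (executeTick & mask);
    }

    static int slotByMod(long executeTick, int wheelSize) {
        return (int) (executeTick % wheelSize);
    }

    public static void main(String[] args) {
        int wheelSize = 1024; // hypothetical size, must be 2^n
        long[] ticks = {0L, 7L, 1023L, 1024L, 5_000L, 987_654_321L};
        for (long t : ticks) {
            int byMask = slotByMask(t, wheelSize);
            int byMod = slotByMod(t, wheelSize);
            System.out.println(t + " -> slot " + byMask + " (mod gives " + byMod + ")");
            if (byMask != byMod) throw new AssertionError("mismatch at tick " + t);
        }
    }
}
```

The equivalence only holds for non-negative ticks and power-of-two sizes; for any other size the AND would silently compute the wrong slot.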

@Intybyte
Member Author

Intybyte commented Mar 13, 2026

After some testing using this task generator:

private val spawnTasks = buildCommand("spawn_tasks") {
    argument("amount", IntegerArgumentType.integer()) {
        executesWithPlayer { player ->
            val amount = getArgument<Int>("amount")

            Rebar.scope.launch {
                repeat(amount) {
                    Rebar.scope.launch {
                        delayTicks(Random.nextLong(0, 5_000))

                        var x = 984568736853295L
                        while (true) {
                            delay(1.seconds)
                            if (x % 2 == 0L) {
                                x /= 2
                            } else {
                                x *= 3
                                x++
                            }
                        }
                    }
                }

                // notify to start the spark report
                delayTicks(5_000)
                player.sendMessage("All tasks have been launched")
            }
        }
    }
}

Spawning 1_000_000 tasks

Using /spark profiler start --thread * --timeout 60 for all tests

TimingWheel = https://spark.lucko.me/jERpFEFneJ
43.52% of which 38.57% is from tasks and 4.9% is from getValid
From the 38.57%, rescheduling the tasks accounts for 1.88%

PriorityQueue = https://spark.lucko.me/DWwA5VMbW8
84.48% of which 40.49% is from tasks and 43.78% is from getValid
From the 40.49%, rescheduling the tasks accounts for 0.08%

@Intybyte Intybyte requested a review from Seggan March 13, 2026 16:09
Member

@Seggan Seggan left a comment


(image attachment; content not preserved)

Comment thread rebar/src/main/kotlin/io/github/pylonmc/rebar/async/schedulers/TimingWheel.kt Outdated
Contributor


Am I reading the docs right that you can only schedule tasks for tick 2^exponent? I'm a little confused as to how this works; does it actually behave any differently from a priority queue in terms of when it runs tasks? The docs seem to imply so, or maybe I'm reading them wrong?

Member Author


No, view it as a hashmap keyed on the execution tick.

It runs and checks the tasks for a specific tick, so tasks run the same for tickSpeed = 1; otherwise some buckets would become useless, and it could break if tasks fall into said buckets.

Member Author


For example, with tickSpeed = 2 it would skip all the even-position buckets, and if a delayed task falls into one of those positions, that task would break.
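The behavior being described can be sketched as a toy wheel (plain Java, not the PR's actual class; names and the size of 8 are invented). Advancing exactly one tick at a time visits every bucket, and tasks whose delay exceeds the wheel size wrap into a bucket and simply wait for a later revolution:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyTimingWheel {
    record Task(long executeTick, Runnable action) {}

    private final int wheelSize = 8; // hypothetical power-of-two size
    private final long mask = wheelSize - 1L;
    @SuppressWarnings("unchecked")
    private final Deque<Task>[] buckets = new Deque[wheelSize];
    private long currentTick = 0;

    public ToyTimingWheel() {
        for (int i = 0; i < wheelSize; i++) buckets[i] = new ArrayDeque<>();
    }

    public void schedule(long delay, Runnable action) {
        long executeTick = currentTick + delay;
        // Bucket chosen by executeTick & mask, as in the snippet above.
        buckets[(int) (executeTick & mask)].add(new Task(executeTick, action));
    }

    // Advance exactly one tick: run the tasks in this bucket that are due;
    // tasks with a longer delay that wrapped into the same bucket are
    // re-queued for a later revolution of the wheel.
    public void tick() {
        Deque<Task> bucket = buckets[(int) (currentTick & mask)];
        int pending = bucket.size();
        for (int i = 0; i < pending; i++) {
            Task t = bucket.poll();
            if (t.executeTick() == currentTick) t.action().run();
            else bucket.add(t); // not due yet on this revolution
        }
        currentTick++;
    }
}
```

This also shows why skipping buckets breaks things: a bucket that is never visited holds tasks that will never run.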

Comment thread rebar/src/main/kotlin/io/github/pylonmc/rebar/async/ScheduledTask.kt Outdated
@LordIdra
Contributor

I do have to ask - it's a new data structure, a scheduler abstraction, more indirection, more potential for bugs - are we really sure this is worth adding? I get the impression the reduction only becomes significant with a lot of tasks, to the point where the vast majority of the time would be spent on the tasks themselves as opposed to the scheduling. Is this the case? Could we maybe get some more info on the profiles and the case for this?

@Intybyte
Member Author

I do have to ask - it's a new data structure, a scheduler abstraction, more indirection, more potential for bugs - are we really sure this is worth adding? I get the impression the reduction only becomes significant with a lot of tasks, to the point where the vast majority of the time would be spent on the tasks themselves as opposed to the scheduling. Is this the case? Could we maybe get some more info on the profiles and the case for this?

there is a comment above with all the spark profiles already, if you read up

@LordIdra
Contributor

I know, sorry should have been more specific

  1. what is the actual workload that was run for those profiles?
  2. how many tasks were run? Obviously if there are like a million tasks running that's not realistic. I am wondering what the profiles are for a lower number of tasks

@Intybyte
Member Author

I know, sorry should have been more specific

  1. what is the actual workload that was run for those profiles?
  2. how many tasks were run? Obviously if there are like a million tasks running that's not realistic. I am wondering what the profiles are for a lower number of tasks

1 mil simple math operations running every tick with a random delay. What kind of testing do you need me to perform?

@LordIdra
Contributor

Just want to see the performance gap at say 1000 tasks with a slightly heavier workload like idk multiplying 1000 numbers and printing them to console or something like that

@Seggan
Member

Seggan commented Mar 18, 2026

^ but do something even heavier like log or sqrt

@Seggan
Member

Seggan commented Mar 18, 2026

And make sure you're using random numbers so the JVM can't optimize

@Intybyte
Member Author

Intybyte commented Mar 20, 2026

1000 tasks running 1000 complex operations show very little difference; the massive difference is when there are a lot of tasks running.

Anyway, here:
https://spark.lucko.me/Nx0TYdTqdC
https://spark.lucko.me/1C9HzxOSWH

@LordIdra
Contributor

LordIdra commented Mar 22, 2026

I don't think this is really worth the added maintenance / potential bugs / complexity cost then tbh as we will almost certainly not be running millions of tasks unless something has gone very very wrong. anyone have any other thoughts?

@Seggan
Member

Seggan commented Mar 24, 2026

I had the opposite idea. This code does not affect low-task runtime and keeps the server from blowing up on high amounts of tasks (which may actually happen legitimately in code where coroutine context switching happens often; for example, Prospecting generated ores off thread and then used context switching to put them in the world), and the class itself is rather simple. Although I do agree the power-based indexing is confusing, the rest of the class I understand completely. If we were to switch it out for mod-based indexing it'd be a pretty trivial class.

@LordIdra
Contributor

LordIdra commented Apr 10, 2026

I had the opposite idea. This code does not affect low-task runtime and keeps the server from blowing up on high amounts of tasks (which may actually happen legitimately in code where coroutine context switching happens often; for example, Prospecting generated ores off thread and then used context switching to put them in the world), and the class itself is rather simple. Although I do agree the power-based indexing is confusing, the rest of the class I understand completely. If we were to switch it out for mod-based indexing it'd be a pretty trivial class.

(I forgot about this PR ngl)

Surely if you're running enough tasks that a significant portion of the server time is spent on operations on a min heap (!), I highly doubt the server is going to be doing anything other than crashing in the next minute anyway. I pretty much stick with the same arguments I think, it seems overkill.

(Also side note if this does end up getting merged we need docs on the two schedulers to explain why there are two and when to use each one)

@Intybyte
Member Author

I had the opposite idea. This code does not affect low-task runtime and keeps the server from blowing up on high amounts of tasks (which may actually happen legitimately in code where coroutine context switching happens often; for example, Prospecting generated ores off thread and then used context switching to put them in the world), and the class itself is rather simple. Although I do agree the power-based indexing is confusing, the rest of the class I understand completely. If we were to switch it out for mod-based indexing it'd be a pretty trivial class.

(I forgot about this PR ngl)

Surely if you're running enough tasks that a significant portion of the server time is spent on operations on a min heap (!), I highly doubt the server is going to be doing anything other than crashing in the next minute anyway. I pretty much stick with the same arguments I think, it seems overkill.

(Also side note if this does end up getting merged we need docs on the two schedulers to explain why there are two and when to use each one)

I can make the docs, no problem. And overkill is still good if there is actually a necessary use case for someone out there; the TimingWheel is pretty easy to understand after a bit of research, so it is not really overkill.

The only funny complex stuff all of you saw in this PR is the fact that you can use AND as a MOD expression for powers of 2, which was hilarious xD

Btw the PR I made to Paper is probably never getting merged and is probably dead; this is because it is not realistically possible to get that many tasks into the BukkitScheduler due to massive overhead from a lot of stuff.

With 8GB I could not go above 100k tasks or I would get an OOM; here, with 1mil+ tasks and this optimization PR, it is just chilling.

@Seggan
Member

Seggan commented May 7, 2026

With 8GB I could not go above 100k tasks or I would get an OOM; here, with 1mil+ tasks and this optimization PR, it is just chilling.

100K is pretty well within the realm of possibility, given that each block and entity has its own task, plus (many) other tasks. IMO this (especially if changed to mod) is a very much worthwhile change.

@LordIdra
Contributor

LordIdra commented May 7, 2026

One of the main points of the original scheduling issue was to tick all blocks together for the potential cache coherency benefits, reduced scheduling overhead for tons of tiny tasks, and so that how long it takes to tick blocks could be profiled easily without adding tons of overhead. With that, there would only be one task per block type (which ticks all the blocks of one type). I think that if we're expecting 100k+ tasks we should really be asking why it isn't just a for loop
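The "one task per block type" idea described above can be sketched as follows (plain Java; the names here are invented for illustration and are not from the original scheduling issue). Each type registers a single ticker that loops over all its instances, so scheduling overhead is paid once per type rather than once per block:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockTypeTicker {
    // Hypothetical minimal block interface for the sketch.
    public interface Block { void tick(); }

    private final List<Block> instances = new ArrayList<>();

    public void register(Block block) { instances.add(block); }

    // One scheduled task per block type: a plain for loop over homogeneous
    // blocks. Ticking same-type blocks back to back is also where the
    // cache-coherency and easy-profiling arguments come from.
    public void tickAll() {
        for (Block block : instances) block.tick();
    }
}
```

A scheduler would then hold one `tickAll` task per registered type instead of one task per block instance.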

As for each block spawning its own tasks, surely tasks launched in Java will not get this performance benefit so why are we trying to fix something specifically for Kotlin code and ignoring the implications for Java? Like this is one of the main issues brought up in the original scheduling issue and the fact that it's only fixed for one language is really not good.

I'm happy to merge this if others think it's a good idea, I'm probably not changing my mind on it tbh. I'm just frustrated because to me it feels like the original issue has not been addressed at all by the coroutine PR and now we're slapping this on top of it as a bandaid without addressing most of the underlying issues because 'coroutines cool'

(Sorry just woke up at like 5am, couldn't get to sleep, and saw this comment so thought I'd reply now lol)

@Seggan
Member

Seggan commented May 7, 2026

As for each block spawning its own tasks, surely tasks launched in Java will not get this performance benefit

I was referring mostly to tickers, which are managed by Kotlin

@Seggan
Member

Seggan commented May 7, 2026

reduced scheduling overhead for tons of tiny tasks

Is that not what this PR reduces even further??

@LordIdra
Contributor

LordIdra commented May 7, 2026

I was referring mostly to tickers, which are managed by Kotlin

Ye, but my point still stands: this cannot be taken advantage of in Java. If someone wants to schedule any large number of tasks in Java then we need another solution for that, because we have only replaced the bukkit scheduler for coroutines. Like surely there's an easy way to implement this in a way that we can use in Java, right? At the end of the day it is literally just lambdas being executed at a fixed tick interval; we basically don't need coroutine-specific features anyway and it is not hard to write a very simple API for this

Is that not what this PR reduces even further??

Yes, but so would the original recommendations made in the issue (#334) and I don't understand why we're kinda just brute forcing it with coroutines and then implementing this PR so that we can spawn even more coroutines. Like I'm mainly just confused what the motivation behind the coroutine spam is since it solves essentially zero of the problems described in the issue, i.e. doesn't help anyone scheduling anything in Java, doesn't allow us to profile block ticking times efficiently, doesn't allow tasks related to a block to be grouped together, doesn't allow tasks to be scheduled in Java tied to a block/chunk, and does away with cache locality benefits entirely

@Seggan
Member

Seggan commented May 7, 2026

At the end of the day it is literally just lambdas being executed at a fixed tick interval; we basically don't need coroutine-specific features anyway and it is not hard to write a very simple API for this

Yeah that's pretty trivial:

(image of a code snippet; content not preserved)

Like I'm mainly just confused what the motivation behind the coroutine spam is since it solves essentially zero of the problems described in the issue

ig I don't see it as a coroutine problem, since to me coroutines are just a way to write flattened lambda trees, i.e. the dispatcher is pretty much the entirety of the code in question and we'd have to write a similar mechanism on top of bukkit tasks anyway, even without coroutines

doesn't help anyone scheduling anything in Java

Just demonstrated it can

doesn't allow us to profile block ticking times efficiently

It can (I just didn't implement it at that time)

doesn't allow tasks related to a block to be grouped together

It already does

doesn't allow tasks to be scheduled in Java tied to a block/chunk

It can with a trivial modification similar to above

does away with cache locality benefits entirely

Ok yeah that's a fair point, though I doubt you'd be getting much cache locality from any solution using function pointers

@Seggan
Member

Seggan commented May 7, 2026

TL;DR I think we just see this issue differently, input from other @pylonmc/devs would be appreciated

@LordIdra
Contributor

LordIdra commented May 7, 2026

Wait I think there's a misunderstanding - I'm not against coroutines at all per se, I'm just against spamming them especially in the context of blocks for the reasons I described above, and I'm against implementing loads of coroutine-specific stuff because Java won't be able to take advantage of it (unless we add an api like you suggest)

Yeah that's pretty trivial:

Yeah fair, but now we need more of these for dispatching with a chunk scope or entity scope or whatever, overloads for dispatching with an initial delay or whatever, etc. It's not a big deal, but I'm just not thrilled at the idea of having completely different interfaces for Kotlin and Java for doing functionally the same thing, having to maintain both, and document and explain both without confusing people too much. Like I just think it would be nice to have a unified interface

Ok yeah that's a fair point, though I doubt you'd be getting much cache locality from any solution using function pointers

That's not what I'm referring to - if you have a bunch of tasks executing in essentially random order, the cache will be thrashed a lot more than if you group a bunch of one type of task together and execute them all at once, since tasks of that type will tend to access the same things over and over
