Use TimingWheel on Scheduler when possible #671
Conversation
Seggan
left a comment
Have you actually profiled your changes?
(Also I have never heard of a timing wheel, gotta look that up now 🤔)
One more thing: I think moving all these schedulers into …
Also, what happens if I schedule an insanely long delay, larger than the wheel?
Nothing special, we are doing this:

```kotlin
private val mask = wheelSize - 1L
val slot = (task.executeTick and mask).toInt()
```

which is equivalent to `val slot = (task.executeTick % wheelSize).toInt()`, just optimized for powers of two.
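The masking trick is standard power-of-two arithmetic; a quick standalone check (plain Java, with made-up values, not code from the PR) confirms the equivalence:

```java
public class MaskDemo {
    public static void main(String[] args) {
        long wheelSize = 1024;          // must be a power of two for this to work
        long mask = wheelSize - 1;      // all low bits set: 0b1111111111
        long executeTick = 1_234_567L;  // arbitrary example tick

        int andSlot = (int) (executeTick & mask);      // bitmask version
        int modSlot = (int) (executeTick % wheelSize); // modulo version

        System.out.println(andSlot == modSlot); // prints "true"
        System.out.println(andSlot);            // prints "647"
    }
}
```

The equivalence only holds when the wheel size is a power of two, which is presumably why the wheel size is constrained that way.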
After some testing using this task generator:

```kotlin
private val spawnTasks = buildCommand("spawn_tasks") {
    argument("amount", IntegerArgumentType.integer()) {
        executesWithPlayer { player ->
            val amount = getArgument<Int>("amount")
            Rebar.scope.launch {
                repeat(amount) {
                    Rebar.scope.launch {
                        delayTicks(Random.nextLong(0, 5_000))
                        var x = 984568736853295L
                        while (true) {
                            delay(1.seconds)
                            if (x % 2 == 0L) {
                                x /= 2
                            } else {
                                x *= 3
                                x++
                            }
                        }
                    }
                }
                // notify to start the spark report
                delayTicks(5_000)
                player.sendMessage("All tasks have been launched")
            }
        }
    }
}
```

Spawning 1,000,000 tasks:
- Using TimingWheel: https://spark.lucko.me/jERpFEFneJ
- PriorityQueue: https://spark.lucko.me/DWwA5VMbW8
Am I reading the docs right that you can only schedule tasks for tick 2^exponent? A little confused as to how this works. Does it actually behave any differently from a priority queue in terms of when it runs tasks? The docs seem to imply so, or maybe I am reading them wrong?
No, view it as a hashmap keyed on the execution tick.
It runs and checks the tasks for a specific tick, so tasks run the same for tickSpeed = 1; otherwise some buckets would become useless, and it could break if tasks fall into said buckets.
For example, with tickSpeed = 2 it would skip all the even-position buckets, and if you run a delay that falls into one of those positions, that task would break.
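To make the bucket and wrap-around behaviour concrete, here is a minimal, hypothetical timing-wheel sketch in Java (illustrative only, not the PR's actual class): tasks are bucketed by `executeTick & mask`, and a task whose delay exceeds the wheel size simply stays in its bucket until the wheel wraps back around to its tick.

```java
import java.util.ArrayDeque;

public class TinyWheel {
    static final int WHEEL_SIZE = 8;         // power of two
    static final long MASK = WHEEL_SIZE - 1L;

    record Task(long executeTick, Runnable action) {}

    @SuppressWarnings("unchecked")
    final ArrayDeque<Task>[] buckets = new ArrayDeque[WHEEL_SIZE];
    long currentTick = 0;

    TinyWheel() {
        for (int i = 0; i < WHEEL_SIZE; i++) buckets[i] = new ArrayDeque<>();
    }

    void schedule(long delayTicks, Runnable action) {
        long executeTick = currentTick + delayTicks;
        // hashmap-like indexing: the bucket is just executeTick mod wheelSize
        buckets[(int) (executeTick & MASK)].add(new Task(executeTick, action));
    }

    void tick() {
        ArrayDeque<Task> bucket = buckets[(int) (currentTick & MASK)];
        // Only run tasks whose stored executeTick matches the current tick;
        // tasks with longer delays wrapped around and get re-queued.
        for (int i = bucket.size(); i > 0; i--) {
            Task t = bucket.poll();
            if (t.executeTick == currentTick) t.action.run();
            else bucket.add(t); // delay longer than the wheel: wait for next lap
        }
        currentTick++;
    }

    public static void main(String[] args) {
        TinyWheel wheel = new TinyWheel();
        StringBuilder log = new StringBuilder();
        wheel.schedule(3, () -> log.append("short@3 "));
        wheel.schedule(11, () -> log.append("long@11 ")); // longer than the wheel
        for (int i = 0; i < 12; i++) wheel.tick();
        System.out.println(log); // prints "short@3 long@11 "
    }
}
```

Both tasks land in bucket 3 (since 11 & 7 == 3), but the stored `executeTick` keeps the long one from firing early; this is also why skipping buckets (tickSpeed > 1) would strand tasks.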
I do have to ask: it's a new data structure, a scheduler abstraction, more indirection, more potential for bugs. Are we really sure this is worth adding? I get the impression the reduction only becomes significant with a lot of tasks, to the point where the vast majority of the time would be spent on the tasks themselves as opposed to the scheduling. Is this the case? Could we maybe get some more info on the profiles and the case for this?
There is a comment with all the spark profiles already, if you read up.
I know, sorry, I should have been more specific.
1 mil simple math operations running every tick with random delays; what kind of testing do you need me to perform?
Just want to see the performance gap at, say, 1000 tasks with a slightly heavier workload, like idk multiplying 1000 numbers and printing them to console or something like that.
^ but do something even heavier, like log or sqrt.
And make sure you're using random numbers so the JVM can't optimize them away.
1000 tasks running 1000 complex operations show very little difference; the massive difference is when there are a lot of tasks running. Anyway, here …
I don't think this is really worth the added maintenance / potential bugs / complexity cost then, tbh, as we will almost certainly not be running millions of tasks unless something has gone very, very wrong. Anyone have any other thoughts?
I had the opposite idea. This code does not affect low-task runtime and keeps the server from blowing up on high amounts of tasks (which may actually happen legitimately in code where coroutine context switching happens often; for example, Prospecting generated ores off thread and then used context switching to put them in the world), and the class itself is rather simple. Although I do agree the power-based indexing is confusing, the rest of the class I understand completely. If we were to switch it out for mod-based indexing it'd be a pretty trivial class.
(I forgot about this PR ngl) Surely if you're running enough tasks that a significant portion of the server time is spent on operations on a min heap (!), I highly doubt the server is going to be doing anything other than crashing in the next minute anyway. I pretty much stick with the same arguments, I think; it seems overkill. (Also, side note: if this does end up getting merged, we need docs on the two schedulers to explain why there are two and when to use each one.)
I can make the docs, no problem, and overkill is still good if there is actually a necessary use case for someone out there. The TimingWheel is pretty easy to understand after a bit of research, so it is not really overkill. The only funny complex stuff you guys saw in this PR is the fact that you can use AND as a MOD expression for powers of 2, which was hilarious xD. Btw, the PR I made to Paper is probably never getting merged and is probably dead; this is because it is not realistically possible to get that many tasks in the BukkitScheduler due to massive overhead from a lot of stuff. With 8GB I could not go above 100k tasks or I would get an OOM; here, with 1mil+ tasks and this optimization PR, it is just chilling.
100K is pretty well within the realm of possibility, given that each block and entity has its own task, plus (many) other tasks. IMO this (especially if changed to mod) is a very much worthwhile change.
One of the main points of the original scheduling issue was to tick all blocks together for the potential cache coherency benefits, reduced scheduling overhead for tons of tiny tasks, and so that how long it takes to tick blocks could be profiled easily without adding tons of overhead. With that, there would only be one task per block type (which ticks all the blocks of one type). I think that if we're expecting 100k+ tasks we should really be asking why it isn't just a for loop.

As for each block spawning its own tasks, surely tasks launched in Java will not get this performance benefit, so why are we trying to fix something specifically for Kotlin code and ignoring the implications for Java? This is one of the main issues brought up in the original scheduling issue, and the fact that it's only fixed for one language is really not good.

I'm happy to merge this if others think it's a good idea; I'm probably not changing my mind on it tbh. I'm just frustrated because to me it feels like the original issue has not been addressed at all by the coroutine PR, and now we're slapping this on top of it as a bandaid without addressing most of the underlying issues because 'coroutines cool'.

(Sorry, just woke up at like 5am, couldn't get to sleep, and saw this comment so thought I'd reply now lol)
I was referring mostly to tickers, which are managed by Kotlin.
Is that not what this PR reduces even further?? |
Ye, but my point still stands: this cannot be taken advantage of in Java. If someone wants to schedule any large number of tasks in Java, then we need another solution for that, because we have only replaced the Bukkit scheduler for coroutines. Like, surely there's an easy way to implement this in a way that we can use in Java, right? At the end of the day it is literally just lambdas being executed at a fixed tick interval; we basically don't need coroutine-specific features anyway, and it is not hard to write a very simple API for this.
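A Java-friendly facade along those lines could be as small as this sketch (names are illustrative, not from the PR; backed by a PriorityQueue for brevity, since heap-vs-wheel would stay an implementation detail behind the API):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Language-agnostic tick scheduler: plain lambdas at tick delays,
// callable from Java or Kotlin alike.
public class SimpleTickScheduler {
    record Scheduled(long executeTick, Runnable task) {}

    private final PriorityQueue<Scheduled> queue =
            new PriorityQueue<>(Comparator.comparingLong(Scheduled::executeTick));
    private long currentTick = 0;

    public void runLater(long delayTicks, Runnable task) {
        queue.add(new Scheduled(currentTick + delayTicks, task));
    }

    // Called once per server tick: drain everything due at or before now.
    public void tick() {
        while (!queue.isEmpty() && queue.peek().executeTick <= currentTick) {
            queue.poll().task.run();
        }
        currentTick++;
    }

    public static void main(String[] args) {
        SimpleTickScheduler s = new SimpleTickScheduler();
        StringBuilder out = new StringBuilder();
        s.runLater(0, () -> out.append("now "));
        s.runLater(2, () -> out.append("later"));
        for (int i = 0; i < 3; i++) s.tick();
        System.out.println(out); // prints "now later"
    }
}
```

The point of the sketch is the interface, not the backing structure: `runLater(delayTicks, Runnable)` exposes exactly the "lambda at a fixed tick" primitive, so Kotlin coroutines and Java callers could share one scheduler underneath.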
Yes, but so would the original recommendations made in the issue (#334), and I don't understand why we're kinda just brute-forcing it with coroutines and then implementing this PR so that we can spawn even more coroutines. I'm mainly just confused what the motivation behind the coroutine spam is, since it solves essentially zero of the problems described in the issue: it doesn't help anyone scheduling anything in Java, doesn't allow us to profile block ticking times efficiently, doesn't allow tasks related to a block to be grouped together, doesn't allow tasks to be scheduled in Java tied to a block/chunk, and does away with cache locality benefits entirely.
TL;DR: I think we just see this issue differently; input from other @pylonmc/devs would be appreciated.
Wait, I think there's a misunderstanding - I'm not against coroutines at all per se; I'm just against spamming them, especially in the context of blocks, for the reasons I described above, and I'm against implementing loads of coroutine-specific stuff because Java won't be able to take advantage of it (unless we add an API like you suggest).
Yeah, fair, but now we need more of these for dispatching with a chunk scope or entity scope or whatever, overloads for dispatching with an initial delay or whatever, etc. It's not a big deal, but I'm just not thrilled at the idea of having completely different interfaces for Kotlin and Java for doing functionally the same thing, having to maintain both, and having to document and explain both without confusing people too much. I just think it would be nice to have a unified interface.
That's not what I'm referring to - if you have a bunch of tasks executing in essentially random order, the cache will be thrashed a lot more than if you group a bunch of one type of task together and execute them all at once, since tasks of that type will tend to access the same things over and over.

