Simple heuristic to dynamically adjust number of GC threads #51061
Conversation
(The idea implemented in this PR is inspired by https://dl.acm.org/doi/pdf/10.1145/2926697.2926706.)
If I recall correctly, we didn't do this originally since the amount of work at the beginning is not necessarily representative of the amount discovered during marking. If I read this PR correctly, the thread sits there during GC and spins constantly checking the amount of work present? I think https://dl.acm.org/doi/pdf/10.1145/2926697.2926706 had one thread that was responsible for waking other threads up, instead of a bunch of threads sitting there spinning?
This is true, and that's why the threads check whether there is enough work on every attempt to enter the mark-loop (and not only at the beginning).
This PR intentionally deviates from the implementation in which a single thread is responsible for counting the available work and waking up other threads. Delegating the wake-ups to a single thread could cause significant performance degradation depending on how that thread is scheduled by the OS; in the worst case, if the thread is not scheduled at all, we would have no parallelization in the mark-loop. The solution in this PR is a bit more "decentralized" in that sense.
Not really: if it finds enough work, it goes ahead and enters the mark-loop.
The paper specifically mentions that they saw better results with wait/notify instead of spinning/sleeping. Did you experiment a bit?
FWIW this also seems to address the regression from #51044:
This seems good to go, no? @gbaraldi, merge if OK?
LGTM!
The latest commit should follow the paper a lot more closely now.
Can you update the table with the new results?
Re-ran the results on a 36-core machine. Note that for up to 7 GC threads we're using the exponential backoff scheduler, and beyond that we use the spin-master one. Seems like spin-master is an improvement for mark-times on a large number of threads:
For the linked list benchmark, spin-master seems to avoid the negative scaling on a large number of GC threads as well:
We've backported this PR and ran a few tests internally. We're seeing a segfault that doesn't seem to be reproducible in the open-source benchmarks. Marking this PR as draft until this is investigated further.
Basically, every GC thread will first look at other workers' queues and count the amount of available work in order to decide whether it's worth it to start marking.
Seems to fix, on my machine, a negative scaling I was seeing on a GC benchmark which exposes very little parallelism (e.g. `list.jl`). Also, doesn't seem to sacrifice scaling on a few GC benchmarks which expose a lot of parallelism (e.g. the `binary_tree` ones):