8343704: Bad GC parallelism with processing Cleaner queues #22043
Conversation
👋 Welcome back shade! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
@shipilev This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
8343704: Bad GC parallelism with processing Cleaner queues
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 2 new commits pushed to the target branch. Please see this link for an up-to-date comparison between the source branch of this pull request and the target branch. ➡️ To integrate this PR with the above commit message into the target branch, type /integrate in a new comment.
The original reproducer on my M1 improves dramatically:
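The reproducer itself is attached to the bug. Purely as an illustration of the kind of workload involved, a hypothetical sketch (the counts, the no-op cleanup action, and the explicit System.gc() calls are all assumptions, not the original test) could look like this:

```java
import java.lang.ref.Cleaner;

// Hypothetical sketch: register a large number of cleanables, then measure how
// long explicit GCs take while the Cleaner's internal list is fully populated.
public class CleanerGCPressure {
    static final Cleaner CLEANER = Cleaner.create();

    public static void main(String[] args) {
        Object[] referents = new Object[1_000_000];
        for (int i = 0; i < referents.length; i++) {
            Object o = new Object();
            referents[i] = o;                 // keep referents alive so cleanables stay registered
            CLEANER.register(o, () -> { });   // no-op cleanup action
        }
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            System.gc();                      // the GC has to process the Cleaner's internal structures
            System.out.printf("GC %d took %.1f ms%n", i, (System.nanoTime() - start) / 1e6);
        }
    }
}
```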
The closest reproducer, in the form of a JMH test, also improves in both GC times and noise. On a 5950X:
On the churn benchmark, which covers insertion/removal performance, the new implementation outperforms the original one:
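This is not the benchmark behind the numbers above, but for readers who want to experiment, a minimal JMH-style churn sketch along the same lines (the window sizes, the class name, and the assumption that 4096 sits near a node boundary are all illustrative) might be:

```java
import java.lang.ref.Cleaner;
import java.util.ArrayDeque;
import org.openjdk.jmh.annotations.*;

// Hypothetical churn sketch: keep a sliding window of registered cleanables,
// so every operation does one insertion and (once warm) one removal.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
public class CleanerChurn {
    static final Cleaner CLEANER = Cleaner.create();

    @Param({"1", "4096"})          // window sizes are made up for illustration
    int window;

    ArrayDeque<Cleaner.Cleanable> live;

    @Setup(Level.Iteration)
    public void setup() {
        live = new ArrayDeque<>(window + 1);
    }

    @Benchmark
    public void churn() {
        live.addLast(CLEANER.register(new Object(), () -> { }));
        if (live.size() > window) {
            live.removeFirst().clean();   // explicit clean() unregisters from the Cleaner's list
        }
    }
}
```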
Webrevs
This seems to handle excessive allocations when churning around an empty list by keeping the head node always allocated. I wonder whether it is worth adding some hysteresis if it churns around a multiple of NODE_CAPACITY.
Realistically, the list is almost never empty: there is a permanent cleanable registered for the Cleaner itself. I don't think we should care about this case: it seems the rare benefit does not outweigh the cost for the common case? The goal for this implementation is to avoid wasting more space than necessary. Caching a node would take another bunch of KBs per Cleaner, at the very least.
That is probably correct. I was, however, thinking that it would only be pooled asymmetrically, as some type of hysteresis. So you pool when you remove a node (switch the head) and keep it for an arbitrary number of removals. It would then only really waste memory for cleaners that behave this way, adding and removing cleanables around a NODE_CAPACITY boundary.
I really do not want to make an improvement that is more complicated than it needs to be :) As it stands, the current change improves performance across the board, so chasing minor inefficiencies looks like venturing into diminishing-returns territory. We can do this as a follow-up, if you want to explore it. I don't see clearly how this would work, and I would prefer to hold off on more advanced heuristics in favor of simplicity here.
I have been playing with 8343704-node-cache.txt -- is that what you had in mind?
Yes. It amortises the allocation over at least NODE_CAPACITY inserts, instead of 1 in the worst case. I have very little experience with how this plays out in practice, or with how cleanables are used. This was purely an observation based on having seen symmetrical grow/shrink behaviour of some resources.
Pushed that into the PR. There is some measurable impact from dealing with this node cache during heavy churn, but it is still well within the improvement margin we get wholesale. I am willing to accept this single-slot cache for three reasons: a) crossing the artificial boundary with just a single cleaner registering and unregistering looks like something we want to mitigate to avoid surprises; b) Cleaners are recommended by the Javadoc to be shared, so wasting another node should not be as painful; c) the implementation and maintenance cost is not high. But I would draw the line at this heuristic, and do nothing smarter :)
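To make the single-slot cache idea concrete, here is a hypothetical sketch on a simplified segmented list with stack-like removal; the names (Node, NODE_CAPACITY) follow the discussion, but the value and the shape of the code are assumptions rather than the actual JDK change:

```java
// Hypothetical sketch of a single-slot node cache for a segmented list.
// When removal empties the head segment, the node is parked in the cache
// instead of being dropped, so churn around a node boundary does not
// allocate a fresh segment on every crossing.
final class CachingSegmentedList {
    static final int NODE_CAPACITY = 4096;    // assumed segment size

    static final class Node {
        final Object[] elems = new Object[NODE_CAPACITY];
        Node prev;
    }

    private Node head = new Node();           // head node is always allocated
    private int headSize;                     // used slots in the head node
    private Node cache;                       // the single-slot cache

    synchronized void insert(Object e) {
        if (headSize == NODE_CAPACITY) {
            Node n = cache;                   // reuse the parked node if we have one
            cache = null;
            if (n == null) {
                n = new Node();
            }
            n.prev = head;
            head = n;
            headSize = 0;
        }
        head.elems[headSize++] = e;
    }

    synchronized void removeLast() {
        head.elems[--headSize] = null;
        if (headSize == 0 && head.prev != null) {
            Node empty = head;
            head = head.prev;
            headSize = NODE_CAPACITY;         // the previous node was full when we left it
            empty.prev = null;
            cache = empty;                    // park the emptied node
        }
    }
}
```

With this, a workload that oscillates across the NODE_CAPACITY boundary allocates at most one extra node per Cleaner instead of one node per crossing.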
Not that it was the main problem here (since using array-linked lists already delivers the major improvement with fewer changes, kudos!) - but in case a lock-free structure can be of any help, in JCTools we have similar queues. Otherwise there is https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/queues/array/FAAArrayQueue.java: it is as simple as it looks, but it would probably need to be made linearizable - IDK.
With SM removal, there is a doPrivileged call in Cleaner.java.
Since this is not related to the problem at hand, I'd prefer to keep it out of this PR.
Actually, I wonder if you mean jdk.internal.ref.Cleaner?
Right, I meant jdk.internal.ref.Cleaner.
Testing looks clean for me here. I would like to integrate this soon. If anyone wants to test it through their CIs, please do so :)
There we go. /integrate
Going to push as commit 4000e92.
Your commit was automatically rebased without conflicts.
See the bug for more discussion and a reproducer. This PR replaces the ad-hoc linked list with a segmented list of arrays. Arrays are easy targets for GC. There are possible improvements here; the most glaring is parallelism, which is currently knee-capped by global synchronization. The synchronization scheme follows what we have in the original code, and I think it is safer to continue with it right now.
I'll put performance data in a separate comment.
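For orientation, here is a minimal sketch of the general shape of such a segmented list of arrays, not the actual patch; it assumes each registered element carries a handle with its node and index so it can be unregistered by swapping in the last element, and all names are illustrative:

```java
// Illustrative sketch of a segmented list of arrays: the GC traces a handful of
// Node objects, each holding one large array, instead of one small linked-list
// node per registered cleanable.
final class SegmentedCleanableList {
    static final int CAPACITY = 4096;              // assumed segment size

    static final class Node {
        final Handle[] arr = new Handle[CAPACITY];
        Node prev;
    }

    // Each element remembers where it lives, so it can be removed in O(1)
    // without walking the list.
    static final class Handle {
        final Object value;
        Node node;
        int index;
        Handle(Object value) { this.value = value; }
    }

    private Node head = new Node();                // the head node is always present
    private int size;                              // used slots in the head node

    synchronized Handle insert(Object e) {
        if (size == CAPACITY) {
            Node n = new Node();
            n.prev = head;
            head = n;
            size = 0;
        }
        Handle h = new Handle(e);
        h.node = head;
        h.index = size;
        head.arr[size++] = h;
        return h;
    }

    synchronized void remove(Handle h) {
        // Fill the freed slot with the most recently inserted handle and shrink.
        Handle last = head.arr[--size];
        head.arr[size] = null;
        if (last != h) {
            h.node.arr[h.index] = last;
            last.node = h.node;
            last.index = h.index;
        }
        if (size == 0 && head.prev != null) {      // drop the emptied head segment
            head = head.prev;
            size = CAPACITY;
        }
    }
}
```

The coarse synchronized methods mirror the global synchronization mentioned above; the win comes from the GC scanning a few large segments rather than chasing per-element links.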
Additional testing:
java/lang/ref tests in release
all tests in fastdebug
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22043/head:pull/22043
$ git checkout pull/22043
Update a local copy of the PR:
$ git checkout pull/22043
$ git pull https://git.openjdk.org/jdk.git pull/22043/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 22043
View PR using the GUI difftool:
$ git pr show -t 22043
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22043.diff
Using Webrev
Link to Webrev Comment