
8343704: Bad GC parallelism with processing Cleaner queues #22043

Closed · wants to merge 19 commits

Conversation

@shipilev (Member) commented Nov 12, 2024

See the bug for more discussion and a reproducer. This PR replaces the ad-hoc linked list with a segmented list of arrays. Arrays are easy targets for GC. There are possible improvements here; the most glaring is parallelism, which is currently knee-capped by global synchronization. The synchronization scheme follows what we have in the original code, and I think it is safer to continue with it right now.

I'll put performance data in a separate comment.
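To make the data structure concrete, here is a minimal sketch of the segmented-list idea, under assumptions: it is illustrative, not the actual patch, and all names and the NODE_CAPACITY value here are made up (only the constant's name appears later in this thread). Entries live in flat reference arrays, so GC threads can scan whole slabs in parallel instead of chasing a pointer chain one node at a time; each element remembers its slot, keeping removal O(1) and the slabs dense.

```java
final class SegmentedList {
    static final int NODE_CAPACITY = 4096;          // per-node slab size (assumed value)

    static final class Node {
        final Element[] slots = new Element[NODE_CAPACITY];
        Node next;
    }

    static class Element {
        Node node;                                  // back-reference for O(1) removal
        int index;
    }

    private Node head = new Node();                 // pre-allocated: insert never null-checks
    private int used;                               // filled slots in the head node

    synchronized void insert(Element e) {
        if (used == NODE_CAPACITY) {                // head is full: push a fresh node
            Node n = new Node();
            n.next = head;
            head = n;
            used = 0;
        }
        head.slots[used] = e;
        e.node = head;
        e.index = used++;
    }

    synchronized void remove(Element e) {
        // Plug the hole with the last live element so the slabs stay dense.
        Element last = head.slots[--used];
        e.node.slots[e.index] = last;
        last.node = e.node;
        last.index = e.index;
        head.slots[used] = null;
        if (used == 0 && head.next != null) {       // head emptied: drop it
            head = head.next;
            used = NODE_CAPACITY;
        }
        e.node = null;
    }
}
```

The coarse synchronized mirrors the global synchronization scheme mentioned above; the GC win comes purely from the flat Element[] slabs.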

Additional testing:

  • Original reproducer improves drastically
  • New microbenchmark shows no regression on "churning" tests, which cover insertion/removal performance
  • New microbenchmark shows improvement on Full GC times (crude, but repeatable), which serves as a proxy for the reproducer
  • java/lang/ref tests in release
  • all tests in fastdebug

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8343704: Bad GC parallelism with processing Cleaner queues (Enhancement - P4)


Reviewing

Using git

Check out this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22043/head:pull/22043
$ git checkout pull/22043

Update a local copy of the PR:
$ git checkout pull/22043
$ git pull https://git.openjdk.org/jdk.git pull/22043/head

Using Skara CLI tools

Check out this PR locally:
$ git pr checkout 22043

View PR using the GUI difftool:
$ git pr show -t 22043

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22043.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper (bot) commented Nov 12, 2024

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk (bot) commented Nov 12, 2024

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8343704: Bad GC parallelism with processing Cleaner queues

Reviewed-by: bchristi, vklang, ogillespie, kdnilsen

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 2 new commits pushed to the master branch:

  • 659f70b: 8343418: Unnecessary Hashtable usage in CSS.htmlAttrToCssAttrMap
  • 5c8cb2e: 8337199: Add jcmd Thread.vthread_scheduler and Thread.vthread_pollers diagnostic commands

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk bot added the "rfr" (ready for review) label Nov 12, 2024
@shipilev (Member, Author) commented Nov 12, 2024

The original reproducer on my M1 improves dramatically:

# Before
...
[8.989s][info ][gc       ] GC(50) Pause Young (Normal) (G1 Evacuation Pause) 608M->21M(1011M) 46.562ms
[9.187s][info ][gc       ] GC(51) Pause Young (Normal) (G1 Evacuation Pause) 608M->22M(1011M) 45.286ms
[9.387s][info ][gc       ] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 609M->21M(1011M) 45.636ms
[9.592s][info ][gc       ] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 608M->22M(1015M) 47.514ms
[9.794s][info ][gc       ] GC(54) Pause Young (Normal) (G1 Evacuation Pause) 612M->22M(1015M) 46.807ms
[9.993s][info ][gc       ] GC(55) Pause Young (Normal) (G1 Evacuation Pause) 612M->21M(1015M) 45.964ms

# After
...
[6.617s][info ][gc       ] GC(50) Pause Young (Normal) (G1 Evacuation Pause) 501M->20M(829M) 10.668ms
[6.758s][info ][gc       ] GC(51) Pause Young (Normal) (G1 Evacuation Pause) 501M->20M(829M) 10.712ms
[6.894s][info ][gc       ] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 500M->20M(829M) 12.090ms
[7.031s][info ][gc       ] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 501M->20M(829M) 11.026ms
[7.171s][info ][gc       ] GC(54) Pause Young (Normal) (G1 Evacuation Pause) 500M->20M(837M) 12.107ms

The closest reproducer in the form of a JMH test also improves in both GC times and noise. On a 5950X:

Benchmark       (count)  Mode  Cnt    Score    Error  Units

# Before
CleanerGC.test    16384  avgt    9    2.602 ±  0.041  ms/op
CleanerGC.test    65536  avgt    9    4.815 ±  0.038  ms/op
CleanerGC.test   262144  avgt    9   12.569 ±  0.200  ms/op
CleanerGC.test  1048576  avgt    9   52.579 ±  1.745  ms/op
CleanerGC.test  4194304  avgt    9  209.140 ± 17.027  ms/op

# After
CleanerGC.test    16384  avgt    9    2.499 ±  0.209  ms/op
CleanerGC.test    65536  avgt    9    3.269 ±  0.185  ms/op
CleanerGC.test   262144  avgt    9    6.108 ±  0.140  ms/op
CleanerGC.test  1048576  avgt    9   18.621 ±  0.150  ms/op
CleanerGC.test  4194304  avgt    9   67.267 ±  1.413  ms/op
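For context, a hypothetical sketch of the shape such a Full-GC proxy could take (the actual CleanerGC source is not shown in this thread; all names here are assumptions): keep count live registered cleanables and time an explicit full GC over them.

```java
import java.lang.ref.Cleaner;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class CleanerGCSketch {
    @Param({"16384", "65536", "262144", "1048576", "4194304"})
    int count;

    Cleaner cleaner;
    List<Object> referents;                 // keeps the referents strongly reachable

    @Setup
    public void setup() {
        cleaner = Cleaner.create();
        referents = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Object o = new Object();
            referents.add(o);
            cleaner.register(o, () -> {});  // no-op cleaning action
        }
    }

    @Benchmark
    public void test() {
        System.gc();                        // the GC must walk the whole cleanable list
    }
}
```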

On the churn benchmark, which covers insertion/removal performance, the new implementation outperforms the original one:

Benchmark          (count)  (recipFreq)  Mode  Cnt   Score    Error  Units

# Before
CleanerChurn.test      N/A          128  avgt    9   7.063 ±  0.262  ns/op
CleanerChurn.test      N/A          256  avgt    9   5.669 ±  0.118  ns/op
CleanerChurn.test      N/A          512  avgt    9   5.025 ±  0.066  ns/op
CleanerChurn.test      N/A         1024  avgt    9   4.714 ±  0.086  ns/op
CleanerChurn.test      N/A         2048  avgt    9   4.595 ±  0.091  ns/op

# After
CleanerChurn.test      N/A          128  avgt    9   5.941 ± 0.133  ns/op
CleanerChurn.test      N/A          256  avgt    9   5.113 ± 0.091  ns/op
CleanerChurn.test      N/A          512  avgt    9   4.767 ± 0.109  ns/op
CleanerChurn.test      N/A         1024  avgt    9   4.551 ± 0.037  ns/op
CleanerChurn.test      N/A         2048  avgt    9   4.448 ± 0.103  ns/op
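The churn side could be sketched similarly (again hypothetical; the actual CleanerChurn source is not shown here): roughly one in recipFreq operations registers a cleanable and immediately cleans it, so one insert plus one remove is amortized over recipFreq calls, matching the shape of the table above.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class CleanerChurnSketch {
    @Param({"128", "256", "512", "1024", "2048"})
    int recipFreq;                          // one in recipFreq ops does insert+remove

    Cleaner cleaner;

    @Setup
    public void setup() {
        cleaner = Cleaner.create();
    }

    @Benchmark
    public void test() {
        if (ThreadLocalRandom.current().nextInt(recipFreq) == 0) {
            Cleaner.Cleanable c = cleaner.register(new Object(), () -> {});
            c.clean();                      // explicit clean exercises the removal path
        }
    }
}
```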

@openjdk (bot) commented Nov 12, 2024

@shipilev The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.


@xmas92 (Member) commented Nov 14, 2024

This seems to handle excessive allocations when churning around an empty list by keeping the head node always allocated. I wonder if it is worth adding some hysteresis for churn around a multiple of NODE_CAPACITY, for example by pooling one node.

@shipilev (Member, Author)

> This seems to handle excessive allocations when churning around an empty list by keeping the head node always allocated.

Realistically, the list is almost never empty: there is the Cleaner instance itself recorded in the list. The only (?) state in which the list is fully empty is when the Cleaner itself is also dead, AFAICS. We pre-allocate the head node for implementation simplicity: if the head is always available, we don't need to null-check it, for example.

> I wonder if it is worth adding some hysteresis for churn around a multiple of NODE_CAPACITY, for example by pooling one node.

I don't think we should care about this case: the rare benefit does not seem to outweigh the cost for the common case. The goal for this implementation is to avoid wasting more space than necessary. Caching a node would take another bunch of KBs per Cleaner, at the very least.

@xmas92 (Member) commented Nov 14, 2024

> I don't think we should care about this case: the rare benefit does not seem to outweigh the cost for the common case. The goal for this implementation is to avoid wasting more space than necessary. Caching a node would take another bunch of KBs per Cleaner, at the very least.

That is probably correct. I was, however, thinking that it would only be pooled asymmetrically, as some type of hysteresis: you pool a node when you remove one (switch the head) and keep it for an arbitrary number of removals. So it would only really waste memory for cleaners that keep adding and removing cleanables around a NODE_CAPACITY boundary.

@shipilev (Member, Author)

> I was, however, thinking that it would only be pooled asymmetrically, as some type of hysteresis: you pool a node when you remove one (switch the head) and keep it for an arbitrary number of removals. So it would only really waste memory for cleaners that keep adding and removing cleanables around a NODE_CAPACITY boundary.

I really do not want to make an improvement that is more complicated than it needs to be :) As it stands, the current change improves performance across the board, so chasing minor inefficiencies looks like venturing into diminishing-returns territory. We can do this as a follow-up, if you want to explore it. I don't see clearly how this would work, and I would prefer to hold off on more advanced heuristics in favor of simplicity here.

@shipilev (Member, Author)

> That is probably correct. I was, however, thinking that it would only be pooled asymmetrically, as some type of hysteresis: you pool a node when you remove one (switch the head) and keep it for an arbitrary number of removals. So it would only really waste memory for cleaners that keep adding and removing cleanables around a NODE_CAPACITY boundary.

I have been playing with 8343704-node-cache.txt -- is that what you had in mind?

@xmas92 (Member) commented Nov 15, 2024

> That is probably correct. I was, however, thinking that it would only be pooled asymmetrically, as some type of hysteresis: you pool a node when you remove one (switch the head) and keep it for an arbitrary number of removals. So it would only really waste memory for cleaners that keep adding and removing cleanables around a NODE_CAPACITY boundary.

> I have been playing with 8343704-node-cache.txt -- is that what you had in mind?

Yes. It amortises the allocation over at least NODE_CAPACITY inserts, instead of 1 in the worst case.

I have very little experience with how this plays out in practice, or with how cleanables are used. This was purely an observation based on seeing a symmetrical grow/shrink behaviour of some resource.

@shipilev (Member, Author)

> I have been playing with 8343704-node-cache.txt -- is that what you had in mind?

> Yes. It amortises the allocation over at least NODE_CAPACITY inserts, instead of 1 in the worst case.

Pushed that into the PR. There is some measurable impact from dealing with this node cache during heavy churn, but it is still well within the improvement margin we get wholesale.

I am willing to accept this single-slot cache for three reasons: a) crossing the artificial boundary with just a single cleaner registering and unregistering looks like something we want to mitigate to avoid surprises; b) Cleaners are recommended by the Javadoc to be shared, so wasting another node should not be as painful; c) the implementation and maintenance cost is not high.

But I would draw the line at this heuristic, and do nothing smarter :)
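For illustration, a sketch of what such a single-slot cache could look like, extending the earlier SegmentedList sketch (assumed shape, not the actual patch): when a removal empties the head node, park it instead of dropping it, so churn around a NODE_CAPACITY boundary does not allocate on every crossing.

```java
// Inside the SegmentedList sketch from above:
private Node cachedNode;                    // at most one spare node is retained

private Node newNode() {                    // insert() would call this instead of new Node()
    Node n = cachedNode;
    if (n != null) {
        cachedNode = null;                  // reuse the parked node
        return n;
    }
    return new Node();                      // cache is empty: allocate
}

private void retireNode(Node n) {           // remove() would call this when popping the head
    n.next = null;                          // its slots are already nulled by remove()
    cachedNode = n;                         // park it; replaces any older spare
}
```

This amortizes the allocation over at least NODE_CAPACITY inserts, as described above, at the cost of keeping at most one spare node alive per Cleaner.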

@franz1981

Not that it was the main problem here (since the array-backed linked list already delivers the major improvement with fewer changes, kudos!), but in case a lock-free structure can be of any help: in JCTools we have https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpmcUnboundedXaddArrayQueue.java, which is decently scalable on the offer side (less so on the consumer side) and can save on reusing chunks, becoming way simpler. If it is of any interest, I can create a gist with a simplified, non-Unsafe version.

Otherwise, https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/queues/array/FAAArrayQueue.java is as simple as it looks, but it probably needs to be made linearizable, IDK.

@RogerRiggs (Contributor)

With SM removal, there is a doPrivileged call in Cleaner.java.
Should that be handled separately from this PR?

@shipilev (Member, Author)

> With SM removal, there is a doPrivileged call in Cleaner.java. Should that be handled separately from this PR?

Since this is not related to the problem at hand, I'd prefer to keep it out of this PR.

@shipilev (Member, Author)

> With SM removal, there is a doPrivileged call in Cleaner.java. Should that be handled separately from this PR?

> Since this is not related to the problem at hand, I'd prefer to keep it out of this PR.

Actually, I wonder if you mean Cleaner.java, which is not even affected by this patch? It is completely unrelated, and it would probably be gone with JDK-8344332.

@RogerRiggs (Contributor)

Right, I meant jdk.internal.ref.Cleaner.
So yes, it will be gone after 8344332; it can remain as-is until that's implemented.

@openjdk bot added the "ready" (ready to be integrated) label Dec 2, 2024
@shipilev (Member, Author) commented Dec 3, 2024

Testing looks clean for me here. I would like to integrate this soon. If anyone wants to test it through their CIs, please do so :)

@shipilev (Member, Author) commented Dec 4, 2024

There we go.

/integrate

@openjdk (bot) commented Dec 4, 2024

Going to push as commit 4000e92.
Since your change was applied there have been 49 commits pushed to the master branch:

  • 0c7451a: 8332686: InetAddress.ofLiteral can throw StringIndexOutOfBoundsException
  • 56d315d: 8343540: Report preview error for inherited effectively-preview methods
  • 994504c: 8329351: Add runtime/Monitor/TestRecursiveLocking.java for recursive Java monitor stress testing
  • 3d49665: 8345286: Remove use of SecurityManager API from misc areas
  • 38927fc: 8343213: TEST_BUG: [Graal] java/lang/ref/Basic.java fails
  • cf1eb58: 8344935: [ubsan]: javaThread.hpp:1241:52: runtime error: load of value 9831830, which is not a valid value for type 'freeze_result'
  • 943aa03: 8345404: [ppc64le] ProblemList TestAllocOutOfMemory.java#large
  • 9e2b66f: 8345178: RISC-V: Add gtests for narrow cmpxchg
  • 4c33caa: 8344609: Check ResourceMark nesting when allocating a GrowableArray on an alternative ResourceArea
  • 4b92816: 8345375: Improve debuggability of test/jdk/java/net/Socket/CloseAvailable.java
  • ... and 39 more: https://git.openjdk.org/jdk/compare/3eb54615783562f24e61983dfcc3cb823b27b0eb...master

Your commit was automatically rebased without conflicts.

@openjdk bot added the "integrated" (pull request has been integrated) label Dec 4, 2024
@openjdk bot closed this Dec 4, 2024
@openjdk bot removed the "ready" and "rfr" labels Dec 4, 2024
@openjdk (bot) commented Dec 4, 2024

@shipilev Pushed as commit 4000e92.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
