[GR-60079] JFR leak profiler span fixes #10154

Merged (1 commit) on Nov 27, 2024

roberttoyonaga
Collaborator

Summary

When the original PR was integrated, we noted that there was probably a problem with how each sample's span was allotted, but we decided to match exactly what HotSpot was doing anyway. Previously, we set each sample's span to the object's allocation size. This is wrong because "span" is meant to represent all the allocations made over a period of time (including the ones that were removed from the queue due to GC). The intended effect is an even sampling of allocations over time. For example, if a sample's neighbors are removed from the list, it should absorb their span and increase in importance so that it is less likely to be removed itself.
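To illustrate the intended span behavior, here is a minimal sketch. It is a simplified model, not the SubstrateVM or HotSpot code: the class `SpanSketch`, its methods `skip`, `sample`, `remove`, and `spanOf`, and the choice of which neighbor absorbs a removed span are all assumptions made for this example. A new sample's span is taken as everything allocated so far minus what the held samples already account for, and a removed sample hands its span to a neighbor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of span bookkeeping. Not the SubstrateVM code.
final class SpanSketch {
    static final class Sample {
        final String name;
        long span; // bytes of allocation this sample stands for

        Sample(String name, long span) {
            this.name = name;
            this.span = span;
        }
    }

    private final List<Sample> samples = new ArrayList<>(); // oldest first
    private long totalAllocated; // all bytes seen, whether kept as a sample or not
    private long totalInQueue;   // sum of the spans of the samples currently held

    // An allocation that is seen but not kept as a sample.
    void skip(long allocatedSize) {
        totalAllocated += allocatedSize;
    }

    // Keep a sample. Its span covers everything not yet accounted for by held samples.
    void sample(String name, long allocatedSize) {
        totalAllocated += allocatedSize;
        long span = totalAllocated - totalInQueue;
        samples.add(new Sample(name, span));
        totalInQueue += span;
    }

    // Drop a sample (for example because its object died). A neighbor absorbs its span.
    void remove(String name) {
        int i = indexOf(name);
        if (i < 0) {
            return;
        }
        Sample removed = samples.remove(i);
        if (samples.isEmpty()) {
            totalInQueue -= removed.span; // nobody left to absorb the span
            return;
        }
        // Assumption: prefer the next-older sample, else the next-younger one.
        Sample neighbor = samples.get(i > 0 ? i - 1 : 0);
        neighbor.span += removed.span;
    }

    long spanOf(String name) {
        int i = indexOf(name);
        return i < 0 ? -1 : samples.get(i).span;
    }

    long totalInQueue() {
        return totalInQueue;
    }

    private int indexOf(String name) {
        for (int i = 0; i < samples.size(); i++) {
            if (samples.get(i).name.equals(name)) {
                return i;
            }
        }
        return -1;
    }
}
```

In this model, sampling `a` (100 bytes) and `b` (50 bytes), skipping 30 bytes, and then sampling `c` (20 bytes) gives `c` a span of 50 rather than 20, because it also covers the skipped allocation. Removing `b` afterwards raises the span of `a` from 100 to 150.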

Now that the fix is integrated in HotSpot (openjdk/jdk#19334), we should also add the fix in SubstrateVM.

I also fixed how totalInQueue is updated. Previously, its updated value was not actually saved after samples were added to the queue.
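The exact shape of the totalInQueue bug is not shown in this conversation, so the following is only a sketch of the invariant involved, under the assumption that the queue keeps a running total of the spans it holds. The class `TotalSketch` and its members are hypothetical names for this example. The point is that every push and every removal has to store the new total back into the field, otherwise the total drifts away from the real sum of spans.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch: a span-ordered queue that tracks the sum of the spans it holds.
final class TotalSketch {
    static final class Sample {
        final String name;
        final long span;

        Sample(String name, long span) {
            this.name = name;
            this.span = span;
        }
    }

    // Smallest span first, so the least important sample is polled first.
    private final PriorityQueue<Sample> queue =
            new PriorityQueue<>(Comparator.comparingLong((Sample s) -> s.span));

    private long total; // invariant: equals the sum of spans of all queued samples

    void push(Sample sample) {
        queue.add(sample);
        total += sample.span; // the updated value must be stored back in the field
    }

    Sample poll() {
        Sample sample = queue.poll();
        if (sample != null) {
            total -= sample.span;
        }
        return sample;
    }

    long total() {
        return total;
    }

    // Recomputes the sum from scratch, to check the invariant.
    long sumOfSpans() {
        return queue.stream().mapToLong(s -> s.span).sum();
    }
}
```

If `push` computed the new total without saving it, `total()` would stay at its old value while `sumOfSpans()` grew, and any span derived from the total would be too large.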

@oracle-contributor-agreement (bot) added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) on Nov 25, 2024
@roberttoyonaga
Collaborator Author

Hi @galderz, just mentioning you so it's on your radar.

@christianhaeubl changed the title from "JFR leak profiler span fixes" to "[GR-60079] JFR leak profiler span fixes" on Nov 26, 2024
@christianhaeubl
Member

Thanks for the fixes, this will be merged in the next few days.

@graalvmbot graalvmbot merged commit 095f62e into oracle:master Nov 27, 2024
13 checks passed
@christianhaeubl
Member

@roberttoyonaga : this PR causes the following assertion failure in some multi-threaded test cases:

  A  SP 0x000071c8495ffc70 IP 0x00005c9145c4a2b3 size=48    com.oracle.svm.core.jfr.oldobject.JfrOldObjectSampler.remove(JfrOldObjectSampler.java:125)
  A  SP 0x000071c8495ffca0 IP 0x00005c9145c4a4e6 size=48    com.oracle.svm.core.jfr.oldobject.JfrOldObjectSampler.scavenge(JfrOldObjectSampler.java:103)
  A  SP 0x000071c8495ffcd0 IP 0x00005c9145c4a385 size=64    com.oracle.svm.core.jfr.oldobject.JfrOldObjectSampler.sample(JfrOldObjectSampler.java:81)
  i  SP 0x000071c8495ffd10 IP 0x00005c9145c488b9 size=16    com.oracle.svm.core.jfr.oldobject.JfrOldObjectProfiler.sample0(JfrOldObjectProfiler.java:93)
  A  SP 0x000071c8495ffd10 IP 0x00005c9145c488b9 size=16    com.oracle.svm.core.jfr.oldobject.JfrOldObjectProfiler.sample(JfrOldObjectProfiler.java:78)
  i  SP 0x000071c8495ffd20 IP 0x00005c9145ba2d5b size=48    com.oracle.svm.core.genscavenge.ThreadLocalAllocation.sampleSlowPathAllocation(ThreadLocalAllocation.java:552)
  A  SP 0x000071c8495ffd20 IP 0x00005c9145ba2d5b size=48    com.oracle.svm.core.genscavenge.ThreadLocalAllocation.slowPathNewInstance(ThreadLocalAllocation.java:230)

Not directly related to this PR but I am also seeing assertion errors if I only run a subset of the tests, e.g., via mx native-unittest TestOldObjectProfiler:

com.oracle.svm.test.jfr.oldobject.TestOldObjectProfiler#testEvictYoungest
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:87)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.junit.Assert.assertTrue(Assert.java:53)
	at com.oracle.svm.test.jfr.oldobject.TestOldObjectProfiler.validate(TestOldObjectProfiler.java:217)
	at com.oracle.svm.test.jfr.oldobject.TestOldObjectProfiler.testEvictYoungest(TestOldObjectProfiler.java:120)

Can you look into these failures?

@roberttoyonaga
Collaborator Author

Hi @christianhaeubl, yes ok I'll investigate what is happening there

@roberttoyonaga
Collaborator Author

roberttoyonaga commented Nov 28, 2024

@christianhaeubl

I've created a PR fixing the issues you identified: #10190

The assertion fails because it makes an incorrect assumption. The span fix in this PR causes the error to manifest; it was probably hidden before because span was allotted incorrectly.

When running the TestOldObjectProfiler test in isolation, the assertion at TestOldObjectProfiler.validate(TestOldObjectProfiler.java:217) fails when it tries to verify the sample's JfrTicks allocation time. This only fails in isolation because JfrTicks has not yet been initialized. When running all the tests together with mx native-unittest, JFR gets initialized in a prior test.

Labels
native-image native-image-jfr OCA Verified All contributors have signed the Oracle Contributor Agreement. redhat-interest