jdknext AIX: ** ASSERTION FAILED ** at CompositeCache.cpp:2437: #9997
Comments
fyi @hangshao0 - this looks related to your recent SCC changes.
Seems that the core files are not saved/uploaded by the job.
@hangshao0 there is another one from OpenJ9 https://ci.eclipse.org/openj9/job/Build_JDKnext_ppc64_aix_Personal/41/ with a diagnostic download.
Started jdk8 and 11 builds, I'll likely revert the change if these are affected, unless there is a quick fix. https://ci.eclipse.org/openj9/job/Pipeline-Build-Test-All/950/ - failed due to #9992 (comment)
Note: When possible, please download cores from these builds now (or, even better, at the moment of reporting the issue) and attach them to the issue, or link to a location where the core can be found, as the Jenkins links used to report issues have a lifespan of 5 days. Also include the Rerun in Grinder link reported in issues, for the convenience of people who start helping with this issue many days later. #9835 is another example of an issue report without enough long-lived info, which will cost time and resources to grind and reproduce.
@lumpfish fyi the previous comment.
@smlambert agree we need as much as possible, but it takes an awful lot of time every morning triaging, took me 2 hours this morning => adoptium/temurin-build#1634 (comment)
@adamfarley we need some innovation here please?! How about a "button" that "locks" the given test job so it doesn't get deleted? Or maybe one that moves all "artifacts" to some persistent storage elsewhere...?
Like maybe just click "Keep this build forever"?!
Found in the trace point:
So enterReadMutex failed once. But J9_PRIVATE_FLAGS2_IN_SHARED_CACHE_READ_MUTEX is still set inside enterReadMutex() even though it returns -1. The next time this thread enters the read mutex, the flag is found to be set and the assertion fails. We should check the value of rc before setting J9_PRIVATE_FLAGS2_IN_SHARED_CACHE_READ_MUTEX.
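A minimal C++ sketch of the ordering problem described above and of the proposed rc check. It is a model, not OpenJ9 code: `ThreadState`, `ReadLock`, `IN_READ_MUTEX_FLAG`, `ALREADY_IN_MUTEX`, and the two `enterReadMutex*` functions are hypothetical stand-ins, and the assertion failure is modeled as a distinct return code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for J9_PRIVATE_FLAGS2_IN_SHARED_CACHE_READ_MUTEX;
// the value 0x1 is illustrative only.
constexpr std::uint32_t IN_READ_MUTEX_FLAG = 0x1;

// Hypothetical per-thread state holding the private flag word.
struct ThreadState {
    std::uint32_t privateFlags2 = 0;
};

// Hypothetical lock whose acquisition can be forced to fail once.
struct ReadLock {
    bool failNext = false;
    int acquire() {
        if (failNext) {
            failNext = false;
            return -1;  // models the underlying lock call failing
        }
        return 0;
    }
};

// Returned when a thread enters while the flag is already set;
// in the VM this situation is the assertion failure.
constexpr int ALREADY_IN_MUTEX = -2;

// Buggy ordering: the flag is set whether or not the lock was acquired.
int enterReadMutexBuggy(ThreadState& thread, ReadLock& lock) {
    if (thread.privateFlags2 & IN_READ_MUTEX_FLAG) {
        return ALREADY_IN_MUTEX;
    }
    int rc = lock.acquire();
    thread.privateFlags2 |= IN_READ_MUTEX_FLAG;  // set even when rc == -1
    return rc;
}

// Fixed ordering: check rc first and only record ownership on success.
int enterReadMutexFixed(ThreadState& thread, ReadLock& lock) {
    if (thread.privateFlags2 & IN_READ_MUTEX_FLAG) {
        return ALREADY_IN_MUTEX;
    }
    int rc = lock.acquire();
    if (rc == 0) {
        thread.privateFlags2 |= IN_READ_MUTEX_FLAG;
    }
    return rc;
}

// Leaving the mutex clears the flag.
void exitReadMutex(ThreadState& thread) {
    thread.privateFlags2 &= ~IN_READ_MUTEX_FLAG;
}
```

With `failNext` set, the first call returns -1 in both versions. A second call then returns `ALREADY_IN_MUTEX` with the buggy ordering (the equivalent of the assertion), while the fixed ordering leaves the flag clear and lets the retry succeed.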
A comment in the code (https://github.com/eclipse/openj9/blob/master/runtime/shared_common/OSCachemmap.cpp#L748to#L764) suggests that another thread is holding the RW mutex when we see the message "SH_OSCachemmap::acquireWriteLock EDEADLK : Case 3: Current thread owns W mon, but EDEADLK'd on W lock". However, I don't think that comment explains what is happening here. If another thread in this JVM were holding the RW mutex, we should see the Case 3 message only once. As we will retry with
I also made the JVM crash if it failed to acquire the write mutex. The Case 3 message still appears (multiple times), but no threads in this JVM own the RW mutex.
Something else is going on.
There is an Attach API thread locking some file under /tmp/.com_ibm_tools_attach/, which caused
Here is what's happening, which gives
JVM1:
JVM2:
This is existing behaviour, not something introduced recently. I'm not sure we can do better in this case. I guess our current behaviour that lets
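A simplified model of how EDEADLK can be reported even when no thread in this JVM owns the RW mutex. It assumes the cache write lock behaves like a POSIX fcntl() record lock: such locks are owned by the process, not the thread, and a blocking request may fail with EDEADLK when the kernel sees that sleeping would close a cycle of processes waiting on each other. `ProcessLockTable`, the pids, and the file labels "attach" and "cache" are hypothetical. The model only covers exclusive whole-file locks, and it does not reproduce the actual JVM1/JVM2 sequence from the comment above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Outcome of a lock request in this simplified model.
enum class LockResult { Granted, Blocked, Deadlock };

// Exclusive whole-file locks, owned by a process id rather than by a thread.
class ProcessLockTable {
public:
    // A request by `pid` for `file`. Returns Deadlock (the EDEADLK case) if
    // blocking would close a cycle in the process-level wait-for graph.
    LockResult request(int pid, const std::string& file) {
        auto held = owner_.find(file);
        if (held == owner_.end() || held->second == pid) {
            owner_[file] = pid;
            return LockResult::Granted;
        }
        // Walk the chain: owner of `file` -> file that owner waits on -> its owner ...
        int cur = held->second;
        std::set<int> visited;
        while (visited.insert(cur).second) {
            if (cur == pid) {
                return LockResult::Deadlock;  // the chain leads back to the requester
            }
            auto waiting = waitingFor_.find(cur);
            if (waiting == waitingFor_.end()) break;
            auto next = owner_.find(waiting->second);
            if (next == owner_.end()) break;
            cur = next->second;
        }
        waitingFor_[pid] = file;  // the caller would sleep until the lock is released
        return LockResult::Blocked;
    }

private:
    std::map<std::string, int> owner_;       // file -> pid holding it
    std::map<int, std::string> waitingFor_;  // pid -> file it is blocked on
};
```

For example, if a thread in pid 1 holds "attach", pid 2 holds "cache" and blocks on "attach", then a different thread in pid 1 asking for "cache" gets `Deadlock`. That thread holds nothing itself; the request is refused only because its process already owns "attach" through another thread.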
related to: #9997 (comment). I understand the time pressure and appreciate the concern, @andrew-m-leonard. Perhaps it is better to triage fewer things but more thoroughly, so the issues can be closed (and do not keep showing up to be triaged). As it is now, if we do not include enough info, we are merely shifting the work to the next person who picks up the issue (which happens later, so it can become an impossible task for the next person in the chain). We're adding functionality to the "Create new issue" button in TRSS that will automatically include all links (and git diffs, java -version info, rerun links, Artifactory links if present, and the first occurrence; see adoptium/aqa-test-tools#258). It would be useful to have someone record a typical nightly build & test triage session so we can understand how it is currently done and improve the process for those who do it.
Fixes eclipse-openj9#9997
Signed-off-by: Hang Shao <[email protected]>
https://ci.adoptopenjdk.net/view/Failing%20Builds/job/build-scripts/job/jobs/job/jdk/job/jdk-aix-ppc64-openj9/148/console