Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON StackOverflowError ResourceBundle$RBClassLoader.loadClass(ResourceBundle #19777

Closed
pshipton opened this issue Jun 28, 2024 · 20 comments · Fixed by #19822 or #19824

Comments

@pshipton
Copy link
Member

pshipton commented Jun 28, 2024

Across platforms https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-JDK8/862/

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_aarch64_linux_Nightly_testList_0/705 - ub22-aarch64-1
TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80_0
-Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:count=0 -Xnocompressedrefs

22:09:16  [RefreshGCCache_NoBCI_Test] [INFO] Starting RefreshGCCache_NoBCI_Test...
22:09:17  Exception in thread "main" java.lang.StackOverflowError
22:09:17  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
...
22:09:19  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:19  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:19  	at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:19  	at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2657)
22:09:19  	at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1518)
22:09:19  	at java.util.ResourceBundle.findBundle(ResourceBundle.java:1482)
22:09:19  	at java.util.ResourceBundle.findBundle(ResourceBundle.java:1436)
22:09:19  	at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1370)
22:09:19  	at java.util.ResourceBundle.getBundle(ResourceBundle.java:899)
22:09:19  	at sun.util.resources.LocaleData$1.run(LocaleData.java:167)
22:09:19  	at sun.util.resources.LocaleData$1.run(LocaleData.java:163)
22:09:19  	at java.security.AccessController.doPrivileged(AccessController.java:690)
22:09:19  	at sun.util.resources.LocaleData.getBundle(LocaleData.java:163)
22:09:19  	at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:103)
22:09:19  	at sun.util.locale.provider.LocaleResources.getTimeZoneNames(LocaleResources.java:262)
22:09:19  	at sun.util.locale.provider.TimeZoneNameProviderImpl.getDisplayNameArray(TimeZoneNameProviderImpl.java:124)
22:09:19  	at sun.util.locale.provider.TimeZoneNameProviderImpl.getDisplayName(TimeZoneNameProviderImpl.java:99)
22:09:19  	at sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter.getName(TimeZoneNameUtility.java:240)
22:09:19  	at sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter.getObject(TimeZoneNameUtility.java:198)
22:09:19  	at sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter.getObject(TimeZoneNameUtility.java:184)
22:09:19  	at sun.util.locale.provider.LocaleServiceProviderPool.getLocalizedObjectImpl(LocaleServiceProviderPool.java:281)
22:09:19  	at sun.util.locale.provider.LocaleServiceProviderPool.getLocalizedObject(LocaleServiceProviderPool.java:265)
22:09:19  	at sun.util.locale.provider.TimeZoneNameUtility.retrieveDisplayNamesImpl(TimeZoneNameUtility.java:166)
22:09:19  	at sun.util.locale.provider.TimeZoneNameUtility.retrieveDisplayName(TimeZoneNameUtility.java:137)
22:09:19  	at java.util.TimeZone.getDisplayName(TimeZone.java:400)
22:09:19  	at java.util.Date.toString(Date.java:1045)
22:09:19  	at org.testng.reporters.TestHTMLReporter.generateLog(TestHTMLReporter.java:284)
22:09:19  	at org.testng.reporters.TestHTMLReporter.onFinish(TestHTMLReporter.java:40)
22:09:19  	at org.testng.TestRunner.fireEvent(TestRunner.java:772)
22:09:19  	at org.testng.TestRunner.afterRun(TestRunner.java:741)
22:09:19  	at org.testng.TestRunner.run(TestRunner.java:509)
22:09:19  	at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
22:09:19  	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
22:09:19  	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
22:09:19  	at org.testng.SuiteRunner.run(SuiteRunner.java:364)
22:09:19  	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
22:09:19  	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
22:09:19  	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
22:09:19  	at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
22:09:19  	at org.testng.TestNG.runSuites(TestNG.java:1049)
22:09:19  	at org.testng.TestNG.run(TestNG.java:1017)
22:09:19  	at org.testng.TestNG.privateMain(TestNG.java:1354)
22:09:19  	at org.testng.TestNG.main(TestNG.java:1323)

Changes
2a2df9f...22eaa4d

@pshipton
Copy link
Member Author

pshipton commented Jun 28, 2024

Creating some builds for triage.

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/534
https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3690/ - cent8-aarch64-2
passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80
Implies #19699 is the cause @jdmpapin

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/535
https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3691/ - cent8-aarch64-3
passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/536
https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3692 - ub22-aarch64-1
failed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80

12:19:57  OpenJDK Runtime Environment (build 1.8.0_422-internal-_2024_06_28_14_55-b00)
12:19:57  Eclipse OpenJ9 VM (build triage28c-2a2df9f1117, JRE 1.8.0 Linux aarch64-64-Bit Compressed References 20240628_469 (JIT enabled, AOT enabled)
12:19:57  OpenJ9   - 2a2df9f1117
12:19:57  OMR      - 47a9d248db0
12:19:57  JCL      - 667e0c43d2c based on jdk8u422-b04)

@pshipton
Copy link
Member Author

@jdmpapin Actually the test failed in the grinder on an older build, which seems weird, but rules out your change.

@pshipton
Copy link
Member Author

@tobi this also must be related to -XX:+EnableExtendedHCR changes.

@pshipton
Copy link
Member Author

@jdmpapin I'm not sure we should rule out your change. The grinders are all running with the latest test code that removes -XX:+EnableExtendedHCR from the failing test. That it fails when Tobi's change is backed out makes sense, then it's passing until your change is introduced.

@pshipton
Copy link
Member Author

pshipton commented Jun 28, 2024

Building the latest 0.46 (which contains Tobi's change), to confirm if there is a problem with 0.46 or not.
https://openj9-jenkins.osuosl.org/job/Pipeline-Release-0.46.0-Build8/5/

@jdmpapin
Copy link
Contributor

OK, I'm checking it out

@pshipton
Copy link
Member Author

pshipton commented Jun 28, 2024

https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3694
passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80 on 0.46

@jdmpapin
Copy link
Contributor

jdmpapin commented Jun 28, 2024

I'm able to reproduce this on z, but it's extremely slippery. It stops happening if I pass a completely irrelevant JIT option, e.g. -Xjit:disableSuffixLogs

edit: I was just replacing the count=0 option 🤦

@jdmpapin
Copy link
Contributor

jdmpapin commented Jul 4, 2024

OK, so I was able to reproduce the issue reliably on z with compilation limited to the following two methods:

  • java/lang/ClassLoader.loadClass
  • java/util/ResourceBundle$RBClassLoader.loadClass

RBClassLoader has a ClassLoader field that it delegates to. RBClassLoader.loadClass in particular delegates to ClassLoader.loadClass using that field. When compiling that call, the code generator did guarded devirtualization using a single abstract implementer, with RBClassLoader as the only concrete implementation even though ClassLoader has other concrete subclasses. It did that because the CH table reported that RBClassLoader was the only subclass of ClassLoader. ClassLoader had been redefined after initializing its other subclasses but before loading RBClassLoader. Usually on redefinition jitClassesRedefined() would fix the CH table entries in a way that prevents us from losing track of the subclasses. However, it had the wrong oldClass because it wasn't expecting classes to be redefined in-place in debug mode, which is possible since #19554, but which couldn't occur in this particular test until #19710 removed -XX:+EnableExtendedHCR from its command-line options

I have a fix that prevents the bug in my setup. It's just undergoing some more widespread testing

@jdmpapin
Copy link
Contributor

jdmpapin commented Jul 4, 2024

jitClassesRedefined() would fix the CH table entries in a way that prevents us from losing track of the subclasses

Clarification/correction: It tries to fix the CH table entries so that we don't lose track of subclasses (or any other state) when a class is not redefined in-place. This works by essentially swapping the class info of the original class and the class we'll use going forward. Normally when a class is redefined in-place, the swap does nothing because both are the same class. But because of the unexpected in-place redefinition, we had the original and the newly created class backwards, and so we swapped the info of the class to use going forward (which is the original) with that of the newly created class instead, and that's how the CH table ended up with no subclasses (and later only one subclass) for the original class

@pshipton
Copy link
Member Author

pshipton commented Jul 5, 2024

There is also a similar failure in TestRefreshGCSpecialClassesCache_BCI_FAST_HCR_JIT_ON_SE80_0
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk8_j9_extended.functional_aarch64_linux_testList_1/640/

pshipton added a commit to pshipton/openj9 that referenced this issue Jul 5, 2024
pshipton added a commit to pshipton/openj9 that referenced this issue Jul 5, 2024
@pshipton
Copy link
Member Author

pshipton commented Jul 5, 2024

Created #19820 for 0.46 to add -XX:+EnableExtendedHCR

@pshipton
Copy link
Member Author

pshipton commented Jul 5, 2024

I'm also opening a head stream PR to add -XX:+EnableExtendedHCR to the tests, to avoid all the failures in the builds. We can revert when the JIT fix is ready.

#19822

@pshipton
Copy link
Member Author

pshipton commented Jul 5, 2024

I've removed the blocker label and moved this to 0.48 but since -XX:-EnableExtendedHCR is the default in 0.46 and later, users may get a crash rather than the expected message about enabling -XX:+EnableExtendedHCR.

@jdmpapin
Copy link
Contributor

jdmpapin commented Jul 5, 2024

I think the message mentioning -XX:+EnableExtendedHCR is only supposed to show when it would allow a redefinition that is currently failing, e.g. because of an added method. But this bug can only occur when the redefinition has succeeded, so the message isn't actually expected in that case (even though the option would be a workaround for the bug)

I suppose it's too late now to get the JIT fix into 0.46?

@pshipton
Copy link
Member Author

pshipton commented Jul 5, 2024

We've started M2, any further fix needs to be high severity or low risk and we'll have to convince the PM it needs to be included.

@JamesKingdon
Copy link
Contributor

Do we have a record of the likely crash stack for this problem to help identify future reports? All the jenkins links are dead now.

@pshipton
Copy link
Member Author

There is a stack in the description.

@JamesKingdon
Copy link
Contributor

I found some example stacks in dups 19768 and 19765. There's some variation, but generally seem to go through TR_PersistentCHTable::findSingleImplementer during inlining:

[0xf]   j9jit29!TR_J9VMBase::getResolvedInterfaceMethod+0x98   0x222ca795d0   0x7ffb102e0296
[0x10]   j9jit29!TR_ResolvedJ9Method::getResolvedInterfaceMethod+0x46   0x222ca79620   0x7ffb102d083b
[0x11]   j9jit29!CollectImplementors::visitSubclass+0x8b   0x222ca79680   0x7ffb102cfa84
[0x12]   j9jit29!TR_SubclassVisitor::visitSubclasses+0xd4   0x222ca796c0   0x7ffb102cf936
[0x13]   j9jit29!TR_SubclassVisitor::visit+0x136   0x222ca79710   0x7ffb102cfccb
[0x14]   j9jit29!TR_ClassQueries::collectImplementorsCapped+0x12b   0x222ca79770   0x7ffb102f1901
[0x15]   j9jit29!TR_PersistentCHTable::findSingleImplementer+0xb1   0x222ca79800   0x7ffb103fc982
[0x16]   j9jit29!TR_J9InterfaceCallSite::findCallSiteTargetImpl+0x1b2   0x222ca79870   0x7ffb103fc589
[0x17]   j9jit29!TR_J9InterfaceCallSite::findCallSiteTarget+0x59   0x222ca79930   0x7ffb10607c92
[0x18]   j9jit29!TR_InlinerBase::getSymbolAndFindInlineTargets+0x372   0x222ca799b0   0x7ffb103d4161

00:23:40   [ERR] jitGetInterfaceMethodFromCP+0xc3 (0x0000000012322933 [libj9jit29.dylib+0x1322933])
00:23:40   [ERR] _ZN11TR_J9VMBase26getResolvedInterfaceMethodEP14J9ConstantPoolP19TR_OpaqueClassBlocki+0x8c (0x000000001109B0CC [libj9jit29.dylib+0x9b0cc])
00:23:40   [ERR] _ZN19TR_ResolvedJ9Method26getResolvedInterfaceMethodEPN2TR11CompilationEP19TR_OpaqueClassBlocki+0x3a (0x0000000011098D8A [libj9jit29.dylib+0x98d8a])
00:23:40   [ERR] _ZN19CollectImplementors13visitSubclassEP22TR_PersistentClassInfo+0x8b (0x0000000011085DCB [libj9jit29.dylib+0x85dcb])
00:23:40   [ERR] _ZN18TR_SubclassVisitor15visitSubclassesEP22TR_PersistentClassInfoRN10TR_CHTable12VisitTrackerE+0xeb (0x0000000011086B1B [libj9jit29.dylib+0x86b1b])
00:23:40   [ERR] _ZN18TR_SubclassVisitor5visitEP19TR_OpaqueClassBlockb+0x113 (0x0000000011086333 [libj9jit29.dylib+0x86333])
00:23:40   [ERR] _ZN15TR_ClassQueries25collectImplementorsCappedEP22TR_PersistentClassInfoPP17TR_ResolvedMethodiiS3_PN2TR11CompilationEb13TR_YesNoMaybe+0x106 (0x00000000110861D6 [libj9jit29.dylib+0x861d6])
00:23:40   [ERR] _ZN20TR_PersistentCHTable21findSingleImplementerEP19TR_OpaqueClassBlockiP17TR_ResolvedMethodPN2TR11CompilationEb13TR_YesNoMaybeb+0xa0 (0x00000000110A8800 [libj9jit29.dylib+0xa8800])
00:23:40   [ERR] _ZN22TR_J9InterfaceCallSite22findCallSiteTargetImplEP12TR_CallStackP14TR_InlinerBaseP19TR_OpaqueClassBlock+0x168 (0x00000000111F1B58 [libj9jit29.dylib+0x1f1b58])
00:23:40   [ERR] _ZN22TR_J9InterfaceCallSite18findCallSiteTargetEP12TR_CallStackP14TR_InlinerBase+0x3a (0x00000000111F17BA [libj9jit29.dylib+0x1f17ba])
00:23:40   [ERR] _ZN14TR_InlinerBase29getSymbolAndFindInlineTargetsEP12TR_CallStackP11TR_CallSiteb+0x2cc (0x000000001144552C [libj9jit29.dylib+0x44552c])
00:23:40   [ERR] _ZN14TR_DumbInliner15analyzeCallSiteEP12TR_CallStackPN2TR7TreeTopEPNS2_4NodeES6_+0xb4 (0x0000000011446234 [libj9jit29.dylib+0x446234])

I also saw a different stack in a customer case, during codegen:

jitGetInterfaceMethodFromCP+0x17 (0x00007FADE9F4D107 [libj9jit29.so+0xa10107])
_ZN11TR_J9VMBase26getResolvedInterfaceMethodEP14J9ConstantPoolP19TR_OpaqueClassBlocki+0x8b (0x00007FADE9761E3B [libj9jit29.so+0x224e3b])
_ZN19TR_ResolvedJ9Method26getResolvedInterfaceMethodEPN2TR11CompilationEP19TR_OpaqueClassBlocki+0x4c (0x00007FADE9758D9C [libj9jit29.so+0x21bd9c])
_ZN19CollectImplementors13visitSubclassEP22TR_PersistentClassInfo+0xd9 (0x00007FADE97456E9 [libj9jit29.so+0x2086e9])
_ZN18TR_SubclassVisitor15visitSubclassesEP22TR_PersistentClassInfoRN10TR_CHTable12VisitTrackerE.localalias+0x5e (0x00007FADE9746B7E [libj9jit29.so+0x209b7e])
_ZN18TR_SubclassVisitor5visitEP19TR_OpaqueClassBlockb+0x118 (0x00007FADE9746D58 [libj9jit29.so+0x209d58])
_ZN15TR_ClassQueries25collectImplementorsCappedEP22TR_PersistentClassInfoPP17TR_ResolvedMethodiiS3_PN2TR11CompilationEb13TR_YesNoMaybe+0x146 (0x00007FADE9746F46 [libj9jit29.so+0x209f46])
_ZN2TR11X86CallSite22computeProfiledTargetsEv+0x532 (0x00007FADE9A80562 [libj9jit29.so+0x543562])
_ZN2TR11X86CallSiteC1EPNS_4NodeEPNS_7LinkageE+0x8f (0x00007FADE9A80E7F [libj9jit29.so+0x543e7f])
_ZN2J93X8614PrivateLinkage21buildIndirectDispatchEPN2TR4NodeE+0x78 (0x00007FADE9A81898 [libj9jit29.so+0x544898])
_ZN3OMR3X8613TreeEvaluator11performCallEPN2TR4NodeEbbPNS2_13CodeGeneratorE+0x3c (0x00007FADE9ED01CC [libj9jit29.so+0x9931cc])
_ZN3OMR13CodeGenerator8evaluateEPN2TR4NodeE+0x44 (0x00007FADE9AA71B4 [libj9jit29.so+0x56a1b4])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment