Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK20 jdk/incubator/concurrent/ScopedValue/StressStackOverflow.java StressStackOverflow$TestFailureException: Unexpected value for ScopedValue #16965

Open
JasonFengJ9 opened this issue Mar 20, 2023 · 58 comments
Assignees
Labels
Milestone

Comments

@JasonFengJ9
Copy link
Member

Failure link

From an internal build(ub18-aarch64-8):

08:57:42  openjdk version "20-internal" 2023-03-21
08:57:42  OpenJDK Runtime Environment (build 20-internal-adhoc.jenkins.BuildJDK20aarch64linuxPersonal)
08:57:42  Eclipse OpenJ9 VM (build master-8cc902bc75f, JRE 20 Linux aarch64-64-Bit Compressed References 20230316_13 (JIT enabled, AOT enabled)
08:57:42  OpenJ9   - 8cc902bc75f
08:57:42  OMR      - 035ec68ec0c
08:57:42  JCL      - 10a35739269 based on jdk-20+36)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

09:13:44  variation: Mode150
09:13:44  JVM_OPTIONS:  -XX:+UseCompressedOops 

09:18:25  TEST: jdk/incubator/concurrent/ScopedValue/StressStackOverflow.java

09:18:25  STDERR:
09:18:25  WARNING: Using incubator modules: jdk.incubator.concurrent
09:18:25  java.lang.RuntimeException: StressStackOverflow$TestFailureException: Unexpected value for ScopedValue
09:18:25  	at StressStackOverflow.lambda$run$1(StressStackOverflow.java:196)
09:18:25  	at java.base/jdk.internal.vm.ScopedValueContainer.runWithoutScope(ScopedValueContainer.java:114)
09:18:25  	at java.base/jdk.internal.vm.ScopedValueContainer.run(ScopedValueContainer.java:100)
09:18:25  	at jdk.incubator.concurrent/jdk.incubator.concurrent.ScopedValue$Carrier.run(ScopedValue.java:417)
09:18:25  	at StressStackOverflow.run(StressStackOverflow.java:158)
09:18:25  	at StressStackOverflow.main(StressStackOverflow.java:209)
09:18:25  	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
09:18:25  	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
09:18:25  	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
09:18:25  	at java.base/java.lang.Thread.run(Thread.java:1639)
09:18:25  	Suppressed: jdk.incubator.concurrent.StructureViolationException
09:18:25  		at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)
09:18:25  		at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
09:18:25  		at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:484)
09:18:25  		at java.base/jdk.internal.misc.StructureViolationExceptions.newException(StructureViolationExceptions.java:45)
09:18:25  		at java.base/jdk.internal.misc.StructureViolationExceptions.throwException(StructureViolationExceptions.java:72)
09:18:25  		at java.base/jdk.internal.misc.ThreadFlock$ThreadContainerImpl.close(ThreadFlock.java:541)
09:18:25  		at java.base/jdk.internal.misc.ThreadFlock.close(ThreadFlock.java:432)
09:18:25  		at jdk.incubator.concurrent/jdk.incubator.concurrent.StructuredTaskScope.close(StructuredTaskScope.java:702)
09:18:25  		at StressStackOverflow.lambda$run$1(StressStackOverflow.java:159)
09:18:25  		... 9 more
09:18:25  	Suppressed: jdk.incubator.concurrent.StructureViolationException
09:18:25  		at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)
09:18:25  		at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
09:18:25  		at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:484)
09:18:25  		at java.base/jdk.internal.misc.StructureViolationExceptions.newException(StructureViolationExceptions.java:45)
09:18:25  		at java.base/jdk.internal.misc.StructureViolationExceptions.newException(StructureViolationExceptions.java:58)
09:18:25  		at java.base/jdk.internal.vm.ScopedValueContainer.throwIfFailed(ScopedValueContainer.java:205)
09:18:25  		at java.base/jdk.internal.vm.ScopedValueContainer.runWithoutScope(ScopedValueContainer.java:122)
09:18:25  		... 8 more
09:18:25  Caused by: StressStackOverflow$TestFailureException: Unexpected value for ScopedValue
09:18:25  	at StressStackOverflow.<clinit>(StressStackOverflow.java:47)
09:18:25  	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
09:18:25  	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1522)
09:18:25  	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.ensureClassInitialized(MethodHandleAccessorFactory.java:300)
09:18:25  	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newMethodAccessor(MethodHandleAccessorFactory.java:71)
09:18:25  	at java.base/jdk.internal.reflect.ReflectionFactory.newMethodAccessor(ReflectionFactory.java:159)
09:18:25  	at java.base/java.lang.reflect.Method.acquireMethodAccessor(Method.java:724)
09:18:25  	at java.base/java.lang.reflect.Method.invoke(Method.java:575)
09:18:25  	... 2 more
09:18:25  	Suppressed: jdk.incubator.concurrent.StructureViolationException
09:18:25  		at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)
09:18:25  		at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
09:18:25  		at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:484)
09:18:25  		at java.base/jdk.internal.misc.StructureViolationExceptions.newException(StructureViolationExceptions.java:45)
09:18:25  		at java.base/jdk.internal.misc.StructureViolationExceptions.newException(StructureViolationExceptions.java:58)
09:18:25  		at java.base/jdk.internal.vm.ScopedValueContainer.throwIfFailed(ScopedValueContainer.java:205)
09:18:25  		at java.base/jdk.internal.vm.ScopedValueContainer.doCall(ScopedValueContainer.java:194)
09:18:25  		at java.base/jdk.internal.vm.ScopedValueContainer.call(ScopedValueContainer.java:151)
09:18:25  		at jdk.incubator.concurrent/jdk.incubator.concurrent.ScopedValue$Carrier.call(ScopedValue.java:370)
09:18:25  		at StressStackOverflow$DeepRecursion.run(StressStackOverflow.java:72)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:117)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:120)
09:18:25  		at StressStackOverflow.fibonacci_pad(StressStackOverflow.java:128)
09:18:25  		at StressStackOverflow$DeepRecursion.lambda$run$0(StressStackOverflow.java:72)
09:18:25  		at java.base/jdk.internal.vm.ScopedValueContainer.doCall(ScopedValueContainer.java:186)
09:18:25  		at java.base/jdk.internal.vm.ScopedValueContainer.call(ScopedValueContainer.java:151)
09:18:25  		at jdk.incubator.concurrent/jdk.incubator.concurrent.ScopedValue$Carrier.call(ScopedValue.java:370)

09:18:25  JavaTest Message: Test threw exception: java.lang.RuntimeException: StressStackOverflow$TestFailureException: Unexpected value for ScopedValue
09:18:25  JavaTest Message: shutting down test
09:18:25  
09:18:25  STATUS:Failed.`main' threw exception: java.lang.RuntimeException: StressStackOverflow$TestFailureException: Unexpected value for ScopedValue

09:18:25  TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: StressStackOverflow$TestFailureException: Unexpected value for ScopedValue
09:18:25  --------------------------------------------------
09:18:25  Test results: passed: 275; failed: 1
09:18:30  Report written to /home/jenkins/workspace/Test_openjdk20_j9_extended.openjdk_aarch64_linux_Personal/aqa-tests/TKG/output_16789716238872/jdk_other_0/report/html/report.html
09:18:30  Results written to /home/jenkins/workspace/Test_openjdk20_j9_extended.openjdk_aarch64_linux_Personal/aqa-tests/TKG/output_16789716238872/jdk_other_0/work
09:18:30  Error: Some tests failed or other problems occurred.
09:18:30  -----------------------------------
09:18:30  jdk_other_0_FAILED

Also appeared in other platforms such as x86-64_linux s390x_linux

@babsingh
Copy link
Contributor

babsingh commented Apr 12, 2023

@thallium
Copy link
Contributor

I modified scopedValueBindings to return a snapshot where inheritedValue is mapped 42. Under 100,000 iterations, the test passed and the unmodified version failed. But under the original 1,000,000 iterations this still failed. Not sure why would this happen but I think this indicates that the failure is caused by the unimplemented Thread.findScopedValueBindings as @babsingh mentioned.

@babsingh
Copy link
Contributor

babsingh commented Apr 24, 2023

@thallium Open a draft PR with your implementation so it can be reviewed. Compare your implementation with the pseudo-code in #16677 (comment). Can you provide more details for the failure at 1,000,000 iterations; is it the same failure seen without Thread.findScopedValueBindings?

@thallium
Copy link
Contributor

Open a draft PR with your implementation so it can be reviewed

It's not a fix, just a hack to check if it's related to Thread.findScopedValueBindings .

is it the same failure seen without Thread.findScopedValueBindings?

Yes. @babsingh

@babsingh
Copy link
Contributor

@thallium Do you have free cycles to implement the pseudo-code in #16677 (comment)? If yes, refer to the below code to implement the pseudo-code.

J9StackWalkState walkState;
walkState.walkThread = walkThread;
walkState.flags = J9_STACKWALK_ITERATE_FRAMES | J9_STACKWALK_ITERATE_O_SLOTS;
walkState.skipCount = 0;
walkState.userData1 = (void *)scope;
walkState.userData2 = (void *)&scopeNotFound;
walkState.frameWalkFunction = closeScope0FrameWalkFunction;
walkState.objectSlotWalkFunction = closeScope0OSlotWalkFunction;
vm->walkStackFrames(walkThread, &walkState);
if (JNI_FALSE == *(jboolean *)walkState.userData2) {
/* scope found */
break;
}

@thallium
Copy link
Contributor

@babsingh Yes I'm working on this

@thallium
Copy link
Contributor

thallium commented Apr 27, 2023

Ran a 20x grinder with -Xjit:disableInlining, the test is failing intermittently:

WARNING: Using incubator modules: jdk.incubator.concurrent
java.lang.RuntimeException: StressStackOverflow$TestFailureException: Unexpected value for ScopedValue
	at StressStackOverflow.lambda$run$1(StressStackOverflow.java:196)
	at java.base/jdk.internal.vm.ScopedValueContainer.runWithoutScope(ScopedValueContainer.java:114)
	at java.base/jdk.internal.vm.ScopedValueContainer.run(ScopedValueContainer.java:100)
	at jdk.incubator.concurrent/jdk.incubator.concurrent.ScopedValue$Carrier.run(ScopedValue.java:417)
	at StressStackOverflow.run(StressStackOverflow.java:158)
	at StressStackOverflow.main(StressStackOverflow.java:209)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:125)
	at java.base/java.lang.Thread.run(Thread.java:1639)
Caused by: StressStackOverflow$TestFailureException: Unexpected value for ScopedValue
	at StressStackOverflow.<clinit>(StressStackOverflow.java:47)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1522)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.ensureClassInitialized(MethodHandleAccessorFactory.java:300)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newMethodAccessor(MethodHandleAccessorFactory.java:71)
	at java.base/jdk.internal.reflect.ReflectionFactory.newMethodAccessor(ReflectionFactory.java:159)
	at java.base/java.lang.reflect.Method.acquireMethodAccessor(Method.java:724)
	at java.base/java.lang.reflect.Method.invoke(Method.java:575)
	... 2 more

@thallium
Copy link
Contributor

thallium commented May 3, 2023

Ran the test 20x locally with -Xint (need to adjust the time limit) and all passed.

@babsingh
Copy link
Contributor

babsingh commented May 3, 2023

re #16965 (comment): Thanks, @thallium.

  • @thallium clarified that the test passes with -Xint and without the changes for Thread.findScopedValueBindings [JDK20] Thread.findScopedValueBindings to support JEP 429? #16677.
  • I have added the JIT tag to the issue.
  • @thallium The next step will be to generate a core file during the exception and determine from the !stackslots the source of the Unexpected value for ScopedValue. This will give the JIT a better starting point to investigate.

@thallium
Copy link
Contributor

thallium commented May 4, 2023

the source of the Unexpected value for ScopedValue

@babsingh Not too much I can find. The test tests whether the scoped value can be recovered correctly after a stack overflow. In each recursive call the scoped value el is increased by one and the equality of the value before and after the recursion. The test fails because the scoped value after the recursion is greater than the value before the recursion, but the Snapshot is correctly chained which means the scoped value wasn't been recovered.

@babsingh
Copy link
Contributor

babsingh commented May 12, 2023

JVM option to create core files during StressStackOverflow$TestFailureException: -Xdump:system:events=catch,filter=*TestFailureException*

Java stack when the exception was thrown; the top frames are JIT'ed + inlined:

Scenario 1:

<202500>        !j9method 0x0000000000254EB0   jdk/incubator/concurrent/ScopedValue$Snapshot.find(Ljdk/incubator/concurrent/ScopedValue;)Ljava/lang/Object;
<202500>        !j9method 0x000000000024AFD8   jdk/incubator/concurrent/ScopedValue.findBinding()Ljava/lang/Object;
<202500>        !j9method 0x000000000024AF98   jdk/incubator/concurrent/ScopedValue.slowGet()Ljava/lang/Object;
<202500>        !j9method 0x000000000024AF78   jdk/incubator/concurrent/ScopedValue.get()Ljava/lang/Object;
<202500>        !j9method 0x000000000023F4E8   StressStackOverflow$DeepRecursion.run()V
<202500>        !j9method 0x000000000023E100   StressStackOverflow.fibonacci_pad1(ILjava/lang/Runnable;)J

Scenario 2:

<202500>        !j9method 0x0000000000258238   jdk/incubator/concurrent/ScopedValue$Carrier.runWith(Ljdk/incubator/concurrent/ScopedValue$Snapshot;Ljava/lang/Runnable;)V
<202500>        !j9method 0x0000000000258218   jdk/incubator/concurrent/ScopedValue$Carrier.run(Ljava/lang/Runnable;)V
<202500>        !j9method 0x000000000023F4E8   StressStackOverflow$DeepRecursion.run()V
<202500>        !j9method 0x000000000023E100   StressStackOverflow.fibonacci_pad1(ILjava/lang/Runnable;)J

@thallium To investigate Unexpected value for ScopedValue:

Based on the code in ScopedValue.scopedValueBindings, this issue might still be related to JLA.findScopedValueBindings (#16677).

@thallium
Copy link
Contributor

Got seg fault when trying to print the value of the scoped value:

Type=Segmentation error vmState=0x000501ff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FADF1BA0940 Handler2=00007FADF1A07AF0 InaccessibleAddress=0000000000000008
RDI=00007FACC800E240 RSI=1FFFFFFFFFFFFFFF RAX=0000000000000000 RBX=0000000000000008
RCX=00007FADEC0178D0 RDX=0000000000000001 R8=0000000000000000 R9=0000000000000000
R10=00000000000501FF R11=0000000000000000 R12=00007FADEBEF0060 R13=00007FADEC4310B0
R14=00007FADF152ACE0 R15=00007FACC800E240
RIP=00007FADF0AB361E GS=0000 FS=0000 RSP=00007FACCDD468A0
EFlags=0000000000010246 CS=0033 RBP=00007FACC8003C00 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1 0000002e000000ba (f: 186.000000, d: 9.761181e-313)
xmm2 000000000000002e (f: 46.000000, d: 2.272702e-322)
xmm3 00000000000000be (f: 190.000000, d: 9.387247e-322)
xmm4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm5 00007facadd96200 (f: 2916704768.000000, d: 6.935675e-310)
xmm6 00007fadec432e48 (f: 3963825664.000000, d: 6.935939e-310)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 6f6e67695f525400 (f: 1599230976.000000, d: 5.762047e+228)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 757575767575757e (f: 1970632064.000000, d: 6.444104e+257)
xmm11 0000015200000151 (f: 337.000000, d: 7.172346e-312)
xmm12 0000013d00000140 (f: 320.000000, d: 6.726727e-312)
xmm13 000001380000013f (f: 319.000000, d: 6.620627e-312)
xmm14 0000000008001800 (f: 134223872.000000, d: 6.631540e-316)
xmm15 000001420000013b (f: 315.000000, d: 6.832826e-312)
Module=/home/thallium/openj9-openjdk-jdk20/build/linux-x86_64-server-release/images/jdk/lib/default/libj9jit29.so
Module_base_address=00007FADF086E000

Method_being_compiled=jdk/incubator/concurrent/ScopedValue$Carrier.call(Ljava/util/concurrent/Callable;)Ljava/lang/Object;
Target=2_90_20230329_000000 (Linux 5.15.0-71-generic)
CPU=amd64 (8 logical CPUs) (0x3e808c000 RAM)
----------- Stack Backtrace -----------
_ZN11TR_J9VMBase19getReferenceElementEml+0xde (0x00007FADF0AB361E [libj9jit29.so+0x24561e])
_ZN11TR_J9VMBase45getKnotIndexOfInvokeCacheArrayAppendixElementEPN2TR11CompilationEPm+0x81 (0x00007FADF0AB1961 [libj9jit29.so+0x243961])
_ZN19InterpreterEmulator48updateKnotAndCreateCallSiteUsingInvokeCacheArrayEP19TR_ResolvedJ9MethodPmi+0x3c (0x00007FADF0C1ED9C [libj9jit29.so+0x3b0d9c])
_ZN19InterpreterEmulator35findAndCreateCallsitesFromBytecodesEbb+0x1b0 (0x00007FADF0C23330 [libj9jit29.so+0x3b5330])
_ZN21TR_J9EstimateCodeSize20realEstimateCodeSizeEP13TR_CallTargetP12TR_CallStackbRN2TR6RegionE+0xb2c (0x00007FADF0C1789C [libj9jit29.so+0x3a989c])
_ZN21TR_J9EstimateCodeSize20realEstimateCodeSizeEP13TR_CallTargetP12TR_CallStackbRN2TR6RegionE+0x1ad7 (0x00007FADF0C18847 [libj9jit29.so+0x3aa847])
_ZN21TR_J9EstimateCodeSize20realEstimateCodeSizeEP13TR_CallTargetP12TR_CallStackbRN2TR6RegionE+0x1ad7 (0x00007FADF0C18847 [libj9jit29.so+0x3aa847])
_ZN21TR_J9EstimateCodeSize20realEstimateCodeSizeEP13TR_CallTargetP12TR_CallStackbRN2TR6RegionE+0x1ad7 (0x00007FADF0C18847 [libj9jit29.so+0x3aa847])
_ZN21TR_J9EstimateCodeSize20realEstimateCodeSizeEP13TR_CallTargetP12TR_CallStackbRN2TR6RegionE+0x1ad7 (0x00007FADF0C18847 [libj9jit29.so+0x3aa847])
_ZN21TR_J9EstimateCodeSize20realEstimateCodeSizeEP13TR_CallTargetP12TR_CallStackbRN2TR6RegionE+0x1ad7 (0x00007FADF0C18847 [libj9jit29.so+0x3aa847])
_ZN21TR_J9EstimateCodeSize16estimateCodeSizeEP13TR_CallTargetP12TR_CallStackb.localalias+0x45 (0x00007FADF0C19E55 [libj9jit29.so+0x3abe55])
_ZN19TR_EstimateCodeSize17calculateCodeSizeEP13TR_CallTargetP12TR_CallStackb+0xc3 (0x00007FADF0B87533 [libj9jit29.so+0x319533])
_ZN28TR_MultipleCallTargetInliner13weighCallSiteEP12TR_CallStackP11TR_CallSitebb+0x1fb (0x00007FADF0C0487B [libj9jit29.so+0x39687b])
_ZN28TR_MultipleCallTargetInliner17inlineCallTargetsEPN2TR20ResolvedMethodSymbolEP12TR_CallStackP24TR_InnerPreexistenceInfo+0x16d7 (0x00007FADF0C0D007 [libj9jit29.so+0x39f007])
_ZN14TR_InlinerBase15performInliningEPN2TR20ResolvedMethodSymbolE+0xbb (0x00007FADF0EF013B [libj9jit29.so+0x68213b])
_ZN10TR_Inliner7performEv+0x14a (0x00007FADF0C03DCA [libj9jit29.so+0x395dca])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0x865 (0x00007FADF1008E45 [libj9jit29.so+0x79ae45])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii.localalias+0xc2f (0x00007FADF100920F [libj9jit29.so+0x79b20f])
_ZN3OMR9Optimizer8optimizeEv+0x1c3 (0x00007FADF100AB93 [libj9jit29.so+0x79cb93])
_ZN3OMR11Compilation7compileEv+0xa85 (0x00007FADF0DF61F5 [libj9jit29.so+0x5881f5])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4c4 (0x00007FADF09E2FA4 [libj9jit29.so+0x174fa4])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x366 (0x00007FADF09E3FA6 [libj9jit29.so+0x175fa6])
omrsig_protect+0x241 (0x00007FADF1A087A1 [libj9prt29.so+0x297a1])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x382 (0x00007FADF09E1AC2 [libj9jit29.so+0x173ac2])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x12c (0x00007FADF09E1E4C [libj9jit29.so+0x173e4c])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x35b (0x00007FADF09E0CFB [libj9jit29.so+0x172cfb])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007FADF09E1072 [libj9jit29.so+0x173072])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x86 (0x00007FADF09E1126 [libj9jit29.so+0x173126])
omrsig_protect+0x241 (0x00007FADF1A087A1 [libj9prt29.so+0x297a1])
_Z21compilationThreadProcPv+0x17f (0x00007FADF09E14FF [libj9jit29.so+0x1734ff])
thread_wrapper+0x187 (0x00007FADF1B4EC07 [libj9thr29.so+0xbc07])
 (0x00007FADF1F6BB43 [libc.so.6+0x94b43])
 (0x00007FADF1FFDA00 [libc.so.6+0x126a00])
---------------------------------------

Stack trace from gdb:

#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=140380164421184) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=11, threadid=140380164421184) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140380164421184, signo=11) at ./nptl/pthread_kill.c:89
#3  0x00007fadf1a06d72 in omrdump_create (portLibrary=0x7fadf1e21400 <j9portLibrary>, filename=0x7faccdd44650 "/home/thallium/aqa-tests/TKG/output_16841642144435/jdk_custom_1/work/scratch/", dumpType=<optimized out>, userData=<optimized out>)
    at /home/thallium/openj9-openjdk-jdk20/omr/port/unix/omrosdump.c:188
#4  0x00007fadf158c003 in doSystemDump (agent=0x7fadec02c580, label=0x7faccdd44650 "/home/thallium/aqa-tests/TKG/output_16841642144435/jdk_custom_1/work/scratch/", context=<optimized out>) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/rasdump/dmpagent.c:759
#5  0x00007fadf1587c49 in protectedDumpFunction (portLibrary=portLibrary@entry=0x7fadf1e21400 <j9portLibrary>, userData=userData@entry=0x7faccdd445b0) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/rasdump/dmpagent.c:2893
#6  0x00007fadf1a087a1 in omrsig_protect (portLibrary=0x7fadf1e21400 <j9portLibrary>, fn=0x7fadf1587c30 <protectedDumpFunction>, fn_arg=0x7faccdd445b0, handler=0x7fadf1587c50 <signalHandler>, handler_arg=0x0, flags=505, result=0x7faccdd445a8)
    at /home/thallium/openj9-openjdk-jdk20/omr/port/unix/omrsignal.c:425
#7  0x00007fadf158b476 in runDumpFunction (agent=<optimized out>, label=0x7faccdd44650 "/home/thallium/aqa-tests/TKG/output_16841642144435/jdk_custom_1/work/scratch/", context=<optimized out>) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/rasdump/dmpagent.c:2871
#8  0x00007fadf158b601 in runDumpAgent (vm=vm@entry=0x7fadec0178d0, agent=agent@entry=0x7fadec02c580, context=context@entry=0x7faccdd44b20, state=state@entry=0x7faccdd44b10, detail=detail@entry=0x7faccdd44ca0 "", timeNow=timeNow@entry=1684164524279)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/rasdump/dmpagent.c:2790
#9  0x00007fadf15a2c15 in triggerDumpAgents (vm=0x7fadec0178d0, self=0x7facc8003c00, eventFlags=8192, eventData=<optimized out>) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/rasdump/trigger.c:1037
#10 0x00007fadf1ba0532 in generateDiagnosticFiles (portLibrary=portLibrary@entry=0x7fadf1e21400 <j9portLibrary>, userData=userData@entry=0x7faccdd45180) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/vm/gphandle.c:1162
#11 0x00007fadf1a087a1 in omrsig_protect (portLibrary=0x7fadf1e21400 <j9portLibrary>, fn=0x7fadf1ba03d0 <generateDiagnosticFiles>, fn_arg=0x7faccdd45180, handler=0x7fadf1b9f9c0 <recursiveCrashHandler>, handler_arg=0x7faccdd45150, flags=505, result=0x7faccdd45148)
    at /home/thallium/openj9-openjdk-jdk20/omr/port/unix/omrsignal.c:425
#12 0x00007fadf1ba06ca in vmSignalHandler (portLibrary=0x7fadf1e21400 <j9portLibrary>, gpType=24, gpInfo=<optimized out>, userData=<optimized out>) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/vm/gphandle.c:837
#13 0x00007fadf1a07d28 in mainSynchSignalHandler (signal=11, sigInfo=0x7faccdd46430, contextInfo=0x7faccdd46300) at /home/thallium/openj9-openjdk-jdk20/omr/port/unix/omrsignal.c:1066
#14 <signal handler called>
#15 j9javaArrayOfObject_load (index=<optimized out>, array=0x7fadebef0060, vmThread=0x7facc8003c00) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/oti/j9accessbarrierhelpers.h:57
#16 TR_J9VMBase::getReferenceElement (this=<optimized out>, objectPointer=140384964378720, elementIndex=<optimized out>) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/env/VMJ9.cpp:1277
#17 0x00007fadf0ab1961 in TR_J9VMBase::getKnotIndexOfInvokeCacheArrayAppendixElement (this=0x7facc800e240, comp=<optimized out>, invokeCacheArray=0x7fadec4310b0) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/env/VMJ9.cpp:4856
#18 0x00007fadf0c1ed9c in InterpreterEmulator::updateKnotAndCreateCallSiteUsingInvokeCacheArray (this=0x7faccdd46b80, owningMethod=0x7facadd88730, invokeCacheArray=0x7fadec4310b0, cpIndex=-1)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/InterpreterEmulator.cpp:1312
#19 0x00007fadf0c23330 in InterpreterEmulator::findAndCreateCallsitesFromBytecodes (this=this@entry=0x7faccdd46b80, wasPeekingSuccessfull=wasPeekingSuccessfull@entry=false, withState=withState@entry=false)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/InterpreterEmulator.cpp:1180
#20 0x00007fadf0c1789c in TR_J9EstimateCodeSize::realEstimateCodeSize (this=<optimized out>, calltarget=0x7facadd889d0, prevCallStack=<optimized out>, recurseDown=<optimized out>, cfgRegion=...)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:1335
#21 0x00007fadf0c18847 in TR_J9EstimateCodeSize::estimateCodeSize (recurseDown=true, prevCallStack=0x7faccdd476f0, calltarget=0x7facadd889d0, this=0x7facadcfe220) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:417
#22 TR_J9EstimateCodeSize::realEstimateCodeSize (this=<optimized out>, calltarget=0x7facadd87c20, prevCallStack=<optimized out>, recurseDown=<optimized out>, cfgRegion=...) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:1556
#23 0x00007fadf0c18847 in TR_J9EstimateCodeSize::estimateCodeSize (recurseDown=true, prevCallStack=0x7faccdd482e0, calltarget=0x7facadd87c20, this=0x7facadcfe220) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:417
#24 TR_J9EstimateCodeSize::realEstimateCodeSize (this=<optimized out>, calltarget=0x7facadd4dec0, prevCallStack=<optimized out>, recurseDown=<optimized out>, cfgRegion=...) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:1556
#25 0x00007fadf0c18847 in TR_J9EstimateCodeSize::estimateCodeSize (recurseDown=true, prevCallStack=0x7faccdd48ed0, calltarget=0x7facadd4dec0, this=0x7facadcfe220) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:417
#26 TR_J9EstimateCodeSize::realEstimateCodeSize (this=<optimized out>, calltarget=0x7facadd483c0, prevCallStack=<optimized out>, recurseDown=<optimized out>, cfgRegion=...) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:1556
#27 0x00007fadf0c18847 in TR_J9EstimateCodeSize::estimateCodeSize (recurseDown=true, prevCallStack=0x7faccdd49ac0, calltarget=0x7facadd483c0, this=0x7facadcfe220) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:417
#28 TR_J9EstimateCodeSize::realEstimateCodeSize (this=<optimized out>, calltarget=0x7facadd13640, prevCallStack=<optimized out>, recurseDown=<optimized out>, cfgRegion=...) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:1556
#29 0x00007fadf0c18847 in TR_J9EstimateCodeSize::estimateCodeSize (recurseDown=true, prevCallStack=0x7faccdd4a6b0, calltarget=0x7facadd13640, this=0x7facadcfe220) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:417
#30 TR_J9EstimateCodeSize::realEstimateCodeSize (this=<optimized out>, calltarget=0x7facadd11b60, prevCallStack=<optimized out>, recurseDown=<optimized out>, cfgRegion=...) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:1556
#31 0x00007fadf0c19e55 in TR_J9EstimateCodeSize::estimateCodeSize (this=0x7facadcfe220, calltarget=0x7facadd11b60, prevCallStack=0x7faccdd4c3a0, recurseDown=<optimized out>) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/J9EstimateCodeSize.cpp:417
#32 0x00007fadf0b87533 in TR_EstimateCodeSize::calculateCodeSize (this=this@entry=0x7facadcfe220, calltarget=calltarget@entry=0x7facadd11b60, callStack=callStack@entry=0x7faccdd4c3a0, recurseDown=recurseDown@entry=true)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/EstimateCodeSize.cpp:92
#33 0x00007fadf0c0487b in TR_MultipleCallTargetInliner::weighCallSite (this=0x7faccdd4d510, callStack=<optimized out>, callsite=0x7faccdd4d1e0, currentBlockHasExceptionSuccessors=<optimized out>, dontAddCalls=<optimized out>)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/InlinerTempForJ9.cpp:3366
#34 0x00007fadf0c0d007 in TR_MultipleCallTargetInliner::inlineCallTargets (this=0x7faccdd4d510, callerSymbol=0x7facadc7a750, prevCallStack=0x0, innerPrexInfo=<optimized out>) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/InlinerTempForJ9.cpp:3087
#35 0x00007fadf0ef013b in TR_InlinerBase::performInlining (this=this@entry=0x7faccdd4d510, callerSymbol=callerSymbol@entry=0x7facadc7a750) at /home/thallium/openj9-openjdk-jdk20/omr/compiler/optimizer/Inliner.cpp:453
#36 0x00007fadf0c03dca in TR_Inliner::perform (this=0x7facadc850a0) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/optimizer/InlinerTempForJ9.cpp:2570
#37 0x00007fadf1008e45 in OMR::Optimizer::performOptimization (this=0x7facadd06f50, optimization=<optimized out>, firstOptIndex=<optimized out>, lastOptIndex=<optimized out>, doTiming=<optimized out>)
    at /home/thallium/openj9-openjdk-jdk20/omr/compiler/optimizer/OMROptimizer.cpp:2064
#38 0x00007fadf100920f in OMR::Optimizer::performOptimization (this=<optimized out>, optimization=<optimized out>, firstOptIndex=<optimized out>, lastOptIndex=2147483647, doTiming=0) at /home/thallium/openj9-openjdk-jdk20/omr/compiler/optimizer/OMROptimizer.cpp:1611
#39 0x00007fadf100ab93 in OMR::Optimizer::optimize (this=0x7facadd06f50) at /home/thallium/openj9-openjdk-jdk20/omr/compiler/optimizer/OMROptimizer.cpp:1128
#40 0x00007fadf0dee3fb in OMR::Compilation::performOptimizations (this=this@entry=0x7facadc75000) at /home/thallium/openj9-openjdk-jdk20/omr/compiler/compile/OMRCompilation.cpp:1267
#41 0x00007fadf0df61f5 in OMR::Compilation::compile (this=this@entry=0x7facadc75000) at /home/thallium/openj9-openjdk-jdk20/omr/compiler/compile/OMRCompilation.cpp:1064
#42 0x00007fadf09e2fa4 in TR::CompilationInfoPerThreadBase::compile (this=0x7facddd6d0b0, vmThread=<optimized out>, compiler=0x7facadc75000, compilee=<optimized out>, vm=..., optimizationPlan=<optimized out>, scratchSegmentProvider=...)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:9989
#43 0x00007fadf09e3fa6 in TR::CompilationInfoPerThreadBase::wrappedCompile (portLib=portLib@entry=0x7fadf1e21400 <j9portLibrary>, opaqueParameters=opaqueParameters@entry=0x7faccdd53700)
    at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:9482
#44 0x00007fadf1a087a1 in omrsig_protect (portLibrary=0x7fadf1e21400 <j9portLibrary>, fn=0x7fadf09e3c40 <TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary*, void*)>, fn_arg=0x7faccdd53700,
    handler=0x7fadf09cc6a0 <jitSignalHandler(J9PortLibrary*, uint32_t, void*, void*)>, handler_arg=0x7facc8003c00, flags=505, result=0x7faccdd536d8) at /home/thallium/openj9-openjdk-jdk20/omr/port/unix/omrsignal.c:425
#45 0x00007fadf09e1ac2 in TR::CompilationInfoPerThreadBase::compile (this=0x7facddd6d0b0, vmThread=0x7facc8003c00, entry=0x7fadec317010, scratchSegmentProvider=...) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:8467
#46 0x00007fadf09e1e4c in TR::CompilationInfoPerThread::processEntry (this=0x7facddd6d0b0, entry=..., scratchSegmentProvider=...) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:4692
#47 0x00007fadf09e0cfb in TR::CompilationInfoPerThread::processEntries (this=0x7facddd6d0b0) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:4397
#48 0x00007fadf09e1072 in TR::CompilationInfoPerThread::run (this=this@entry=0x7facddd6d0b0) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:4232
#49 0x00007fadf09e1126 in protectedCompilationThreadProc (compInfoPT=compInfoPT@entry=0x7facddd6d0b0) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:4166
#50 0x00007fadf1a087a1 in omrsig_protect (portLibrary=0x7fadf1e21400 <j9portLibrary>, fn=0x7fadf09e10a0 <protectedCompilationThreadProc(J9PortLibrary*, TR::CompilationInfoPerThread*)>, fn_arg=0x7facddd6d0b0, handler=0x7fadf1ba0940 <structuredSignalHandler>,
    handler_arg=0x7facc8003c00, flags=506, result=0x7faccdd54d08) at /home/thallium/openj9-openjdk-jdk20/omr/port/unix/omrsignal.c:425
#51 0x00007fadf09e14ff in compilationThreadProc (entryarg=0x7facddd6d0b0) at /home/thallium/openj9-openjdk-jdk20/openj9/runtime/compiler/control/CompilationThread.cpp:4071
#52 0x00007fadf1b4ec07 in thread_wrapper (arg=0x7fadec0ea850) at /home/thallium/openj9-openjdk-jdk20/omr/thread/common/omrthread.c:1724
#53 0x00007fadf1f6bb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#54 0x00007fadf1ffda00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@babsingh
Copy link
Contributor

fyi @0xdaryl A segfault in the JIT is noticed while debugging this failure. Please see #16965 (comment) for more details.

@0xdaryl
Copy link
Contributor

0xdaryl commented May 16, 2023

@nbhuiyan : inliner crash above. Please investigate for 20.0.2.

@0xdaryl
Copy link
Contributor

0xdaryl commented May 16, 2023

It is probably worth opening a separate JIT issue for the seg fault to divorce it from the original issue being investigated, unless anyone feels otherwise.

@babsingh
Copy link
Contributor

babsingh commented May 16, 2023

probably worth opening a separate JIT issue for the seg fault

Original issue: #16965 (comment)
Inliner issue: #16965 (comment)

The original issue goes away with -Xint; ref: #16965 (comment). The original and inliner issue might be the same issue. In the original issue, an exception is thrown for the unexpected ScopedValue. The inliner crash occurs when the same unexpected ScopedValue is printed. We should do a preliminary analysis before diverging into two separate issues.

@thallium
Copy link
Contributor

findScopedValueBindings() is never called in this test and we don't have a test to validate the implementation.

@babsingh babsingh removed the comp:vm label May 30, 2023
@babsingh
Copy link
Contributor

Removed comp:vm based upon the above finding. This issue can focus on the JIT inliner failure.

@nbhuiyan
Copy link
Member

nbhuiyan commented Jun 5, 2023

@babsingh this issue might require some insights from the VM side. The reason why we see the problem during inlining is because of an invalid entry in the invokecache array for an invokedynamic bytecode. This problem is seen when trying to obtain the address of the appendix object, which returns 0x8, which is then dereferenced by the VM helper that does array of object loads, resulting in the segfault we see here. The JIT never modifies the entries in the invoke cache.

@nbhuiyan
Copy link
Member

With AOT disabled, and all optimizations beyond inlining disabled, opt level restricted to warm, and only this method below getting compiled, the failure is reproducible:

+ (warm) java/lang/ScopedValue$Carrier.runWith(Ljava/lang/ScopedValue$Snapshot;Ljava/lang/Runnable;)V @ 00007FFFDF693034-00007FFFDF695FE1 OrdinaryMethod - Q_SZ=3 Q_SZI=3 QW=24 j9m=0000000000255C60 bcsz=63 compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=49%

With only just changing a line in the test case to print the scoped value immediately prior to throwing the testFailureException results in the failure no longer occurring. This change that impacts the outcome of the test does not get compiled or inlined into the runWith method, which is the only method that gets compiled.

As a result, we can now look at fairly simple full compilation logs for passing and and a failing cases that result in identical compiled code (other than the relative addresses, symref numbers, etc):

  • Failing run - default
  • Passing run - only change was in StressStackOverflow.java to print out the scoped value immediately prior to throwing the testFailureException, and this change is not part of any compiled code.

Using a diff tool would reveal that the generated code for both passing and failing cases were identical, which suggests that the problem may lie outside of what the JIT compilations were responsible for.

As for the differences in the number of calls to the Thread.currentThread INL helper between a passing case and a failing case, all the calls near the point that the exception is being thrown originate from some interpreted method. I disabled recognizing Thread.currentThread in the JIT to ensure that the JIT also would call the INL helper or the fast JNI helper. I changed the helper code to increment a static counter for every time Thread.currentThread INL/JNI was being called, and raising SIGINT when we hit a call count close to what I observed for passing/failing runs, and ran the test in gdb and inspected the backtrace every time we ran into the SIGINT:

Thread 31 "ForkJoinPool-1-" received signal SIGINT, Interrupt.
__GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
50	in ../sysdeps/unix/sysv/linux/raise.c
(gdb) c
Continuing.

Thread 31 "ForkJoinPool-1-" received signal SIGINT, Interrupt.
__GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
50	in ../sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7ac6eee in VM_BytecodeInterpreterCompressed::inlThreadCurrentThread (
    _pc=<optimized out>, _sp=<optimized out>, this=<optimized out>)
    at /root/builds/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.hpp:2698
#2  VM_BytecodeInterpreterCompressed::run (this=0x2, this@entry=0x7fffdc4658e0,
    vmThread=0x0)
    at /root/builds/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.hpp:10656
#3  0x00007ffff7aae3f9 in bytecodeLoopCompressed (currentThread=<optimized out>)
    at /root/builds/openj9-openjdk-jdk21/openj9/runtime/vm/BytecodeInterpreter.inc:112
#4  0x00007ffff7bb8c72 in c_cInterpreter ()
    at /root/builds/openj9-openjdk-jdk21/build/linux-x86_64-server-release/vm/runtime/vm/xcinterp.s:158
#5  0x00007ffff7a2ef0a in runJavaThread (currentThread=0x7fffdc465a20,
    currentThread@entry=0x230b00)
    at /root/builds/openj9-openjdk-jdk21/openj9/runtime/vm/callin.cpp:682
#6  0x00007ffff7aa99a1 in javaProtectedThreadProc (
    portLibrary=portLibrary@entry=0x7ffff7cdc420 <j9portLibrary>,
    entryarg=entryarg@entry=0x230b00)
    at /root/builds/openj9-openjdk-jdk21/openj9/runtime/vm/vmthread.cpp:2104
#7  0x00007ffff78490f3 in omrsig_protect (
    portLibrary=0x7ffff7cdc420 <j9portLibrary>,
    fn=0x7ffff7aa9910 <javaProtectedThreadProc(J9PortLibrary*, void*)>,
    fn_arg=0x230b00, handler=0x7ffff7a51dc0 <structuredSignalHandler>,
    handler_arg=0x230b00, flags=506, result=0x7fffdc465dc8)
    at /root/builds/openj9-openjdk-jdk21/omr/port/unix/omrsignal.c:425
#8  0x00007ffff7aa599a in javaThreadProc (entryarg=0x7ffff000edf0)
    at /root/builds/openj9-openjdk-jdk21/openj9/runtime/vm/vmthread.cpp:383
#9  0x00007ffff79fe2b2 in thread_wrapper (arg=0x7ffff0917930)
    at /root/builds/openj9-openjdk-jdk21/omr/thread/common/omrthread.c:1724
#10 0x00007ffff7d7f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007ffff7edb133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@nbhuiyan
Copy link
Member

nbhuiyan commented Oct 6, 2023

@fengxue-IS as requested, here are the steps to reproduce what I did:

  1. echo the following into a a file, let's call it limitfile.txt:
+ (warm) java/lang/ScopedValue$Carrier.runWith(Ljava/lang/ScopedValue$Snapshot;Ljava/lang/Runnable;)V @ 00007FFFDF693034-00007FFFDF695FE1 OrdinaryMethod - Q_SZ=3 Q_SZI=3 QW=24 j9m=0000000000255C60 bcsz=63 compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=49%
  1. The test StressStackOverflow.java can be found in the jdk repo under test/jdk/java/lang/ScopedValue/. Compile it using a jdk21 build: javac --enable-preview --source 21 StressStackOverflow
  2. export TR_DisableRecognizeCurrentThread=1 to ensure JIT does not transform Thread.currentThread calls. Using this option has no effect on whether the test passes or fails.
  3. Run the test with just the runWith method compiled only once at warm, all optimizations after inlining disabled, and AOT disabled:
java --enable-preview -Xnoaot -XX:+ShowHiddenFrames -Xjit:'limitFile=(limitfile.txt,1,1),optLevel=warm,lastOptIndex=16' StressStackOverflow

The limitfile option requires that the file is in the current directory, or you would need to specify an absolute path. The parameters 1,1 specify the first and last line in the limit file to include for compilation, and for this case it is just the runWith method in the limit file.
5. To get the test to pass, just try to print the value within this catch block.

babsingh added a commit to babsingh/aqa-tests that referenced this issue Oct 19, 2023
Now, the test has four sub-variants, which need to be individually
excluded.

Related: eclipse-openj9/openj9#18065
Related: eclipse-openj9/openj9#16965

Signed-off-by: Babneet Singh <[email protected]>
llxia pushed a commit to adoptium/aqa-tests that referenced this issue Oct 19, 2023
Now, the test has four sub-variants, which need to be individually
excluded.

Related: eclipse-openj9/openj9#18065
Related: eclipse-openj9/openj9#16965

Signed-off-by: Babneet Singh <[email protected]>
@fengxue-IS
Copy link
Contributor

I did some testing with modified java code to init a new Exception in place rather than pre-init

Exception in thread "main" java.lang.RuntimeException: StressStackOverflow$TestFailureException: FINALLY-CODE  last=
	at StressStackOverflow.lambda$run$1(StressStackOverflow.java:213)
	at StressStackOverflow$$Lambda/0x0000000070bb9288.run(Unknown Source)
	at java.base/jdk.internal.vm.ScopedValueContainer.runWithoutScope(ScopedValueContainer.java:112)
	at java.base/jdk.internal.vm.ScopedValueContainer.run(ScopedValueContainer.java:98)
	at java.base/java.lang.ScopedValue$Carrier.runWith(ScopedValue.java:526)
	at java.base/java.lang.ScopedValue$Carrier.run(ScopedValue.java:510)
	at StressStackOverflow.run(StressStackOverflow.java:175)
	at StressStackOverflow.main(StressStackOverflow.java:226)
Caused by: StressStackOverflow$TestFailureException: FINALLY-CODE  last=
	at StressStackOverflow$DeepRecursion.run(StressStackOverflow.java:103)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:132)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad(StressStackOverflow.java:143)
	at StressStackOverflow$DeepRecursion.lambda$run$2(StressStackOverflow.java:83)
	at StressStackOverflow$DeepRecursion$$Lambda/0x0000000070bba7e0.run(Unknown Source)
	at java.base/jdk.internal.vm.ScopedValueContainer.doRun(ScopedValueContainer.java:131)
	at java.base/jdk.internal.vm.ScopedValueContainer.run(ScopedValueContainer.java:100)
	at java.base/java.lang.ScopedValue$Carrier.runWith(ScopedValue.java:526)
	at java.base/java.lang.ScopedValue$Carrier.run(ScopedValue.java:510)
	at StressStackOverflow$DeepRecursion.run(StressStackOverflow.java:83)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:132)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
	at StressStackOverflow.fibonacci_pad1(StressStackOverflow.java:135)
...

based on the exception location, it seems to suggest the issue is triggered only in finally blocks, looking into the corefile when the exception is generated, the value of el.get() (simulated manually) is 1 greater than last. And when adding printf on el.get() in finally block moved the exception throw location to https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/9e4783d1f1520d34f5261f8e94ec5d22004ade08/test/jdk/java/lang/ScopedValue/StressStackOverflow.java#L146.

This suggests that the way runWith is compiled may have incorrect handling on the finally block code https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/openj9/src/java.base/share/classes/java/lang/ScopedValue.java#L529 where the scoped value's unbind / rebind process didn't happen so the caller was still looking at the cache value that was suppose to be bounded with the ScopedValue.where() https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/9e4783d1f1520d34f5261f8e94ec5d22004ade08/test/jdk/java/lang/ScopedValue/StressStackOverflow.java#L82 call.

@nbhuiyan can you verify the code generated for the finally block please

@nbhuiyan
Copy link
Member

nbhuiyan commented Nov 7, 2023

The generated code for the finally block (which is actually split into two paths based on whether an exception was thrown or not) looks correct to me. However, with further testing by modifying the code, I found that the finally block may not always execute, which is not the correct behaviour. I modified runWith with the following:

        private void runWith(Snapshot newSnapshot, Runnable op) {
            boolean exceptionThrown = false;
            try {
                Thread.setScopedValueBindings(newSnapshot);
                Thread.ensureMaterializedForStackWalk(newSnapshot);
                ScopedValueContainer.run(op);
            } catch (Exception e) {
                System.out.println("Caught exception in runWith...");
                exceptionThrown = true;
                throw e;
            } finally {
                Reference.reachabilityFence(newSnapshot);
                Thread.setScopedValueBindings(newSnapshot.prev);
                Cache.invalidate(bitmask);
                if (exceptionThrown) System.out.println("Exception was caught and rethrown, and finally stuff was run.");
            }
        }

The above code should result in printing both the print statements in the catch block and the finally block each time an exception was thrown, but in my testing I found cases of the message in the catch block being printed without the message in the finally block, suggesting that somehow we skip running the the finally block code. Here is an example of an output that suggests the finally block may be skipped:

Caught exception in runWith...
Caught exception in runWith...
Caught exception in runWith...
Exception was caught and rethrown, and finally stuff was run.
Caught exception in runWith...
Exception was caught and rethrown, and finally stuff was run.

I am looking into what's causing this behaviour.

@0xdaryl
Copy link
Contributor

0xdaryl commented Nov 16, 2023

A couple of observations on this.

This test fails reliably at noopt and hasn't been reproduced with -Xint. Noopt failures are usually indicative of either a codegen problem, a timing problem somewhere (e.g., somewhere in the VM or a data race with the test case), or some other problem where a token JIT method is required somewhere on the stack.

This test fails readily on both x86 and AArch64 which makes it much less likely to be a codegen problem unless the IL is messed up (which I don't think it is).

I am always wary of OpenJDK tests that expect certain behaviour in the presence of forced stack overflows (mainly because they're tuned for OpenJDK behaviour and not OpenJ9). That doesn't mean the test is incorrect, but it does raise some suspicion for me. The behaviour of the test itself is driven in part by random choices within. I am presently removing those to produce a test with the most straight-line path to failure to focus the investigation.

@vij-singh
Copy link

@0xdaryl Any new news on pruning the test?

llxia pushed a commit to llxia/aqa-tests that referenced this issue Nov 22, 2023
Now, the test has four sub-variants, which need to be individually
excluded.

Related: eclipse-openj9/openj9#18065
Related: eclipse-openj9/openj9#16965

Signed-off-by: Babneet Singh <[email protected]>
llxia pushed a commit to llxia/aqa-tests that referenced this issue Nov 22, 2023
Now, the test has four sub-variants, which need to be individually
excluded.

Related: eclipse-openj9/openj9#18065
Related: eclipse-openj9/openj9#16965

Signed-off-by: Babneet Singh <[email protected]>
pshipton pushed a commit to adoptium/aqa-tests that referenced this issue Nov 22, 2023
Now, the test has four sub-variants, which need to be individually
excluded.

Related: eclipse-openj9/openj9#18065
Related: eclipse-openj9/openj9#16965

Signed-off-by: Babneet Singh <[email protected]>
@0xdaryl
Copy link
Contributor

0xdaryl commented Nov 24, 2023

Simplifying the test has helped focus the investigation on the ScopedValue/ScopedValueContainer implementations (as opposed to the test itself) which is where we're looking now. Analysis is proceeding at no-opt. The test fails readily at default no-opt but failures quickly go into remission when attempting to get more diagnostic info out or limiting it to a small subset of methods.

Since it fails at no-opt, no workaround is available (yet) other than disabling the JIT. I'm taking a much deeper look at the runWith family of methods when they are JITed as they seem to be the catalyst for reproducing these issues (our earlier investigation of them did not reveal anything wrong, but it is worth a deeper analysis).

@0xdaryl
Copy link
Contributor

0xdaryl commented Nov 28, 2023

I'm going through the ScopedValue and ScopedValueContainer code to understand what it is doing under the covers. Fortunately for this failure, only one thread is involved which simplifies the analysis. This failure has the feel of a data race given how it fails and how the problem can be made to go into remission with subtle changes to the test, but that seems unlikely with a single thread.

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 6, 2023

We now have a core file with a verbose stackwalk trace at the point the TestFailureException is thrown with all methods compiled at noopt that we are sorting through.

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 7, 2023

No major insights yet from the analysis.

In parallel, we attempted to collect logs and a core file from the original, unmodified test just to be sure we did not deviate too far from the original problem, but we have been unsuccessful in obtaining any failure artifacts on the original test. We'll continue to work with the reduced test.

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 12, 2023

This particular test relies on artificially induced stack overflows to exercise unwinding the stacked scopes and verifying this is done correctly. Stack overflows occur frequently in this test (it is a stress test after all!), which exercises the unwinding operation a lot. However, stack overflows can be difficult to control and may not occur where the test expects them to (or where OpenJDK VM would perform them). Their non-deterministic behaviour is fueled by the presence of JIT compiled methods that are compiled asynchronously and may have different frame shapes between compilations. If a stack overflow were to occur during a critical update operation that modifies the stack scopes then this could lead to inconsistent behaviour.

This isn't just a theory. For example, by instrumenting the class library, I have found instances where the ScopedValueContainer "push()" operation is interrupted by a stack overflow, thereby not completing its operation. While in that particular case I believe (through inspection) that the state should be consistent, it does raise the possibility that a stack overflow (which we know there are a great many) is occurring somewhere unexpected during a critical operation and leaving the representation of the stacked ScopedValues in an inconsistent state. Confirming this has been challenging because the problem tends to go into remission with even minor changes the the test or JCL.

Up to this point, for this failure I have found no evidence that the VM or JIT is misbehaving. Because this test is heavily exercising the ScopedValue feature in an unnaturally stressful way, and the fact that ScopedValues are a preview feature in JDK21, and the test itself is somewhat suspect, I don't think this should block JDK 21 release.

@pshipton
Copy link
Member

Moved it forward.

@0xdaryl
Copy link
Contributor

0xdaryl commented Dec 13, 2023

I should point out that only a single thread is needed to reproduce this behaviour. Virtual threads are not required either.

@hzongaro
Copy link
Member

Moving this to the "Future" milestone. It's something that needs to be looked at, but I suspect no one will be able to get to the bottom of it for an upcoming release.

@JamesKingdon
Copy link
Contributor

Did we discover why the finally block doesn't always run (as described in #16965 (comment)) ?

@nbhuiyan
Copy link
Member

@JamesKingdon I did not continue the investigation down that path to understand why we may be skipping the finally block, or whether this behaviour can be reproduced in any other test where the test is not artificially inducing lots of stack overflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests