Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failed currentThread->publicFlags #16480

Closed
sbstnzmr opened this issue Dec 15, 2022 · 24 comments · Fixed by #17314
Closed

Assertion failed currentThread->publicFlags #16480

sbstnzmr opened this issue Dec 15, 2022 · 24 comments · Fixed by #17314

Comments

@sbstnzmr
Copy link

sbstnzmr commented Dec 15, 2022

Hi, we're using ibm-semeru-runtimes Docker image in an OpenShift envirnoment. While upgrading our runtime from Java 11 to Java 17 (openj9-0.35.0) in ran into the following error:

15:21:20:19.535 0x14400    j9vm.225    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-linux-x64-openj9/workspace/build/src/openj9/runtime/vm/VMAccess.cpp:1163: (!((0 != (((currentThread->publicFlags)) & ((0x1000))))))
root@dd3938c8330d:/# java --version
openjdk 17.0.5 2022-10-18
IBM Semeru Runtime Open Edition 17.0.5.0 (build 17.0.5+8)
Eclipse OpenJ9 VM 17.0.5.0 (build openj9-0.35.0, JRE 17 Linux amd64-64-Bit Compressed References 20221018_325 (JIT enabled, AOT enabled)
OpenJ9   - e04a7f6c1
OMR      - 85a21674f
JCL      - 32d2c409a33 based on jdk-17.0.5+8)

The error persisted while downgrading to a new Java 11 (open-11.0.17_8 with openj9-0.35.0) version.

00:17:40.585 0x14400    j9vm.225    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-x64-openj9/workspace/build/src/openj9/runtime/vm/VMAccess.cpp:1163: (!((0 != (((currentThread->publicFlags)) & ((0x1000))))))
root@3f2836bdb121:/# java --version
openjdk 11.0.17 2022-10-18
IBM Semeru Runtime Open Edition 11.0.17.0 (build 11.0.17+8)
Eclipse OpenJ9 VM 11.0.17.0 (build openj9-0.35.0, JRE 11 Linux amd64-64-Bit Compressed References 20221031_559 (JIT enabled, AOT enabled)
OpenJ9   - e04a7f6c1
OMR      - 85a21674f
JCL      - a94c231303 based on jdk-11.0.17+8)

In the original container with an older Java 11 version (open-11.0.14.1_1 with openj9-0.30.1) this didn't happen.

root@e2d36f665825:/# java --version
openjdk 11.0.14.1 2022-02-08
IBM Semeru Runtime Open Edition 11.0.14.1 (build 11.0.14.1+1)
Eclipse OpenJ9 VM 11.0.14.1 (build openj9-0.30.1, JRE 11 Linux amd64-64-Bit Compressed References 20220210_379 (JIT enabled, AOT enabled)
OpenJ9   - 9dccbe076
OMR      - 56c3376ba
JCL      - 5c423dacd0 based on jdk-11.0.14.1+1)

I'm not sure which additional information you would need to investigate such an issue. The error happens infrequently in about 1% of containers of a deployment.

@pshipton
Copy link
Member

We'll either need a system core file (core*.dmp) plus the other diagnostic files that should have been produced by the assert, or a way to reproduce it.

@gacholio
Copy link
Contributor

Assert_VM_false(J9_ARE_ANY_BITS_SET(currentThread->publicFlags, J9_PUBLIC_FLAGS_NOT_AT_SAFE_POINT));

@gacholio
Copy link
Contributor

We definitely need to see a native stacktrace for this - this indicates that we're going for safepoint VM access when not at a safe point (and in a native no less, which is always a safe point).

@gacholio
Copy link
Contributor

I'm mistaken - the issue isn't with the running thread, it's with a different thread, but it still indicates the problem thread is both in a native and not at a safe point.

We really need a core with all native threads included, though the java stacks from the assertion core would be a good start.

@sbstnzmr
Copy link
Author

Thanks for all the answers. I'll try to get the information.

@gacholio
Copy link
Contributor

If the problem can be repeated manually, I suggest running with -Xrs which will prevent the VM from hooking any signals, so the crash will result in a dump with all threads included.

@sbstnzmr
Copy link
Author

javacore.20221220.120141.3981.0002.txt I could get a core dump. The execution included -Xrs as parameter.

@gacholio
Copy link
Contributor

We're really looking for the system core. I guess -Xrs doesn't work in this case because it's an assertion and not a crash.

@pshipton @keithc-ca Any way of getting a full core (all native threads) from an assertion?

@pshipton
Copy link
Member

Would this help in combination with -Xrs, "-Xtrace:trigger=tpnid{j9vm.225,segv}", to cause a segv when the assertion tracepoint triggers?

@gacholio
Copy link
Contributor

Yeah, triggering SEGV with -Xrs should get us the results we're looking for.

@gruenj
Copy link

gruenj commented Dec 28, 2022

hi, we have issues creating the system core dump, due to restrictions in the environment.

We also tried the following versions:
openj9-0.33.1 – no error
openj9-0.36.0 - Release Build 56 (#16480 (comment)) – same error

Does the output of the following gdb script help to narrow down the cause?

gdb.script

set print thread-events off
set $n = 100
while $n-- > 0
  printf "%d program: START\n", $n
  run
  if $_siginfo
    printf "Received signal %d, stopping\n", $_siginfo.si_signo

    printf "---info threads---\n"
    info threads
    printf "---info threads---\n"

    printf "---thread apply all bt---\n"
    thread apply all bt
    printf "---thread apply all bt---\n"

    printf "---thread apply all bt -full---\n"
    thread apply all bt -full
    printf "---thread apply all bt -full---\n"

    loop_break
  else
    printf "%d program: END\n", $n
  end
end

Command:
gdb -x /app/gdb.script --args /opt/java/openjdk/bin/java -Xmx1g -Xrs -Xtrace:trigger=tpnid\{j9vm.225,segv\} -Dlogging.level.root=OFF -Dspring.main.banner-mode=off -Xdump:directory=/tmp/app -jar /app/app.jar

Result:
gdb.log

@gacholio
Copy link
Contributor

gacholio commented Jan 3, 2023

From the log, it looks like thread 2 (0x16f00) is going for safepoint exclusive due to final field modification, and thread 34 (0x27b800) is the problematic thread.

@gacholio
Copy link
Contributor

gacholio commented Jan 3, 2023

Please try a run with -XX:-OSRSafePoint to see if we get a different error or the problem goes away.

@gacholio
Copy link
Contributor

gacholio commented Jan 3, 2023

Thread 34

#4  0x00007ffff738e837 in omrthread_monitor_enter_using_threadId (monitor=monitor@entry=0x7ffff1397608, threadId=threadId@entry=0x7fff5c001020) at /home/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-linux-x64-openj9/workspace/build/src/omr/thread/common/omrthread.c:3875
No locals.
#5  0x00007ffff74b4951 in internalEnterVMFromJNI (currentThread=0x27b800) at /home/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-linux-x64-openj9/workspace/build/src/openj9/runtime/vm/VMAccess.cpp:1059
        publicFlagsMutex = 0x7ffff1397608
        osThread = 0x7fff5c001020
#6  0x00007ffff60b9ec3 in jvmtiSetTag (env=0x7ffff00d4998, object=0x2eda50, tag=2225002374569854208)

openj9/runtime/vm/VMAccess.cpp

Lines 1051 to 1059 in a2475ea

void
internalEnterVMFromJNI(J9VMThread *currentThread)
{
currentThread->inNative = FALSE;
VM_AtomicSupport::readWriteBarrier(); // necessary?
if (J9_UNEXPECTED(currentThread->publicFlags != J9_PUBLIC_FLAGS_VM_ACCESS)) {
omrthread_monitor_t const publicFlagsMutex = currentThread->publicFlagsMutex;
omrthread_t const osThread = currentThread->osThread;
omrthread_monitor_enter_using_threadId(publicFlagsMutex, osThread);

Thread 2

#7  0x00007ffff74b5013 in acquireSafePointVMAccess (vmThread=0x16f00) at /home/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-linux-x64-openj9/workspace/build/src/openj9/runtime/vm/VMAccess.cpp:1163
#8  0x00007ffff74a1c58 in VM_VMHelpers::reportFinalFieldModified (fieldClass=0xd6ad00, 

openj9/runtime/vm/VMAccess.cpp

Lines 1161 to 1163 in a2475ea

if (currentThread->inNative) {
TRIGGER_J9HOOK_VM_ACQUIRING_EXCLUSIVE_IN_NATIVE(vm->hookInterface, vmThread, currentThread);
Assert_VM_false(J9_ARE_ANY_BITS_SET(currentThread->publicFlags, J9_PUBLIC_FLAGS_NOT_AT_SAFE_POINT));

@gruenj
Copy link

gruenj commented Jan 4, 2023

With -XX:-OSRSafePoint the problem goes away.

Command used:
/opt/java/openjdk/bin/java -Xmx1g -Xrs -XX:-OSRSafePoint -Xtrace:trigger=tpnid\{j9vm.225,segv\} -Dlogging.level.root=OFF -Dspring.main.banner-mode=off -Xdump:directory=/tmp/app -jar /app/app.jar

@gruenj
Copy link

gruenj commented Apr 3, 2023

Hi @gacholio @pshipton ,
any update on this issue?

The latest OpenJ9 project builds does still produce the same error.

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Build_JDK17_x86-64_linux_Release/62/
(from https://openj9-jenkins.osuosl.org/job/Build_JDK17_x86-64_linux_Release/62/)

openjdk version "17.0.7-internal" 2023-04-18
OpenJDK Runtime Environment (build 17.0.7-internal+0-adhoc.jenkins.BuildJDK17x86-64linuxRelease)
Eclipse OpenJ9 VM (build v0.38.0-release-08d40c01b59, JRE 17 Linux amd64-64-Bit Compressed References 20230401_62 (JIT enabled, AOT enabled)
OpenJ9   - 08d40c01b59
OMR      - 855813495c1
JCL      - 2be39ca26d5 based on jdk-17.0.7+5)


0x17000 j9vm.225 * ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK17_x86-64_linux_Release/openj9/runtime/vm/VMAccess.cpp:1163: (!((0 != (((currentThread->publicFlags)) & ((0x1000))))))

We build openj9 from source and were able to find the problematic commits.

The error first occurs with: abfb530

Eclipse OpenJ9 VM (build master-abfb5304d, JRE 17 Linux amd64-64-Bit Compressed References 20230327_000000 (JIT enabled, AOT enabled)
OpenJ9   - abfb5304d
OMR      - 453e06ac2
JCL      - a7b6b615949 based on jdk-17.0.4+7)

The error does not occur with: a1f1df2

Eclipse OpenJ9 VM (build master-a1f1df2b3, JRE 17 Linux amd64-64-Bit Compressed References 20230327_000000 (JIT enabled, AOT enabled)
OpenJ9   - a1f1df2b3
OMR      - 453e06ac2
JCL      - a7b6b615949 based on jdk-17.0.4+7)

The error occurs again with: 9a795f1

OpenJDK Runtime Environment (build 17.0.4-internal+0-adhoc..openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build master-9a795f121, JRE 17 Linux amd64-64-Bit Compressed References 20230327_000000 (JIT enabled, AOT enabled)
OpenJ9   - 9a795f121
OMR      - 453e06ac2
JCL      - a7b6b615949 based on jdk-17.0.4+7)

@pshipton
Copy link
Member

pshipton commented Apr 3, 2023

@amicic @gacholio the problem has been narrowed down to 9a795f121
#15227
It also occurred with #15201

@gacholio
Copy link
Contributor

gacholio commented Apr 6, 2023

@gruenj Can you provide instructions/artifacts to allow me to reproduce the issue locally?

@gruenj
Copy link

gruenj commented Apr 16, 2023

@gacholio we finally find a way to produce the issue locally.
see https://github.com/gruenj/openj9-bug-16480 for code and pre build artifacts

To reproduce the issue a native agent library (-agentpath) is required

  • The native agent has to register a callback for the JVMTI_EVENT_SAMPLED_OBJECT_ALLOC event
  • The native agent is also required to start a thread in which data is allocated
  • The Java code is a Spring Application with @Configuration classes

@gacholio
Copy link
Contributor

I think the key here is the JVMTI_EVENT_SAMPLED_OBJECT_ALLOC. This event is currently dispatched while the NOT_AT_SAFE_POINT bit is set in the allocator.

The simplest fix would be to disable safepoint OSR if this event is enabled (much as we do for the instrumentable allocate extension). This will result in slightly lower performance from the compiled code.

@gacholio
Copy link
Contributor

@gruenj Can you do a performance test with -XX:-OSRSafePoint to see if this is acceptable?

@gacholio
Copy link
Contributor

gacholio commented Apr 21, 2023

The proposed solution would also mean that the can_generate_sampled_object_alloc_events capability must be acquired during the OnLoad phase.

@gacholio
Copy link
Contributor

gacholio commented May 1, 2023

Based on the JVMTI spec, it seems capabilities are almost always acquired during the OnLoad phase, so I'm going to go ahead and prototype my suggested change as I don't see any other practical way to address this issue.

gacholio added a commit to gacholio/openj9 that referenced this issue May 1, 2023
Use of the can_generate_sampled_object_alloc_events JVMTI capability
must disable safe point OSR because the event would be sent when not at
an OSR safe point, leading to assertions or hangs.

Fixes: eclipse-openj9#16480

Signed-off-by: Graham Chapman <[email protected]>
yathamravali pushed a commit to ibmruntimes/semeru-openj9 that referenced this issue Jul 4, 2023
Use of the can_generate_sampled_object_alloc_events JVMTI capability
must disable safe point OSR because the event would be sent when not at
an OSR safe point, leading to assertions or hangs.

Fixes: eclipse-openj9#16480

Signed-off-by: Graham Chapman <[email protected]>
yathamravali pushed a commit to ibmruntimes/semeru-openj9 that referenced this issue Jul 4, 2023
Use of the can_generate_sampled_object_alloc_events JVMTI capability
must disable safe point OSR because the event would be sent when not at
an OSR safe point, leading to assertions or hangs.

Fixes: eclipse-openj9#16480

Signed-off-by: Graham Chapman <[email protected]>
yathamravali pushed a commit to ibmruntimes/semeru-openj9 that referenced this issue Jul 4, 2023
Use of the can_generate_sampled_object_alloc_events JVMTI capability
must disable safe point OSR because the event would be sent when not at
an OSR safe point, leading to assertions or hangs.

Fixes: eclipse-openj9#16480

Signed-off-by: Graham Chapman <[email protected]>
yathamravali pushed a commit to ibmruntimes/semeru-openj9 that referenced this issue Jul 4, 2023
Use of the can_generate_sampled_object_alloc_events JVMTI capability
must disable safe point OSR because the event would be sent when not at
an OSR safe point, leading to assertions or hangs.

Fixes: eclipse-openj9#16480

Signed-off-by: Graham Chapman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants