Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmdLineTester_criu_nonPortableRestore_4_FAILED FAILED: testMXBeanUpTime() - uptimeAfterCheckpoint 9437 can't be greater than uptimeBeforeCheckpoint 7751 + 1500 #18384

Closed
JasonFengJ9 opened this issue Nov 1, 2023 · 4 comments · Fixed by #18425
Assignees
Labels
criu Used to track CRIU snapshot related work test failure

Comments

@JasonFengJ9
Copy link
Member

JasonFengJ9 commented Nov 1, 2023

Failure link

From an internal build(ubu22s390x-svl-rt6-1):

java version "21.0.1-beta" 2023-10-17
IBM Semeru Runtime Certified Edition 21.0.1+12-202310310039 (build 21.0.1-beta+12-202310310039)
Eclipse OpenJ9 VM 21.0.1+12-202310310039 (build master-a80c01fea, JRE 21 Linux s390x-64-Bit Compressed References 20231031_25 (JIT enabled, AOT enabled)
OpenJ9   - a80c01fea
OMR      - 94e44382f
JCL      - c06eaf638 based on jdk-21.0.1+12)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

[2023-10-31T02:08:19.555Z] variation: -Xjit:count=0 -XX:+CRIURestoreNonPortableMode
[2023-10-31T02:08:19.555Z] JVM_OPTIONS:  -Xjit:count=0 -XX:+CRIURestoreNonPortableMode 

[2023-10-31T02:11:12.649Z] Testing: Create CRIU checkpoint image and restore once - testMXBeanUpTime
[2023-10-31T02:11:12.649Z] Test start time: 2023/10/31 02:11:12 Coordinated Universal Time
[2023-10-31T02:11:12.649Z] Running command: bash /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/openjdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED" org.openj9.criu.TimeChangeTest testMXBeanUpTime 1 false false
[2023-10-31T02:11:12.649Z] Time spent starting: 6 milliseconds
[2023-10-31T02:11:24.307Z] Time spent executing: 12248 milliseconds
[2023-10-31T02:11:24.307Z] Test result: FAILED
[2023-10-31T02:11:24.307Z] Output from test:
[2023-10-31T02:11:24.307Z]  [OUT] start running script
[2023-10-31T02:11:24.307Z]  [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
[2023-10-31T02:11:24.307Z]  [OUT] export LD_BIND_NOT=on
[2023-10-31T02:11:24.307Z]  [OUT] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/openjdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED -cp /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.TimeChangeTest testMXBeanUpTime 1
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:19.881*0xa55800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718279871) vmStartTime (1698718272120) criuTimeDeltaMillis (0)
[2023-10-31T02:11:24.307Z]  [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Tue Oct 31 02:11:20 UTC 2023, System.currentTimeMillis(): 1698718280557, System.nanoTime(): 1698718166740678368
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.411 0xa55800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.412 0xa55800          j9criu.18       - Taking a checkpoint with active clinit
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.510 0xa55800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1698718167693075406, checkpointNanoUTCTime = 1698718281510665061
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.674 0xa55800          j9criu.12       - Current syslogOptions: error,vital
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:21.676 0xa55800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1698718167859504964, j9time_current_time_nanos() returns 1698718281677095769
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.024 0xa55800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1698718170207451760, j9time_current_time_nanos() returns 1698718284025049925
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.025 0xa55800          j9criu.14       - After restore, restoreNanoUTCTime = 1698718284025049925, checkpointNanoUTCTime = 1698718281510665061, checkpointRestoreTimeDelta = 2514384864, restoreNanoTimeMonotonic = 1698718170207451760, checkpointNanoTimeMonotonic = 1698718167693075406, nanoTimeMonotonicClockDelta = 2514376354
[2023-10-31T02:11:24.307Z]  [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.063 0xa55800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
[2023-10-31T02:11:24.307Z]  [OUT] 02:11:24.071 0xa55800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718284071) vmStartTime (1698718272120) criuTimeDeltaMillis (2514)
[2023-10-31T02:11:24.307Z]  [OUT] FAILED: testMXBeanUpTime() - uptimeAfterCheckpoint 9437 can't be greater than uptimeBeforeCheckpoint 7751 + 1500
[2023-10-31T02:11:24.307Z]  [OUT] Removed test output files
[2023-10-31T02:11:24.307Z]  [OUT] finished script
[2023-10-31T02:11:24.307Z]  [ERR] /home/jenkins/workspace/Test_openjdk21_j9_sanity.functional_s390x_linux_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 41: 3076089 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
[2023-10-31T02:11:24.307Z] >> Success condition was found: [Output match: Killed]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: PASSED: testMXBeanUpTime()]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: CRIU is not enabled]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: Operation not permitted]
[2023-10-31T02:11:24.307Z] >> Failure condition was found: [Output match: FAILED: testMXBeanUpTime()]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: Thread pid mismatch]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: do not match expected]
[2023-10-31T02:11:24.307Z] >> Success condition was not found: [Output match: Unable to create a thread:]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
[2023-10-31T02:11:24.307Z] >> Failure condition was not found: [Output match: User requested Java dump using]

[2023-10-31T02:15:48.054Z] ---TEST RESULTS---
[2023-10-31T02:15:48.054Z] Number of PASSED tests: 55 out of 56
[2023-10-31T02:15:48.054Z] Number of FAILED tests: 1 out of 56
[2023-10-31T02:15:48.054Z] 
[2023-10-31T02:15:48.054Z] ---SUMMARY OF FAILED TESTS---
[2023-10-31T02:15:48.054Z] Create CRIU checkpoint image and restore once - testMXBeanUpTime
[2023-10-31T02:15:48.054Z] -----------------------------
[2023-10-31T02:15:48.054Z] 
[2023-10-31T02:15:48.054Z] -----------------------------------
[2023-10-31T02:15:48.054Z] cmdLineTester_criu_nonPortableRestore_4_FAILED

50x internal grinder - 2 failures reproduced at ubu20s390x-svl-rt2-1.

This is likely a test issue.

@JasonFengJ9 JasonFengJ9 added test failure criu Used to track CRIU snapshot related work labels Nov 1, 2023
@JasonFengJ9 JasonFengJ9 self-assigned this Nov 1, 2023
@r30shah
Copy link
Contributor

r30shah commented Nov 3, 2023

FYI @VermaSh

@VermaSh
Copy link
Contributor

VermaSh commented Nov 6, 2023

Looking into the failure @JasonFengJ9 is looking into the failure

@JasonFengJ9
Copy link
Member Author

j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718279871) vmStartTime (1698718272120) criuTimeDeltaMillis (0)
j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1698718167693075406, checkpointNanoUTCTime = 1698718281510665061
j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1698718167859504964, j9time_current_time_nanos() returns 1698718281677095769
j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1698718170207451760, j9time_current_time_nanos() returns 1698718284025049925
j9criu.14       - After restore, restoreNanoUTCTime = 1698718284025049925, checkpointNanoUTCTime = 1698718281510665061, checkpointRestoreTimeDelta = 2514384864, restoreNanoTimeMonotonic = 1698718170207451760, checkpointNanoTimeMonotonic = 1698718167693075406, nanoTimeMonotonicClockDelta = 2514376354
j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698718284071) vmStartTime (1698718272120) criuTimeDeltaMillis (2514)
  • First RuntimeMXBean.getUptime() was taken at j9jcl.533 w/ 1698718279871ms via j9time_current_time_millis();
  • CRIUSupport.checkpointJVM() was at j9criu.13 w/ 1698718281510665061ns via j9time_current_time_nanos();
  • Actual criu_dump() was at j9criu.7 w/ 1698718281677095769ns via j9time_current_time_nanos();
  • The CRIU image is restored after 2s delay;
  • After criu_dump() was at j9criu.15 w/ 1698718284025049925ns via j9time_current_time_nanos();
  • After restore at j9criu.14, restoreNanoUTCTime was 1698718284025049925ns via j9time_current_time_nanos();
  • Second RuntimeMXBean.getUptime() was taken at j9jcl.533 w/ 1698718284071ms via j9time_current_time_millis().

The time between first RuntimeMXBean.getUptime() and CRIUSupport.checkpointJVM() was ~1639ms which is more than adjustMillis = 1500ms


This probably was due to the slow machine creating local directories.

Will modify the test to minimize the time gap between first RuntimeMXBean.getUptime() and CRIUSupport.checkpointJVM().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
criu Used to track CRIU snapshot related work test failure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants