Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmdLineTester_criu_nonPortableRestoreJDK11Up testObjectWaitTimedV2 actual elapsed time does not match expected #16907

Closed
pshipton opened this issue Mar 14, 2023 · 10 comments
Assignees
Labels
comp:vm criu Used to track CRIU snapshot related work test failure

Comments

@pshipton
Copy link
Member

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/129
cmdLineTester_criu_nonPortableRestoreJDK11Up_0

Testing: Create CRIU checkpoint image and restore once - testObjectWaitTimedV2
Test start time: 2023/03/14 04:42:42 Atlantic Standard Time
Running command: bash /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/openjdkbinary/j2sdk-image/bin/java " -Xjit -XX:+CRIURestoreNonPortableMode  --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/openj9.internal.criu=ALL-UNNAMED" "org.openj9.criu.JDK11UpTimeoutAdjustmentTest testObjectWaitTimedV2" 1 1 false
Time spent starting: 7 milliseconds
Time spent executing: 25256 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] Start test name: testObjectWaitTimedV2
 [OUT] Before starting testObjectWaitTimedV2, current thread name: main, Tue Mar 14 04:42:43 ADT 2023, System.currentTimeMillis(): 1678779763312, System.nanoTime(): 13009006661565022
 [OUT] testObjectWaitTimedV2() before wait(ms), current thread name: Thread-0, Tue Mar 14 04:42:43 ADT 2023, System.currentTimeMillis(): 1678779763419, System.nanoTime(): 13009006768139721
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Tue Mar 14 04:42:53 ADT 2023, System.currentTimeMillis(): 1678779773263, System.nanoTime(): 13009016611877784
 [OUT] testObjectWaitTimedV2() after wait(ms), current thread name: Thread-0, Tue Mar 14 04:42:55 ADT 2023, System.currentTimeMillis(): 1678779775877, System.nanoTime(): 13009016657576039
 [OUT] FAILED: expected wait time 10000 ms, but the actual elapsed time was: 9888677222ms with startNanoTime = 13009006768565670, and endNanoTime = 13009016657242892, CheckpointRestoreNanoTimeDelta: 2569157133
 [OUT] End testObjectWaitTimedV2, current thread name: main, Tue Mar 14 04:43:07 ADT 2023, System.currentTimeMillis(): 1678779787870, System.nanoTime(): 13009028649761469
 [OUT] Removed testOutput file
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk19_j9_sanity.functional_x86-64_linux_Nightly_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 40: 13541 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was not found: [Output match: PASSED: expected wait time]
>> Required condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Start test name: testObjectWaitTimedV2]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Failure condition was found: [Output match: FAILED: expected wait time]
>> Failure condition was not found: [Output match: InterruptedException]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]
@pshipton pshipton added comp:vm test failure criu Used to track CRIU snapshot related work labels Mar 14, 2023
@pshipton
Copy link
Member Author

@tajila fyi

@tajila
Copy link
Contributor

tajila commented Mar 14, 2023

@JasonFengJ9 Please take a look

@JasonFengJ9
Copy link
Member

[OUT] FAILED: expected wait time 10000 ms, but the actual elapsed time was: 9888677222ms with startNanoTime = 13009006768565670, and endNanoTime = 13009016657242892, CheckpointRestoreNanoTimeDelta: 2569157133

There was a mismatching ms and ns which is a test problem.

showMessages("FAILED: expected wait time ", msSleepTime10s, true, elapsedTime, startNanoTime,
endNanoTime);

The actual elapsed time was 9888ms compared with the target 10000ms, will check System.nanoTime() and Object.wait() time compensation across C/R.

@pshipton
Copy link
Member Author

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/424
cmdLineTester_criu_nonPortableRestoreJDK11Up_2 -Xjit:count=0 -XX:+CRIURestoreNonPortableMode

Testing: Create CRIU checkpoint image and restore once - testObjectWaitTimedV2
Test start time: 2023/04/10 15:38:00 Eastern Standard Time
Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/openjdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu} --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/openj9.internal.criu=ALL-UNNAMED" "org.openj9.criu.JDK11UpTimeoutAdjustmentTest testObjectWaitTimedV2" 1 1 false
Time spent starting: 6 milliseconds
Time spent executing: 30712 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] Start test name: testObjectWaitTimedV2
 [OUT] Before starting testObjectWaitTimedV2, current thread name: main, Mon Apr 10 15:38:02 EDT 2023, System.currentTimeMillis(): 1681155482818, System.nanoTime(): 1681155482809075038
 [OUT] testObjectWaitTimedV2 NO C/R before wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:03 EDT 2023, System.currentTimeMillis(): 1681155483278, System.nanoTime(): 1681155483269276042
 [OUT] testObjectWaitTimedV2 NO C/R after wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:13 EDT 2023, System.currentTimeMillis(): 1681155493283, System.nanoTime(): 1681155493274886670
 [OUT] FAILED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000499781ns (~10000ms) with startNanoTime = 1681155483271517053ns, and endNanoTime = 1681155493272016834ns, CheckpointRestoreNanoTimeDelta: 0ns (~0ms)
 [OUT] testObjectWaitTimedV2 before wait(10000, 500000), current thread name: Thread-0, Mon Apr 10 15:38:13 EDT 2023, System.currentTimeMillis(): 1681155493344, System.nanoTime(): 1681155493335445898
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Mon Apr 10 15:38:15 EDT 2023, System.currentTimeMillis(): 1681155495988, System.nanoTime(): 1681155495979505548
 [OUT] 19:38:15.994*0x12d9800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 19:38:15.996 0x12d9800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1681155495987701688, checkpointNanoUTCTime = 1681155495997125820
 [OUT] 19:38:16.013 0x12d9800          j9criu.12       - Current syslogOptions: error,vital
 [OUT] 19:38:16.013 0x12d9800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1681155496004096889, j9time_current_time_nanos() returns 1681155496013521069
 [OUT] 19:38:18.090 0x12d9800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1681155498081639212, j9time_current_time_nanos() returns 1681155498091065442
 [OUT] 19:38:18.090 0x12d9800          j9criu.14       - After restore, restoreNanoUTCTime = 1681155498091065442, checkpointNanoUTCTime = 1681155495997125820, checkpointRestoreTimeDelta = 2093939622, restoreNanoTimeMonotonic = 1681155498081639212, checkpointNanoTimeMonotonic = 1681155495987701688, nanoTimeMonotonicClockDelta = 2093937524
 [OUT] 19:38:18.093 0x12d9800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] testObjectWaitTimedV2 before wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:18 EDT 2023, System.currentTimeMillis(): 1681155498093, System.nanoTime(): 1681155495991015257
 [OUT] testObjectWaitTimedV2 after wait(10000, 500000), current thread name: Thread-0, Mon Apr 10 15:38:25 EDT 2023, System.currentTimeMillis(): 1681155505455, System.nanoTime(): 1681155503352646184
 [OUT] PASSED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000620129ns (~10000ms) with startNanoTime = 1681155493335825704ns, and endNanoTime = 1681155503336445833ns, CheckpointRestoreNanoTimeDelta: 2093939622ns (~2093ms)
 [OUT] testObjectWaitTimedV2 after wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:28 EDT 2023, System.currentTimeMillis(): 1681155508096, System.nanoTime(): 1681155505993630294
 [OUT] PASSED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000653991ns (~10000ms) with startNanoTime = 1681155495992900937ns, and endNanoTime = 1681155505993554928ns, CheckpointRestoreNanoTimeDelta: 2093939622ns (~2093ms)
 [OUT] End testObjectWaitTimedV2, current thread name: main, Mon Apr 10 15:38:30 EDT 2023, System.currentTimeMillis(): 1681155510628, System.nanoTime(): 1681155508525818197
 [OUT] Removed testOutput file
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 40: 14534 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was found: [Output match: PASSED: expected wait time]
>> Required condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Start test name: testObjectWaitTimedV2]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Failure condition was found: [Output match: FAILED: expected wait time]
>> Failure condition was not found: [Output match: InterruptedException]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]

@pshipton pshipton changed the title cmdLineTester_criu_nonPortableRestoreJDK11Up testObjectWaitTimedV2 wait time too large cmdLineTester_criu_nonPortableRestoreJDK11Up testObjectWaitTimedV2 actual wait time does not match expected Apr 10, 2023
@pshipton pshipton changed the title cmdLineTester_criu_nonPortableRestoreJDK11Up testObjectWaitTimedV2 actual wait time does not match expected cmdLineTester_criu_nonPortableRestoreJDK11Up testObjectWaitTimedV2 actual elapsed time does not match expected Apr 10, 2023
@JasonFengJ9
Copy link
Member

 [OUT] Before starting testObjectWaitTimedV2, current thread name: main, Mon Apr 10 15:38:02 EDT 2023, System.currentTimeMillis(): 1681155482818, System.nanoTime(): 1681155482809075038
 [OUT] testObjectWaitTimedV2 NO C/R before wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:03 EDT 2023, System.currentTimeMillis(): 1681155483278, System.nanoTime(): 1681155483269276042
 [OUT] testObjectWaitTimedV2 NO C/R after wait(10000, 500000), current thread name: main, Mon Apr 10 15:38:13 EDT 2023, System.currentTimeMillis(): 1681155493283, System.nanoTime(): 1681155493274886670
 [OUT] FAILED: expected wait time 10000500000 ns, but the actual elapsed time was: 10000499781ns (~10000ms) with startNanoTime = 1681155483271517053ns, and endNanoTime = 1681155493272016834ns, CheckpointRestoreNanoTimeDelta: 0ns (~0ms)

This failure occurred before checkpoint, ~219ns less than the expected elapsed time.

@pshipton
Copy link
Member Author

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/519/
cmdLineTester_criu_nonPortableRestoreJDK11Up_2 -Xjit:count=0 -XX:+CRIURestoreNonPortableMode

Testing: Create CRIU checkpoint image and restore once - testObjectWaitTimedV1
Test start time: 2023/04/20 07:37:19 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu} --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED --add-exports=java.base/openj9.internal.criu=ALL-UNNAMED" "org.openj9.criu.JDK11UpTimeoutAdjustmentTest testObjectWaitTimedV1" 1 1 false
Time spent starting: 7 milliseconds
Time spent executing: 43554 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] Start test name: testObjectWaitTimedV1
 [OUT] Before starting testObjectWaitTimedV1, current thread name: main, Thu Apr 20 07:37:23 UTC 2023, System.currentTimeMillis(): 1681976243312, System.nanoTime(): 1681976243307194105
 [OUT] testObjectWaitTimedV1 NO C/R before wait(10000, 0), current thread name: main, Thu Apr 20 07:37:24 UTC 2023, System.currentTimeMillis(): 1681976244849, System.nanoTime(): 1681976244844687865
 [OUT] testObjectWaitTimedV1 NO C/R after wait(10000, 0), current thread name: main, Thu Apr 20 07:37:34 UTC 2023, System.currentTimeMillis(): 1681976254879, System.nanoTime(): 1681976254874093625
 [OUT] PASSED: expected wait time 10000000000 ns, but the actual elapsed time was: 10000290980ns (~10000ms) with startNanoTime = 1681976244867283593ns, and endNanoTime = 1681976254867574573ns, CheckpointRestoreNanoTimeDelta: 0ns (~0ms)
 [OUT] testObjectWaitTimedV1 before wait(10000, 0), current thread name: Thread-0, Thu Apr 20 07:37:35 UTC 2023, System.currentTimeMillis(): 1681976255196, System.nanoTime(): 1681976255191475431
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Thu Apr 20 07:37:43 UTC 2023, System.currentTimeMillis(): 1681976263034, System.nanoTime(): 1681976263029673537
 [OUT] 07:37:43.096*0x138800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 07:37:43.106 0x138800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1681976263101990092, checkpointNanoUTCTime = 1681976263107223156
 [OUT] 07:37:43.158 0x138800          j9criu.12       - Current syslogOptions: error,vital
 [OUT] 07:37:43.158 0x138800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1681976263153541509, j9time_current_time_nanos() returns 1681976263158774448
 [OUT] 07:37:46.956 0x138800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1681976266951114284, j9time_current_time_nanos() returns 1681976266956351514
 [OUT] 07:37:46.956 0x138800          j9criu.14       - After restore, restoreNanoUTCTime = 1681976266956351514, checkpointNanoUTCTime = 1681976263107223156, checkpointRestoreTimeDelta = 3849128358, restoreNanoTimeMonotonic = 1681976266951114284, checkpointNanoTimeMonotonic = 1681976263101990092, nanoTimeMonotonicClockDelta = 3849124192
 [OUT] 07:37:46.961 0x138800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] testObjectWaitTimedV1 before wait(10000, 0), current thread name: main, Thu Apr 20 07:37:46 UTC 2023, System.currentTimeMillis(): 1681976266961, System.nanoTime(): 1681976263107596407
 [OUT] testObjectWaitTimedV1 after wait(10000, 0), current thread name: Thread-0, Thu Apr 20 07:37:46 UTC 2023, System.currentTimeMillis(): 1681976266967, System.nanoTime(): 1681976263112892820
 [OUT] FAILED: expected wait time 10000000000 ns, but the actual elapsed time was: 7920362661ns (~7920ms) with startNanoTime = 1681976255192508834ns, and endNanoTime = 1681976263112871495ns, CheckpointRestoreNanoTimeDelta: 3849128358ns (~3849ms)
 [OUT] testObjectWaitTimedV1 after wait(10000, 0), current thread name: main, Thu Apr 20 07:37:56 UTC 2023, System.currentTimeMillis(): 1681976276975, System.nanoTime(): 1681976273121451196
 [OUT] PASSED: expected wait time 10000000000 ns, but the actual elapsed time was: 10002690947ns (~10002ms) with startNanoTime = 1681976263118695139ns, and endNanoTime = 1681976273121386086ns, CheckpointRestoreNanoTimeDelta: 3849128358ns (~3849ms)
 [OUT] End testObjectWaitTimedV1, current thread name: main, Thu Apr 20 07:38:02 UTC 2023, System.currentTimeMillis(): 1681976282047, System.nanoTime(): 1681976278193676134
 [OUT] Removed testOutput file
 [OUT] finished script
 [ERR] /home/jenkins/workspace/Test_openjdk11_j9_sanity.functional_s390x_linux_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh: line 40: 2627005 Killed                  $2 -XX:+EnableCRIUSupport $3 -cp "$1/criu.jar" $4 $5 $6 > testOutput 2>&1
>> Success condition was found: [Output match: PASSED: expected wait time]
>> Required condition was found: [Output match: Killed]
>> Required condition was found: [Output match: Start test name: testObjectWaitTimedV1]
>> Failure condition was not found: [Output match: CRIU is not enabled]
>> Failure condition was not found: [Output match: Operation not permitted]
>> Failure condition was found: [Output match: FAILED: expected wait time]
>> Failure condition was not found: [Output match: InterruptedException]
>> Success condition was not found: [Output match: Unable to create a thread:]
>> Success condition was not found: [Output match: Thread pid mismatch]
>> Success condition was not found: [Output match: do not match expected]
>> Failure condition was not found: [Output match: Could not dump the JVM processes, err=-70]

@keithc-ca
Copy link
Contributor

keithc-ca commented May 11, 2023

@tajila
Copy link
Contributor

tajila commented Aug 15, 2023

@JasonFengJ9 does this still fail?

@JasonFengJ9
Copy link
Member

@tajila It didn't appear in recent internal builds.

Most of the occurrences were reported from open runs.
@pshipton any recent observation?

@pshipton
Copy link
Member Author

any recent observation?

Any recent failures would have been added to the appropriate issues (except during my vacation times), so no.

@tajila tajila closed this as completed Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:vm criu Used to track CRIU snapshot related work test failure
Projects
None yet
Development

No branches or pull requests

4 participants