MauveMultiThreadLoadTest_special_16 String.ConsCharset got 65503 but expected 223 #11192
The test failed while converting some UTF-8 bytes to a String.
|
Run some grinders https://ci.eclipse.org/openj9/job/Grinder/1174/ |
@gita-omr can someone take a look pls. |
@gita-omr this is failing enough to be a blocker, but it's running in a non-production mode. We need to evaluate if the problem is specific to the non-production options. |
Yes, we will take a look. @mnalam-p |
Trying another grinder on the OpenJ9 machines, using the latest build. |
@mnalam-p the grinder isn't done, but there are still failures https://ci.eclipse.org/openj9/job/Grinder_iteration_2/20/ |
btw, I suggested to Nazmul he try grinding using |
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1843 As this is being investigated and is known to fail frequently in grinders, I'll stop adding new comments for each failure in the nightly builds. https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64_aix_Nightly/1063 |
Started 5 iterations here: https://ci.eclipse.org/openj9/job/Grinder/1226/ |
Had to replace IP address in the CUSTOMIZED_SDK_URL: https://ci.eclipse.org/openj9/job/Grinder/1227/ |
@jdekonin still did not manage to make it run. Could you please help? |
@pshipton NodesByIteration was not able to reproduce the issue. Two configurations were tested -
|
Where do you see that, I might want to change my answer. Something we build nightly or in a personal build is non-production. A build created for a release, with the correct properties, version numbers, and perhaps even building on special machines with restricted access is a production build. There would be no difference in functionality if built from the same SHAs, it's more about the Nightly build: Release build: |
Request access to an OpenJ9 machine where the problem occurs. |
@gita-omr CUSTOMIZED_SDK_URL has a few links that need modifying; you only got the first, which was the SDK. The second was the test-images.tar.gz. I relaunched with the correction. Side note: @rajdeepsingh1 increased the artifactory space, so for anything after November 19th it won't matter; it should be in both locations. I realize that doesn't help this issue though. |
@pshipton I ran 20 iterations of test.MauveMultiThreadLoadTest on an OpenJ9 machine (twice). All tests passed and I did not see any failures. |
I can grind it again if you like. There were more failures last night. Did you choose a machine where we've seen failures before? |
5 iterations passed: https://ci.eclipse.org/openj9/job/Grinder/1230/console |
I think the above only ran MauveMultiThreadLoadTest_0 Started 1 iteration of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1235/ |
Started 25 iterations of MauveMultiThreadLoadTest_special_16 here: https://ci.eclipse.org/openj9/job/Grinder/1236/ |
7 out of 100 failed (each iteration has 4 runs). |
Just wanted to mention that @mnalam-p was able to reproduce the failure manually on an OpenJ9 machine, and that the failure rate with count=0 seems to be pretty high. Narrowing down the method now. |
I have reduced the test cases to only the failing one (ConsCharset), so I can now cycle the test faster. Here is what I have found so far -
Any suggestion would be helpful on what to do next. |
Even with the AutoSIMD fix, I was still able to generate the failure locally, although reproducing it is still hit and miss. One observation from comparing the actual and expected output is that the first three values are off by one byte. I also modified the suite to enable dumps on failure and got a javacore and a heapdump. I will see whether I can get any clues from them. I am still trying to pinpoint the actual failure so that I can reproduce it consistently. |
FYI a hang in the same test #11400 |
After reducing the test case, I still see the test failure. I am suspecting the problem is in
|
Started here: https://ci.eclipse.org/openj9/job/Grinder/1274/ This time with -Xjit=limit={*} since we suspect that even that option can change the behaviour. |
I was able to reproduce this issue with the oldest available nightly build from adoptopenjdk (17 April 2020).
|
While it would be nice to figure this out, obviously it doesn't need to be resolved for the 0.24 release, moving it forward. |
We've discovered a way to reliably reproduce this failure nearly 100% of the time. Use the following simplified test:
Invoke using
The issue occurs only when The optimization that causes this to start failing is Invoking the following will produce a trace file reliably:
Notably, |
Specifically, this starts failing at transformation 19.
|
Observed a different failure (
|
I looked into this with @AlenBadel, following up on what he mentioned in #11192 (comment), where PRE decides to profile the array length and inserts a placeholder JProfiling call. To narrow down the issue further, we used a small hack: a driver that limits the profiling values, combined with a binary search to find the last value for which it adds profiling trees and the test starts failing. The key observations are as follows,
I will continue to help Alen and also take a look myself to find out the cause of the failures. |
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly/1868 |
@pshipton we have only seen it on Power? Maybe we can set the label? |
Thanks a lot @AlenBadel and @r30shah for narrowing the test case down and for the investigation. Just to update: since the test case fails practically every time now, we decided that @mnalam-p will try to step through it in the debugger. Whatever trees JProfiler is inserting, they should not be affecting the user data, so I also suspect a codegen issue. But @r30shah, if you could take another look at the inserted tree, it would be very helpful. |
Sure, set any labels you like. |
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_testList_2/30 |
Further debugging reveals that the length of the underlying array of the UTF-8 encoded string changed from 5 to 12, 12 being the initial byte array length. In fact, the String's underlying char has the same byte buffer as if it had been encoded in ASCII instead of UTF-8. I am currently narrowing down why it is not UTF-8 encoded. |
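For readers unfamiliar with why the decoded length should differ from the source byte array length, here is a minimal sketch in Java. The input bytes and the class name `DecodeLengthDemo` are hypothetical and are not the test's actual input; the sketch only shows that a correct UTF-8 decode collapses a multi-byte sequence into one char, whereas a 1-byte-per-char decode keeps the byte count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeLengthDemo {
    // Hypothetical sample input: 'a', then the two-byte UTF-8 sequence
    // C3 9F (U+00DF, decimal 223), then 'b'.
    static final byte[] SAMPLE = { (byte) 0x61, (byte) 0xC3, (byte) 0x9F, (byte) 0x62 };

    // Number of chars produced when decoding the bytes with the given charset.
    static int decodedLength(byte[] bytes, Charset cs) {
        return new String(bytes, cs).length();
    }

    // Numeric value of the char at the given index after decoding.
    static int charValueAt(byte[] bytes, Charset cs, int index) {
        return new String(bytes, cs).charAt(index);
    }

    public static void main(String[] args) {
        // Correct UTF-8 decode: 4 bytes become 3 chars.
        System.out.println(decodedLength(SAMPLE, StandardCharsets.UTF_8));
        // One-byte-per-char decode: 4 bytes stay 4 chars.
        System.out.println(decodedLength(SAMPLE, StandardCharsets.ISO_8859_1));
        // The second char of the UTF-8 decode is U+00DF.
        System.out.println(charValueAt(SAMPLE, StandardCharsets.UTF_8, 1));
    }
}
```

A decoded string whose length equals the source byte count, as observed above, is what one would expect if every byte had been treated as a single-byte character.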
Based on gdb debugging and the JCL source, it is clear that in the bad case the string is wrongly encoded as 1-byte UTF-8, resulting in the same bytes and length as the source byte array. For the input buffer For the The generated IL for the byte to integer load is -
The byte value When the jProfiler is active, the generated IL is changed to -
This IL does not generate the |
When the jProfiler is enabled, it can change the IL from bloadi to bRegLoad. The b2iEvaluator in power does not generate extsb opcode for bRegLoad. This commit fixes that. Fixes: eclipse-openj9/openj9#11192 Signed-off-by: Mohammad Nazmul Alam <[email protected]>
The b2iEvaluator currently does not generate sign extension for bRegLoad. While it will work for a sign-extended bRegLoad, it may yield a wrong result if the byte was not sign extended. This commit covers such cases by making sure bRegLoad is always sign extended. Fixes: eclipse-openj9/openj9#11192 Signed-off-by: Mohammad Nazmul Alam <[email protected]>
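As background on what sign extension means for a byte-to-int conversion, here is a minimal Java sketch. It does not reproduce the JIT bug or the code generator's behaviour; the class name `SignExtensionDemo` and the choice of the byte 0xDF are illustrative assumptions. It only shows the Java-level semantics that b2i is expected to honour, and how the two numbers in the issue title are related arithmetically.

```java
public class SignExtensionDemo {
    // Widening a byte to int sign-extends it: the Java semantics of b2i.
    static int signExtend(byte b) {
        return b;
    }

    // Masking with 0xFF yields the unsigned value of the byte.
    static int zeroExtend(byte b) {
        return b & 0xFF;
    }

    // Casting a byte to char sign-extends first, then keeps the low 16 bits.
    static int asChar(byte b) {
        return (char) b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xDF;
        System.out.println(signExtend(b)); // -33
        System.out.println(zeroExtend(b)); // 223
        System.out.println(asChar(b));     // 65503
    }
}
```

The values 223 and 65503 are the unsigned and the sign-extended 16-bit views of the same byte 0xDF, which is consistent with a sign-handling problem somewhere in the byte load path.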
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_ppc64le_linux_Nightly_mauveLoadTest/177
MauveMultiThreadLoadTest_special_16
variation: Mode555
JVM_OPTIONS: -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:counts=- - - - - - 1 1 1 1000 250 250 - - - 10000 100000 10000,gcOnResolve,rtResolve,sampleInterval=2,scorchingSampleThreshold=10000,quickProfile -Xcheck:gc:vmthreads:all:quiet
cent7-ppcle-4