Multiple tests from jdk_jfr_2 fail on arm32 due to dump related issues #3115
Comments
I've been investigating this problem a little. Here's what I've found:

Root cause
Compiler optimizations in hotspot/src/cpu/aarch32/vm/bytes_aarch32.hpp break things when building for aarch32 in aarch32 containers running on an aarch64 host. The result is a SIGBUS address alignment error (as shown in the issue description above).

Solution
Upgrade the code in bytes_aarch32.hpp to the newer implementation in JDK11/JDK17 (recommended). Or turn off optimizations for that file (not recommended).

Explanation
The problems only manifest when using aarch32 containers on an aarch64 host. The problems disappear when using

After reproducing the jtreg failures in an aarch32 container on an aarch64 host, the hs_err file complains about a SIGBUS address alignment issue stemming from the following JFR functions:
I was also able to reproduce the same problem using a simple "hello world" Java app when running with JFR. But there was no problem when running without -XX:StartFlightRecording (JFR disabled). So the problem was specific to JFR, but not isolated to the test cases.

First, I wanted to figure out why JFR specifically was causing the SIGBUS. I suspected that it was because JFR does a lot of low-level memory writing when committing events. Indeed, it turns out that JFR uses bytes_aarch32.hpp when writing event data to its in-memory buffers. Code in bytes_aarch32.hpp deals with reading/writing directly to addresses, which would explain the SIGBUS address alignment errors. I suspect that the SIGBUS error doesn't point to code in this file directly due to optimizations (

Out of curiosity, I created a slowdebug build (no optimizations) and found I could not reproduce any of the problems. This led me to believe that some optimizations were being done for aarch32 that did not work on the aarch64 host. I confirmed this theory by adding directives to disable optimizations

So at this point it was determined that the SIGBUS alignment error was likely caused by optimizations in the bytes_aarch32.hpp code that deals with reading/writing bytes directly, and that JFR is uniquely affected because JFR does a lot of low-level reading/writing to event buffers whenever an event is committed.

Next, I investigated why this problem only manifests in JDK8, but not in JDK 11 and 17 [references 1, 2]. The reason is that after JDK8, bytes_aarch32.hpp was converted to bytes_arm.hpp. The newer bytes_arm.hpp is much more careful about address alignment, and thus does not trigger SIGBUS. To confirm this, I swapped out the old bytes_aarch32.hpp for the new bytes_arm.hpp, and the problems disappeared.
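To illustrate the kind of alignment-safe access described above, here is a minimal C++ sketch. It is not the code from bytes_aarch32.hpp or bytes_arm.hpp; the helper names (load_u4, store_u4, check_roundtrip_at_odd_offset) are hypothetical and exist only for this example. It assumes that the failing pattern is a direct dereference of a casted pointer, which is undefined behaviour for misaligned addresses and which an optimizer is free to turn into instructions that require alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; not the HotSpot Bytes API.

// Alignment-agnostic 32-bit load. memcpy makes no alignment promise to the
// compiler, so it has to emit code that is valid for any address.
inline std::uint32_t load_u4(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Alignment-agnostic 32-bit store, same idea as load_u4.
inline void store_u4(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}

// Risky pattern, shown as a comment for contrast (assumed, not quoted from
// the real header):
//   return *reinterpret_cast<const std::uint32_t*>(p);
// This is undefined behaviour when p is not 4-byte aligned.

// Writes and reads a value at a deliberately misaligned offset and checks
// that neighbouring bytes are untouched.
inline bool check_roundtrip_at_odd_offset() {
    alignas(8) unsigned char buf[16] = {};
    store_u4(buf + 1, 0xCAFEBABEu);  // buf + 1 is not 4-byte aligned
    bool ok = load_u4(buf + 1) == 0xCAFEBABEu && buf[0] == 0 && buf[5] == 0;
    assert(ok);
    return ok;
}
```

Calling check_roundtrip_at_odd_offset() exercises a store and load at an odd address without relying on the platform tolerating misaligned word accesses. Whether the real header uses memcpy, byte-wise assembly, or an alignment check is not something this sketch claims.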
Thanks @roberttoyonaga! Currently all arm32 test agents are Docker hosted, and I believe on aarch64 hosts. @Haroon-Khel, is that correct? https://ci.adoptium.net/label/ci.role.test&&sw.os.linux&&hw.arch.aarch32/ If that's the case, I would suggest disabling those tests for jdk8 arm32. Thoughts, @ShelleyLambert?
FYI, my active GitHub account is @smlambert (the other one is tied to an old work email). Excluding the test cases because we do not have appropriate hardware to run them is fine. It should be done in the vendor problem list though, https://github.com/adoptium/aqa-tests/tree/master/openjdk/excludes/vendors/eclipse, so we do not affect others who may wish to run these tests and have appropriate hardware.
Yes, the arm32 machines on which our nightly tests run are Docker hosted, on arm64 hosts. The 2 actual arm32 odroid machines do not have the ci.role.test tag, so our nightly tests do not run on them.
Do you think it would be better just to update hotspot/src/cpu/aarch32/vm/bytes_aarch32.hpp? This file only exists in the Adoptium repo, not in the OpenJDK repo.
Upstream, the file is at https://hg.openjdk.org/aarch32-port/jdk8u/hotspot/file/5ee36e3a5a61/src/cpu/aarch32/vm/bytes_aarch32.hpp. Adoptium mirrored upstream. Would it be possible or easy to update upstream?
Ohh, I see. OK, in that case, maybe it's better just to exclude the tests in the problem list files.
I've made a PR here to exclude the failing tests for Eclipse only. #5469
#5469 (comment) has been merged.
Many failed tests in jdk_jfr_2 in the extended openjdk suite on arm32
The list of failed tests is at https://ci.adoptopenjdk.net/job/Test_openjdk8_hs_extended.openjdk_arm_linux_testList_2/16/testReport/
There are 335 failed tests in total, so I won't be posting all of them here.
All of the test failures have a similar error log