Some s390x machines failing net tests with NoRouteToHostException #2807
Testing on the other instance on: |
Yes, and I have some reruns going on various other machines as part of triage efforts; I will update the issue once results are in. NoRouteToHostExceptions are also seen on test-marist-sles15-s390x-2, see https://ci.adoptopenjdk.net/job/Grinder/6086/ Those types of exceptions are not seen on test-marist-ubuntu2204-s390x-1, but there are other problems on that machine; the issues appear to be mainly related to tests using multicast addresses. https://ci.adoptopenjdk.net/job/Grinder/6087/testReport/
I have a fix I can try on there relating to the firewall configuration. The problem only occurs on the new Marist machines we've got and, based on past experience, the fix will allow multicast to work. I have applied it on the ubuntu2204-s390x-1 machine referred to above and am regrinding at https://ci.adoptopenjdk.net/job/Grinder/6103/ to test.
[EDIT: This has resolved the problem. Everything in java_net passed, although https://ci.adoptopenjdk.net/job/Grinder/6103/testReport/tools_jlink_JLinkReproducibleTest/java/JLinkReproducibleTest/ failed, which is not likely to be related to this issue]
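For anyone wanting a quick check outside the test suite, here is a minimal sketch of a multicast loopback probe in Python. It is not the fix itself and not what the java_net tests do; the group address `239.255.0.1` and port `50000` are arbitrary values picked for illustration. The idea is that if the firewall is dropping or rejecting multicast traffic, the probe would likely time out or raise an OS-level error rather than return `ok`.

```python
import socket
import struct


def probe_multicast(group="239.255.0.1", port=50000, timeout=2.0):
    """Send one UDP datagram to a multicast group and try to read it back.

    The group and port defaults are hypothetical. Returns "ok", "timeout",
    "unexpected payload", or "error: <message>".
    """
    payload = b"multicast-probe"
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Receiver: bind to the port and join the group on the default interface.
        recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        recv.bind(("", port))
        mreq = struct.pack(
            "4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0")
        )
        recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        recv.settimeout(timeout)

        # Sender: keep the packet on the local network and loop it back to us.
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        send.sendto(payload, (group, port))

        data, _ = recv.recvfrom(1024)
        return "ok" if data == payload else "unexpected payload"
    except socket.timeout:
        # Nothing arrived: the packet may have been silently dropped.
        return "timeout"
    except OSError as exc:
        # For example "No route to host" or "Network is unreachable".
        return f"error: {exc}"
    finally:
        recv.close()
        send.close()


if __name__ == "__main__":
    print(probe_multicast())
```

Running it before and after a firewall change on the same machine gives a rough signal; a passing probe does not guarantee that the java_net tests will pass.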
I'm going to re-grind that one after removing a somewhat rogue entry in [EDIT: As expected no real change - [https://ci.adoptopenjdk.net/job/Grinder/6086/testReport/java_net_httpclient_http2_TLSConnection/java/TLSConnection/] passed in the new run, but that may have just been luck]
It seems to be still failing on java_net
The above analysis suggests that we can resolve a lot of the issues on the RHEL/SLES systems by performing a similar firewall fix to assist the multicast packets to get through. It will be interesting to see how many other problems remain after doing that. Bear in mind that many of the offline machines are the older ones which were replaced during September as part of the Marist machine migration which we have done, so that is expected (They've been offline in jenkins for a while, but now need to be fully removed) |
Of note is that they do not appear as "offline": https://ci.adoptopenjdk.net/label/hw.arch.s390x&&ci.role.test/ does not show the red X I would have expected, as seen with some other offline nodes:
Re-runs on RHEL/SLES systems after adding the same iptables rule:
(Comment removed as it was supposed to be in #2820) |
Update: This issue (or something like it) is still seen. https://ci.adoptium.net/job/Test_openjdk11_hs_extended.openjdk_s390x_linux/140/ e.g. on https://ci.adoptium.net/computer/test-marist-sles12-s390x-2
After a number of NoRouteToHostExceptions in other targets, the jdk_jfr_1 target appears to cause the entire job to fail, and I'm guessing it's related to this issue. Have the other jobs associated with this issue failed as well? As in non-"unsafe" failed. Jenkins red job failed. |
Let's check if the outstanding problems are only on the SLES12 systems and whether they also occur in the docker SLES12 images that we have. |
March JDK22 release activities FYI @steelhead31 |
A little more narrowing down may be useful to verify the systems it's running on and then decide whether we need to continue supporting them, and potentially raise with Marist if it is across the board on all machines. |
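As a rough aid for that narrowing down, the sketch below tallies how many console log lines mention `NoRouteToHostException` per machine. It assumes the console text for each run has already been saved and keyed by machine name; the function names and the sample machine names are hypothetical and the sample log text is made up for illustration.

```python
from collections import Counter

MARKER = "NoRouteToHostException"


def tally_by_machine(logs):
    """Map machine name -> number of log lines that mention the exception.

    `logs` is a dict of {machine_name: console_text}; how the console text
    is collected is left to the caller.
    """
    counts = Counter()
    for machine, text in logs.items():
        counts[machine] = sum(1 for line in text.splitlines() if MARKER in line)
    return counts


def affected(counts):
    """Return the sorted names of machines with at least one matching line."""
    return sorted(machine for machine, n in counts.items() if n > 0)


if __name__ == "__main__":
    # Made-up sample input, not real results from any of the jobs above.
    sample = {
        "machine-a": "java.net.NoRouteToHostException: No route to host\n",
        "machine-b": "all tests passed\n",
    }
    print(affected(tally_by_machine(sample)))  # ['machine-a']
```

A substring match like this will also count stack traces that repeat the exception name, so the numbers are only a coarse indicator of which machines deserve a closer look.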
It's been a while so time for a new table!

[*] From an earlier comment our RHEL8 machine was not previously passing the tests (despite the iptables fix being put in place) but it is in the latest table above.

Based on the above it's entirely possible that the subject message is only applicable to test-marist-sles15-s390x-2 now, and the others should be covered under separate issues ... Although based on adoptium/aqa-tests#5156 (comment) I'm going to try and run the whole of extended.openjdk on this machine to see if it hits any of these errors elsewhere in the compiler suite: https://ci.adoptium.net/job/Test_openjdk21_hs_extended.openjdk_s390x_linux/81/

Noting also that test-marist-rhel7-s390x-2 has been taken offline due to these exceptions so is not included in the above tests.

Memo to self: iptables suggestion that worked on some machines previously is at #2807 (comment)
The three particularly problematic machines have now been taken offline. The expectation is that the RHEL7 and SLES12 machines will be decommissioned, although we may wish to examine SLES15 further, and perhaps attempt a new SLES15 provision to see how that goes.
As described in adoptium/aqa-tests#4039 (comment)
To make it easy for the infrastructure team to repeat and diagnose, please
answer the following questions:
Test_ job on https://ci.adoptopenjdk.net which showed the failure: https://ci.adoptopenjdk.net/job/Grinder/5897/
Any other details: