Exception: java.nio.file.FileSystemException: No space left on device
on test-docker-alpine311-x64-1
#2320
Comments
Looks like this container is chewing up about 46 GB in the test directory for |
Both |
70 tests failed with https://ci.adoptopenjdk.net/job/Test_openjdk17_hs_sanity.system_x86-64_alpine-linux/64/console |
@sophia-guo As mentioned in my previous comment:
There seems to be about 75 GB free just now, so I'm guessing the test managed to clean up after itself, but we need the excludes in place, or someone to diagnose it; otherwise this is just going to recur. |
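For anyone trying to diagnose where the space goes before it gets cleaned up, here is a minimal sketch that reports free space and the largest immediate subdirectories of a workspace. The function names and the workspace path are illustrative assumptions, not part of any existing job.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of files under path, skipping entries that cannot be read."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            full = os.path.join(root, name)
            try:
                # lstat counts a symlink itself rather than its target
                total += os.lstat(full).st_size
            except OSError:
                continue
    return total


def largest_subdirs(workspace, top_n=10):
    """Return (size_bytes, path) for the largest immediate subdirectories."""
    sizes = []
    with os.scandir(workspace) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size_bytes(entry.path), entry.path))
    return sorted(sizes, reverse=True)[:top_n]


def free_gib(path):
    """Free space on the filesystem containing path, in GiB."""
    return shutil.disk_usage(path).free / 2**30
```

Calling `largest_subdirs("/home/jenkins/workspace")` (an assumed path, adjust per machine) while a suspect test is still running would show which job directory is growing.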
Agreed, the underlying cause should be understood, though I'm not sure it is test-case related. As you mentioned, it was seen with the extended openjdk tests.
And now it is the sanity.system tests. It looks like this only happens on Alpine docker, across different test categories. |
Got
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/1781/ |
It's going to continue to happen unless someone disables it. This needs to be resolved by identifying and disabling or fixing the problem causing the failure of the test, not by leaving it as an infrastructure issue where we clear it up until it recurs a few days later. |
I have added a Cleanup-Nodes job (related: #2369) to help diagnose the problem, as we are unable to determine which tests are causing the issue (most failing tests have been disabled). I am unable to run the Cleanup-Nodes job on machines that are already in this state, so I am requesting assistance to clean the test-docker* machines manually first, and a label that identifies them differently from bare-metal machines so we can better observe differences in behaviour. This will also help address tests that do not do well in a containerized environment (related: adoptium/aqa-tests#2138). I would like to raise the priority of this issue, as it is impacting people's ability to work. More than half of the Grinders that are launched fail because they are sent to static docker machines and cannot get past the setup stage due to "no space left on device" errors. |
Let's discuss this today. Alpine is only run in containers, so a label to determine whether it's running in a container will not directly help this issue (it would be on all of them). It needs root cause analysis done on the Alpine core dumps. |
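As a starting point for that analysis, a rough sketch for locating core dumps under a directory tree and listing them by size. It assumes the dumps are named `core` or `core.<pid>`; the actual naming depends on the host's `kernel.core_pattern` setting, so the pattern and the function name are assumptions.

```python
import os
import re

# Assumed naming: "core" or "core.<pid>". The real pattern depends on the
# host's kernel.core_pattern setting, so adjust as needed.
CORE_NAME = re.compile(r"^core(\.\d+)?$")


def find_core_files(root):
    """Return (size_bytes, path) for likely core dumps under root, largest first."""
    found = []
    for dirpath, _dirs, files in os.walk(root, onerror=lambda err: None):
        for name in files:
            if CORE_NAME.match(name):
                full = os.path.join(dirpath, name)
                try:
                    found.append((os.lstat(full).st_size, full))
                except OSError:
                    # File vanished or is unreadable; skip it
                    continue
    return sorted(found, reverse=True)
```

The directory each dump sits in should point at the test target that produced it.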
Core files in the Alpine 312 container hosted on
|
Those jobs have been identified as not working with headless adoptium/aqa-tests#2877. |
https://ci.adoptopenjdk.net/computer/test-docker-alpine312-x64-2/ - I've marked this machine offline with a link to this issue. |
Brought it back online. (It will have affected anything on the physical host, so taking one machine offline wouldn't necessarily have resolved much, except hopefully scheduling the next job on a container on another host!) I have also done a bit of a clear-up, but we're going to need a way to manage this space, possibly #2369. We'll need to keep an eye on this for the release next week (i.e. I'll try to remember to do some checking/pruning before initiating pipelines). |
On the purging note, and since I also mentioned to @Haroon-Khel about the Cleanup-Nodes job I added, I can add a schedule to run this on a regular basis. https://ci.adoptopenjdk.net/view/work-in-progress/job/Cleanup-Nodes/ (and related to: #2369). Currently I have only run it manually on a machine by machine basis, but it can also be set to run against all machines, or all machines with particular labels, etc. |
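To illustrate the kind of scheduled purge being discussed, a minimal sketch follows. It is not the Cleanup-Nodes job itself; the function names, the age threshold, and the idea of pruning by directory modification time are all assumptions made for the example.

```python
import os
import shutil
import time


def stale_dirs(workspace, max_age_days, now=None):
    """List immediate subdirectories whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(workspace) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


def clean_workspace(workspace, max_age_days, dry_run=True, now=None):
    """Delete stale subdirectories; with dry_run=True only report them."""
    targets = stale_dirs(workspace, max_age_days, now=now)
    if not dry_run:
        for path in targets:
            shutil.rmtree(path, ignore_errors=True)
    return targets
```

Defaulting to `dry_run=True` lets the list be reviewed before anything is deleted. Note that a directory's mtime only changes when entries directly inside it are added or removed, so this is a coarse heuristic.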
That job doesn't appear to have run successfully and is currently giving |
Looks to be the
It looks like the |
The core files are coming from the native
Running that binary results in:
|
Since there is currently no method for disabling individual tests on Alpine, we should probably disable the weekend |
jdk_tools was disabled on this platform a month ago. During today's pre-release triage (adoptium/aqa-tests#3465), we saw that all test jobs are failing with "no space left on device". There is no visibility into what is now taking up the space. |
Closing this as the machines have been given extra space as per #2510 (comment) - it can be re-opened later if required. |
Exception: java.nio.file.FileSystemException: No space left on device
on test-docker-alpine311-x64-1, which happened after ws-cleanup successfully completed. https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/328/console