System unavailable: docker-osuosl-ubuntu2004-ppc64le-1 can timeout running a docker command #3705

Closed
adamfarley opened this issue Aug 12, 2024 · 5 comments

@adamfarley
Contributor

adamfarley commented Aug 12, 2024

  • Please put the system name in the title of this issue.

  • Link to any log file showing the problem:

Timeout: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-linux-ppc64le-temurin/353/console
First instance of command taking >15 seconds: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-linux-ppc64le-temurin/350/
Last instance of command taking <15 seconds: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-linux-ppc64le-temurin/349/

  • Please describe the issue:
21:02:28  docker-osuosl-ubuntu2004-ppc64le-1 does not seem to be running inside a container
21:02:30  $ docker run -t -d -u 1000:1000 -e BUILDIMAGESHA=adoptopenjdk/centos7_build_image@sha256:d7158d02fc35bdbc9654a7e2dcfc649ccb3e347b570eab98562808c6ed939783
21:02:30   --init -w /home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-ppc64le-temurin -v /home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-ppc64le-temurin:/home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-ppc64le-temurin:rw,z -v /home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-ppc64le-temurin@tmp:/home/jenkins/workspace/build-scripts/jobs/jdk11u/jdk11u-linux-ppc64le-temurin@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** adoptopenjdk/centos7_build_image cat
21:05:30  ERROR: Timeout after 180 seconds

The docker run command taking 2+ minutes appears to be fairly consistent on docker-osuosl-ubuntu2004-ppc64le-1. It started happening on the 26th of June; the previous run, on the 7th, took 14 seconds.
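For anyone wanting to reproduce the timing comparison outside Jenkins, here is a minimal Python sketch. It assumes the 15-second "slow" mark and the 180-second timeout quoted above; `time_command` and `classify` are hypothetical helper names, not part of any Adoptium tooling, and the command you pass in (for example a `docker run` of the build image) is up to you.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout_s=180.0):
    """Run cmd (a list of arguments) and return (elapsed_seconds, timed_out).

    The 180 second default mirrors the Jenkins timeout in the log above.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,  # we only care about duration here, not exit status
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception
        timed_out = True
    return time.monotonic() - start, timed_out


def classify(elapsed, timed_out, slow_after_s=15.0):
    """Label a run as "ok", "slow" (over slow_after_s) or "timeout"."""
    if timed_out:
        return "timeout"
    return "slow" if elapsed > slow_after_s else "ok"


if __name__ == "__main__":
    # Harmless stand-in command; on the affected host you would substitute
    # the docker invocation you want to time.
    elapsed, timed_out = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.1f}s -> {classify(elapsed, timed_out)}")
```

Running it a few times on both the OSUOSL and Skytap hosts would give a rough like-for-like comparison without waiting for a full build.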

@sxa
Member

sxa commented Aug 13, 2024

We have had some disk slowness issues in the past on some of the OSUOSL ppc machines, although I don't think we've hit anything for a while, so it's interesting to see this coming back.

steelhead31 self-assigned this Aug 14, 2024
@steelhead31
Contributor

We've had a couple of issues flagged by Nagios recently that align with these failures. Both times Docker filled the disk to 100% capacity, and Docker was using around 80GB of disk space for its cache. I'll do some further work and also increase the thresholds for Nagios alerting on this server, so there is more warning and, hopefully, this issue doesn't recur.
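As an illustration of the kind of early-warning check described here, this is a rough Python sketch of a disk usage check using the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). It is not the check actually deployed on this server: the 80% and 90% thresholds and the function names are assumptions, and the path to monitor (Docker's default data root on Linux is /var/lib/docker) would need confirming on the host.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def percent_used(path):
    """Return the percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check(pct, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit_code, message).

    warn and crit are hypothetical thresholds chosen for illustration.
    """
    if pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {pct:.1f}% used"
    if pct >= warn:
        return WARNING, f"DISK WARNING - {pct:.1f}% used"
    return OK, f"DISK OK - {pct:.1f}% used"


if __name__ == "__main__":
    # "/" is used so the sketch runs anywhere; on the build host you would
    # point this at the filesystem that holds Docker's data.
    code, message = check(percent_used("/"))
    print(code, message)
```

A real plugin would end with `sys.exit(code)` so Nagios can read the status. Setting the warning threshold well below 100% is what buys time to clear the Docker cache before builds start timing out.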

@steelhead31
Contributor

A build has completed OK on the OSUOSL host. I'll run a few more. It does appear to be slower than the Skytap host, though.

@steelhead31
Contributor

A 2nd build has completed OK with similar results, so I'm closing this. Please re-open if this recurs, and I'll look at it as it happens. I suspect it was probably due to the Docker filesystem being full. However, I'm unable to extend it, as we are at capacity at OSUOSL.

@adamfarley
Contributor Author

Thanks Scott. 🙂
