Github actions for system testing failed with error "No space left on device" #78

Closed
ManavalanG opened this issue Jun 20, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@ManavalanG
Member

ManavalanG commented Jun 20, 2023

GitHub Actions system testing started failing even though there had been no significant code changes. The failure occurred during the step "Run QuaC system testing - WGS mode AND no prior QC data", with the following error in multiple (>5) snakemake-triggered jobs:

FATAL:   while extracting /home/runner/work/quac/quac/.snakemake/singularity/e0c80565ed6b26b379a971ee706979ce.simg: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
WARNING: Skipping mount /etc/hosts [binds]: /etc/hosts doesn't exist in container
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount proc [kernel]: /proc doesn't exist in container
WARNING: Skipping mount /opt/hostedtoolcache/singularity/3.8.3/x64/var/singularity/mnt/session/tmp [tmp]: /tmp doesn't exist in container
WARNING: Skipping mount /opt/hostedtoolcache/singularity/3.8.3/x64/var/singularity/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
WARNING: Skipping mount /opt/hostedtoolcache/singularity/3.8.3/x64/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container

Write on output file failed because No space left on device

FATAL ERROR:writer: failed to write file /image/root/usr/local/bin/x86_64-conda_cos6-linux-gnu-objcopy
Parallel unsquashfs: Using 2 processors
17464 inodes (25034 blocks) to write

: exit status 1

My suspicion was that something had changed on the GitHub runners' end, so I reran the workflow multiple times over several days. However, the runs kept failing with the same error, albeit at different snakemake-triggered jobs. Next, I reran the workflow for a commit that had succeeded in the past, and it failed this time as well. This strengthened the suspicion that the runners were the cause of these failures.

As a next step, after discussion with James, we printed out storage usage at multiple stages of the GitHub Actions workflow - e754213

At the beginning of the workflow, the root dir had 22G available, and it had 5.6G available when the workflow errored out. Note that GitHub runners are said to have 14GB of storage available, and the storage consumed here was ~16G. Our suspicion at this stage was that we were using more storage than we are supposed to.
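For anyone who wants to reproduce this kind of check, a minimal sketch of a stage-labelled disk report is shown below. The function names `report_disk` and `avail_gb` are hypothetical and not taken from commit e754213, and `avail_gb` assumes GNU `df` (as found on Ubuntu runners).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the QuaC repo): print root filesystem
# usage together with a label naming the workflow stage.
report_disk() {
    local stage="$1"
    echo "=== Disk usage: ${stage} ==="
    # -h prints human-readable sizes; "/" is the root filesystem
    df -h /
}

# Hypothetical helper: available space on "/" in whole gigabytes.
# Relies on GNU df options (--output, -BG); not portable to BSD/macOS df.
avail_gb() {
    df --output=avail -BG / | tail -n 1 | tr -dc '0-9'
}

report_disk "start of workflow"
echo "Available on /: $(avail_gb)G"
```

Calling `report_disk` before and after each heavy step gives the kind of before/after numbers quoted above (22G available at the start, 5.6G at the failure, so roughly 16G consumed).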

We considered GitHub larger runners, but they would cost us money. From GitHub's documentation:

* For larger runners, there is no additional cost for configurations that assign public static IP addresses to a larger runner. For more information on larger runners, see "Using larger runners."
* Entitlement minutes cannot be used for larger runners.
* The larger runners are not free for public repositories.

Source

We then decided to free up storage space after seeing this thread - 126f5be. The workflow is still running, but it is already past the step that used to error out. Overall, the cleanup freed 29G from the root folder. Note that the step "Run QuaC system testing - WGS mode AND no prior QC data" of the workflow consumed ~7G.
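A minimal sketch of this kind of cleanup step follows. The directory list is an assumption based on large preinstalled toolchains commonly reported on GitHub-hosted Ubuntu runners; it is not necessarily what commit 126f5be removes, and `free_space` and `CANDIDATE_DIRS` are hypothetical names. The function defaults to a dry run so nothing is deleted unless `--apply` is passed.

```shell
#!/usr/bin/env bash
# Assumed candidates for removal; NOT taken from commit 126f5be.
# Only remove tooling that the workflow does not need.
CANDIDATE_DIRS=(/usr/share/dotnet /usr/local/lib/android /opt/ghc)

# Hypothetical helper: list (default) or delete (--apply) the candidates.
free_space() {
    local mode="${1:-dry-run}"
    local dir
    for dir in "${CANDIDATE_DIRS[@]}"; do
        if [ ! -d "$dir" ]; then
            echo "skip (not present): $dir"
        elif [ "$mode" = "--apply" ]; then
            echo "removing: $dir"
            sudo rm -rf "$dir"
        else
            echo "would remove: $dir"
        fi
    done
}

df -h /        # usage before cleanup
free_space     # dry run; pass --apply only on a disposable CI runner
df -h /        # usage after cleanup (unchanged in a dry run)
```

Running the cleanup before the system-testing step, with `df -h /` on either side, shows how much space was actually recovered.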

ManavalanG added the bug label Jun 20, 2023
@ManavalanG
Member Author

The workflow with the fix to free up storage space before running the system testing succeeded. Note that the ubuntu runner had 61G of used space at the beginning and ended with 45G used.
