GitHub Actions system testing started failing even though there had been no significant code changes. The failure occurred in the step "Run QuaC system testing - WGS mode AND no prior QC data", with the following error in multiple (>5) Snakemake-triggered jobs:
FATAL: while extracting /home/runner/work/quac/quac/.snakemake/singularity/e0c80565ed6b26b379a971ee706979ce.simg: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
WARNING: Skipping mount /etc/hosts [binds]: /etc/hosts doesn't exist in container
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount proc [kernel]: /proc doesn't exist in container
WARNING: Skipping mount /opt/hostedtoolcache/singularity/3.8.3/x64/var/singularity/mnt/session/tmp [tmp]: /tmp doesn't exist in container
WARNING: Skipping mount /opt/hostedtoolcache/singularity/3.8.3/x64/var/singularity/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
WARNING: Skipping mount /opt/hostedtoolcache/singularity/3.8.3/x64/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
Write on output file failed because No space left on device
FATAL ERROR:writer: failed to write file /image/root/usr/local/bin/x86_64-conda_cos6-linux-gnu-objcopy
Parallel unsquashfs: Using 2 processors
17464 inodes (25034 blocks) to write
: exit status 1
I suspected that something had changed on the GitHub runners' end, so I reran the workflow multiple times over several days. It kept failing with the same error, albeit in different Snakemake-triggered jobs. Next, I reran the workflow for a commit that had passed in the past, and it failed this time as well. This strengthened the notion that the runners were the cause of these failures.
As a next step, after discussion with James, I printed out the storage at multiple stages of the GitHub Actions workflow - e754213
At the beginning of the workflow, the root dir had 22G available, and it had 5.6G available when the workflow errored out. Note that GitHub runners are said to have 14GB of storage available, and the storage consumed here was ~16G. The suspicion at this stage was that we were using more storage than we were supposed to.
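For anyone wanting to reproduce this kind of check, here is a minimal sketch of reporting disk space at several workflow stages and computing how much was consumed in between. It is not the code from e754213; the function names `report` and `consumed_gib` and the stage labels are hypothetical, and it only relies on Python's standard `shutil.disk_usage`.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def report(stage, path="/"):
    """Print total/used/free space for `path` and return the free space in GiB.

    `stage` is just a free-form label (hypothetical), e.g. "start" or
    "before system testing".
    """
    usage = shutil.disk_usage(path)
    print(
        f"[{stage}] {path}: "
        f"total={usage.total / GIB:.1f}G "
        f"used={usage.used / GIB:.1f}G "
        f"free={usage.free / GIB:.1f}G"
    )
    return usage.free / GIB


def consumed_gib(available_before, available_after):
    """Storage consumed between two snapshots of available space (GiB)."""
    return available_before - available_after


# Snapshot of the machine this runs on.
free_now = report("start")

# Figures reported in this issue: 22G available at the start,
# 5.6G available when the workflow errored out.
print(f"{consumed_gib(22.0, 5.6):.1f}G consumed")  # prints "16.4G consumed"
```

Calling `report` before and after each heavy step makes it easy to see which step pushes the runner toward "No space left on device".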
We considered GitHub larger runners, but they would cost us:
* For larger runners, there is no additional cost for configurations that assign public static IP addresses to a larger runner. For more information on larger runners, see "Using larger runners."
* Entitlement minutes cannot be used for larger runners.
* The larger runners are not free for public repositories.
We then decided to free up storage space after seeing this thread - 126f5be. The workflow is still running, but it is already past the step that used to error out. Overall, this freed 29G from the root folder. Note that the step "Run QuaC system testing - WGS mode AND no prior QC data" of the workflow consumed ~7G.
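As an illustration of the general idea only, the sketch below deletes a caller-supplied list of directories and reports how many bytes they held. It is not the change made in 126f5be, and which directories are safe to remove on a runner is not stated here, so the list is left as a parameter. The names `dir_size_bytes` and `free_up` are hypothetical, and the demo runs against a scratch directory rather than any real runner path.

```python
import os
import shutil
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; ignore it in the estimate.
                pass
    return total


def free_up(paths):
    """Remove each existing directory in `paths`; return the bytes they held."""
    freed = 0
    for path in paths:
        if not os.path.isdir(path):
            continue  # nothing to remove
        freed += dir_size_bytes(path)
        shutil.rmtree(path, ignore_errors=True)
    return freed


# Demo on a throwaway directory containing a single 1024-byte file.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "blob.bin"), "wb") as handle:
    handle.write(b"\0" * 1024)

freed_demo = free_up([scratch])
print(f"freed {freed_demo} bytes")  # prints "freed 1024 bytes"
```

The returned figure is an estimate based on file sizes, so it can differ slightly from what `df` reports before and after the cleanup.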
The workflow with the fix that frees up storage space before running the system testing succeeded. Note that the Ubuntu runner had 61G of used space at the beginning, and it ended with 45G used.