load sometimes doesn't load #321
This is happening to us too. It's super weird because we have three identical workflows set up (with different image names): two of them succeed, but one of them is constantly failing with the above error. The workflow file:

```yaml
name: Docker

on:
  push:
    # Publish `staging` as Docker `latest` image.
    branches:
      - staging
    # Publish `v1.2.3` tags as releases.
    tags:
      - v*

env:
  IMAGE_NAME: ml-intents

jobs:
  # Push image to GitHub Packages.
  # See also https://docs.docker.com/docker-hub/builds/
  push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      # This is a separate action that sets up the buildx runner
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

      # So now we can use GitHub Actions' own caching for Docker layers!
      - name: Cache Docker layers
        uses: actions/cache@v2
        with:
          path: /tmp/.buildx-cache
          key: ${{ runner.os }}-buildx-${{ env.IMAGE_NAME }}-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-buildx-${{ env.IMAGE_NAME }}-

      - name: Build image
        uses: docker/build-push-action@v2
        with:
          builder: ${{ steps.buildx.outputs.name }}
          context: .
          file: intents/Dockerfile
          load: true
          tags: ${{ env.IMAGE_NAME }}:latest
          cache-from: type=local,src=/tmp/.buildx-cache
          cache-to: type=local,dest=/tmp/.buildx-cache-new

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Push image to GitHub Container Registry
        run: |
          IMAGE_ID=ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME
          # Change all uppercase to lowercase
          IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')
          # Strip git ref prefix from version
          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
          # Strip "v" prefix from tag name
          [[ "${{ github.ref }}" == "refs/tags/"* ]] && VERSION=$(echo $VERSION | sed -e 's/^v//')
          # Use Docker `latest` tag convention
          [ "$VERSION" == "staging" ] && VERSION=latest
          echo IMAGE_ID=$IMAGE_ID
          echo VERSION=$VERSION
          echo Listing docker images...
          docker image ls
          echo Tagging image...
          docker tag $IMAGE_NAME:latest $IMAGE_ID:$VERSION
          echo Tagged image successfully!
          echo Pushing image...
          docker push $IMAGE_ID:$VERSION
          echo Pushed image successfully!

      - # Temp fix
        # https://github.com/docker/build-push-action/issues/252
        # https://github.com/moby/buildkit/issues/1896
        name: Move cache
        run: |
          rm -rf /tmp/.buildx-cache
          mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```

The runner gets to the push step. The two successful workflows have much smaller images (500 MB and 2 GB), whereas the failing image is a lot bigger (5 GB). Could that be an influencing factor here?
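The version-derivation shell in the "Push image" step can be exercised on its own. A sketch with that logic extracted into a function (the function name and the explicit ref argument are mine, not part of the workflow):

```shell
#!/bin/sh
# Derive the Docker tag from a git ref, mirroring the workflow's push step:
# strip the ref prefix, drop a leading "v" on tags, map "staging" to "latest".
version_from_ref() {
  ref="$1"
  version=$(echo "$ref" | sed -e 's,.*/\(.*\),\1,')
  case "$ref" in
    refs/tags/*) version=$(echo "$version" | sed -e 's/^v//') ;;
  esac
  [ "$version" = "staging" ] && version=latest
  echo "$version"
}

version_from_ref "refs/tags/v1.2.3"   # prints 1.2.3
version_from_ref "refs/heads/staging" # prints latest
```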
@champo @benhjames Cannot repro locally or with GHA. Maybe it fails silently because of insufficient disk space:
You have 14 GB at your disposal on the runner (actually I would say 9 GB after removing the pre-installed software):
Can you add this step at the end of your workflow (before …)?

```yaml
- name: Disk
  if: always()
  run: |
    df -h
    docker buildx du
```
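Since the working theory is that the export fails silently once the runner runs out of space, it may also help to fail fast before the build. A sketch of a pre-build disk guard (the function name and the threshold are mine, not from any action):

```shell
#!/bin/sh
# check_disk MIN_KB: fail when / has less than MIN_KB of free space, so the
# job stops with a clear error instead of a silent export failure later.
check_disk() {
  min_kb="$1"
  avail_kb=$(df -Pk / | awk 'NR==2 {print $4}')
  if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "insufficient disk: ${avail_kb} KB free, need ${min_kb} KB" >&2
    return 1
  fi
  echo "disk ok: ${avail_kb} KB free"
}

# Example: require ~10 GB before a large `load: true` build.
check_disk $((10 * 1024 * 1024)) || echo "would stop the job here"
```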
Thanks for investigating @crazy-max! I first added that step and a separate step to list the Docker images, but the image still didn't appear to be exported into Docker. The disk space on that run seemed to match yours.
I then modified the workflow file to exactly match yours, and the same issue occurred. Then I re-ran the same job, and this time it exported correctly. This was the first run where Docker had cache available (previous builds never got a chance to save the cache, because they errored upon pushing to GHCR). I then went back to look at your first run (i.e. without build cache) and noticed that in that particular run it doesn't list the Docker images. So I have a feeling that if there is no build cache, the export to Docker fails, but if there is build cache, like in your subsequent builds and my last build linked above, it succeeds. Really weird. Hope that helps...?
@benhjames Thanks for your feedback. Yes, actually …
Can you add …
Thanks @crazy-max, I added that command to both. Is there anything that you think could be done to shrink the disk usage after the build step? I notice that … Sorry for the questions; it would be great to find a solution to this somehow (without reverting back to the plain …)
@benhjames
These are the subsequent instructions cached by buildx for the current builder. You can get more info by using …
You could use a self-hosted runner, but in the near future you will be able to configure CPU cores, RAM, and disk space for the runner (see github/roadmap#161). Or, more drastically, remove some components pre-installed on the runner in your workflow, like dotnet (~23 GB):

```yaml
- name: Remove dotnet
  run: sudo rm -rf /usr/share/dotnet
```
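The dotnet removal above generalizes. Here is a sketch that first reports what the big pre-installed directories cost before deleting anything; the paths are assumptions based on common GitHub-hosted Ubuntu images, so verify them on your runner:

```shell
#!/bin/sh
# report_usage DIR...: print the disk usage of each directory that exists,
# so you can decide what is worth removing from the runner.
report_usage() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      du -sh "$dir" 2>/dev/null || echo "size unknown: $dir"
    else
      echo "absent: $dir"
    fi
  done
}

report_usage /usr/share/dotnet /usr/local/lib/android /opt/ghc
# To actually reclaim the space in a workflow step:
#   sudo rm -rf /usr/share/dotnet
```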
Thanks a lot @crazy-max for the help, that's really useful, much appreciated. 🙌
Hi, thank you for this thread! I was running into the same issue. I would expect an error log of some kind when disk issues happen and the images cannot correctly load. Thanks!
Hey @cep21, the issue to track in buildx is docker/buildx#593!
❤️ Thanks for the deep look into this! I ended up changing the build approach for other reasons, which I guess accidentally reduced the image size, making the issue disappear.
Hey folks! I believe I'm also hitting this issue. Is there currently any workaround other than trying to shrink your image size? I tried …
@master-bob As discussed in #841, I made some tests using the following Dockerfile:

```dockerfile
FROM alpine
RUN dd if=/dev/zero of=/tmp/output.dat bs=2048M count=1
RUN dd if=/dev/zero of=/tmp/output2.dat bs=2048M count=1
RUN dd if=/dev/zero of=/tmp/output3.dat bs=2048M count=1
RUN uname -a
```

and this workflow:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        driver:
          - docker
          - docker-container
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: ${{ matrix.driver }}
          buildkitd-flags: --debug
      -
        name: Disk
        run: |
          df -h
      -
        name: Build and push
        uses: docker/build-push-action@master
        with:
          context: .
          file: ./fat.Dockerfile
          load: true
          tags: |
            foo
      -
        name: List images
        run: |
          docker image ls
      -
        name: Disk
        if: always()
        run: |
          df -h
          docker buildx du
```
Thank you for the in-depth analysis.

Edit: I think the dotnet location changed on ubuntu-22, as I didn't see any significant change in space usage when attempting to remove it. So I opted to remove … Abbreviated listing of …

Before removing: …

and after: …
Change to build the image using the docker action, reducing build time by ~50%, as the image will only need to be built once (currently the image is built twice). The default driver uses double the disk space; see docker/build-push-action#321 (in brief, the image is built in the build-push-action local cache, tarred, and then transferred to the local Docker daemon). This is a problem because this image is so large. Using the `docker` driver will work around this.
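A minimal sketch of the workaround the commit message above describes: building with the `docker` driver, so the image lands directly in the local image store instead of being built in the buildx cache and then transferred. Step names and the image tag here are illustrative, not from the commit:

```yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
  with:
    # With the docker driver, BuildKit builds straight into the local
    # Docker image store, so the image is not duplicated on disk.
    driver: docker

- name: Build image
  uses: docker/build-push-action@v3
  with:
    context: .
    load: true
    tags: myimage:latest
```

Note that the `docker` driver does not support all cache exporters, which is part of the trade-off.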
Just wanted to drop a note that I began experiencing this exact same issue today. In my workflow I build 3 separate Docker images, all using the …
Today, randomly, one of the images was successfully built, but adding a step to inspect … In the end I noticed this from @crazy-max up above:
Using setup-buildx-action@v2 with …
Fixes docker#892 Related to docker#321 Signed-off-by: Nicolas Vuillamy <[email protected]>
Additionally updates Pangeo-notebook, adds mamba, and removes tensorflow as it was likely responsible for exploding the build as the Docker image would not actually load Closes oceanhackweek#71 oceanhackweek#72 Xref docker/build-push-action#321
So there is enough space for the container. See docker/build-push-action#321
Behaviour
Trying to run a command with a just-built image sometimes fails to find the image:
The build step runs ok and has no notable differences in output between correct and failed runs.
Expected behaviour
The `muun_android` image to be found and run. In https://github.com/muun/apollo/runs/2203961523?check_suite_focus=true it succeeded (see the Inspect step, because the build failed due to something unrelated).

Configuration
Logs
logs_8.zip
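The failure mode reported above is that the build "succeeds" but the image never reaches the Docker image store, so a later `docker run` dies with a confusing error. A sketch of a guard step that makes the failure explicit (the function name is mine; `muun_android` is the image from this issue, used for illustration):

```shell
#!/bin/sh
# require_image NAME: succeed only if NAME is present in the local Docker
# image store; otherwise print a clear error instead of letting a later
# `docker run` fail with "image not found".
require_image() {
  if docker image inspect "$1" >/dev/null 2>&1; then
    echo "image $1 is present"
  else
    echo "image $1 was not loaded into the Docker image store" >&2
    return 1
  fi
}

# In a workflow step, before running anything with the image:
#   require_image muun_android && docker run --rm muun_android ...
```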