Docker BuildKit caching w/ --cache-from fails every second time, except when using docker-container
#2274
Comments
Two issues I'm noticing with using the docker-container driver to work around the caching issue:

With the default driver, rebuilds of code-only changes take ~1 minute (when I get proper caching of the expensive layers in my image). The export/import steps seem to add an extra minute to the build. I'm working with large images (~3.5 GB from various scientific Python libraries), which I'm guessing exacerbates this issue.
I opened #1981, and I can confirm that my reproducible example also still does not work.
Same problem when building from inside of docker:20.10.8-dind.
Same issue here.
Same issue here on ...
Any update on this? This seems like a major issue, and the alternative of using ...
Can someone test with the master version of dockerd? 20.10 is a couple of BuildKit releases old, and it has been confirmed that this indeed works with BuildKit directly.
* Consolidate makefiles
  - Move docker building stuff to the main makefile
  - Drop internal makefiles
  - Allow building lotus from source
  - Update readme
* Fix caching of the lotus-test docker build
  It turned out that using `DOCKER_BUILDKIT=1` has a problem with caching: moby/buildkit#2274. Using `docker buildx` would fix it, but it may not be installed on every machine. For now, turned on BuildKit only for the boost image.
Same issue here (Debian Bullseye)
We are facing the same issue in Bitbucket Pipelines.
This ended up being enough of a drag on my team's productivity that we came up with a workaround that we've been using for about a month, and it has been working really well for us so far. We split out a "base" Docker image which installs all our dependencies, and then we have a "final" Docker image which just copies the code on top of the base image as a final layer. The important part is that these are distinct images and not just separate layers, which is how we work around the inconsistent layer caching behavior. Our "final" Dockerfile just looks like:

```dockerfile
FROM container-host.com/your-project/your-base-image:latest-version
COPY . /app
```

Downside: This setup makes it harder to test changes to the base image. Instead of just updating a single Dockerfile and building+pushing, you need to (1) change the "base" Dockerfile/dependencies, (2) build and push the base image to your container host with a new tag for testing, and (3) edit the "final" Dockerfile to reference the new testing tag. I wrote a Python script to do 2+3 (see the sketch below), so testing changes to our base image is still pretty streamlined. Overall, this has definitely been worth it for us, especially since our base image is huge (3GB of Python ML dependencies) and takes a long time to build, so cache misses were extremely painful.
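A minimal sketch of what scripting steps (2) and (3) could look like, assuming hypothetical registry, image, and Dockerfile names (the poster's actual Python script is not shown in the thread):

```sh
#!/usr/bin/env sh
# Hedged sketch: automate steps (2) and (3) of the base/final image split.
# Registry path, image names, and Dockerfile names are placeholders.
set -eu

REGISTRY=container-host.com/your-project
TAG="test-$(date +%Y%m%d%H%M%S)"

# (2) Build and push the base image under a fresh tag for testing.
docker build -f Dockerfile.base -t "$REGISTRY/your-base-image:$TAG" .
docker push "$REGISTRY/your-base-image:$TAG"

# (3) Point the final Dockerfile at the new base tag, then build the final image.
sed -i.bak "s|^FROM .*|FROM $REGISTRY/your-base-image:$TAG|" Dockerfile.final
docker build -f Dockerfile.final -t "$REGISTRY/your-final-image:$TAG" .
```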
Based on my limited testing, using ... works. This was noted as a workaround in #1981, but it may not always work, based on the comment above. We're using Bitbucket Pipelines (a regular runner, not the self-hosted ones), which means no access to ... As a side note, ...
Same issue with ...
Our team has recently been facing this same issue in GitHub Actions, since its latest runner image updated the Docker version to v23+, which uses BuildKit as the default build engine. Our original cache flow is: ...

And with this flow we have the exact same issue that ... Tried pulling all image tags beforehand, but it is not helpful. Based on my observations, it seems that ...

So our current workaround is to add a specific CI step ...
I can confirm that I have exactly the same setup as @tomlau10, and it started to fail every other time in GitHub Actions.
I had a look at my team's deploy log and everything seems fine. Have you tried pushing ...?

I encountered this about two months ago. I first noticed it on 27/8, and upon investigation, I found that the Docker version in GitHub Actions' runner image had been updated. Later, on 25/9, I pushed ...

I suspect that if the Docker version used to build the cache image does not match the one in use when building with `--cache-from`, the cache will not be reused.

This aligns with my hypothesis: ...

Side note: ...
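One hedged way to check this version-mismatch hypothesis in CI is to log the client, daemon, and buildx versions on every run, and compare the run that pushed the cache image against a run that missed the cache (standard Docker CLI commands; none of this is from the thread):

```sh
# Hedged sketch: record the versions involved in each CI run so cache-miss runs
# can be compared against the run that pushed the cache image.
echo "Docker client: $(docker version --format '{{.Client.Version}}')"
echo "Docker daemon: $(docker version --format '{{.Server.Version}}')"
docker buildx version   # prints the buildx plugin version, if installed
```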
Did anybody else observe that, after they gave up and split their requirements installation into a separate build, the new requirements build step always cached properly?
I am working around this by using the legacy builder. It's deprecated, but still works as of Docker v25.

```sh
# Enable legacy builder
export DOCKER_BUILDKIT=0

docker pull $MY_IMAGE:latest || true
docker build --cache-from $MY_IMAGE:latest --tag $MY_IMAGE:latest .
docker push $MY_IMAGE:latest
```
@tonistiigi Not to rush or anything, but what is the estimate for a new patch release? Thanks in advance.
Still having this issue:
Reliably fails to use the cache after pushing a build that used the cache. @tonistiigi
This is a closed issue. If you have another issue, open a new one with full runnable reproduction steps and version info.
Thanks for the update. While troubleshooting and creating a minimal reproducible example (MRE), we made several changes that seemed to resolve the issue: ...

After applying these changes, caching appeared to work as expected. Hope this helps others experiencing similar problems!
Similar to #1981, but it's still happening with 20.10.7, and I have a minimal reproduction case.
Version information
Steps to reproduce
Have this Dockerfile:
Run this script:
(also here: https://github.com/jli/docker-cache-issue-20210722 )
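The exact Dockerfile and script live in the linked repository. As a rough, hedged reconstruction of the kind of repro described here (the base image, registry path, and third build step are guesses, not the original files):

```sh
#!/usr/bin/env sh
# Hedged reconstruction of the repro; the real files are in the linked repo and
# may differ. IMAGE is a placeholder registry path.
set -eu
IMAGE=registry.example.com/cache-issue-test

cat > Dockerfile <<'EOF'
FROM alpine
RUN yes | head -20 | tee /yes.txt
COPY Dockerfile /Dockerfile
EOF

export DOCKER_BUILDKIT=1
docker pull "$IMAGE:latest" || true
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from "$IMAGE:latest" \
  --tag "$IMAGE:latest" .
docker push "$IMAGE:latest"
```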
What I see: When I run the above script multiple times, it alternates every time whether the `RUN yes | head -20 | tee /yes.txt` step is cached or not. The `docker build` output alternates between:

```
=> [2/3] RUN yes | head -20 | tee /yes.txt
```

and

```
=> CACHED [2/3] RUN yes | head -20 | tee /yes.txt
```
With docker-container driver
This comment by @tonistiigi suggested using the "container driver". This does seem to work! I tried replacing the `docker build` command from above with a `docker buildx build` command using the docker-container driver (see the sketch below), and this consistently results in the `RUN yes ...` step being cached!

The problem is that `docker buildx` doesn't appear to be a subcommand in the https://hub.docker.com/_/docker image, which is what we use in CI. Is there a way to use the container driver when using that image? Could you help me understand why this is needed?
Will this be fixed with a future release?
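For reference, a hedged sketch of what a docker-container-driver setup with registry cache export typically looks like (builder name, registry, and image names are placeholders, not the command from the original report):

```sh
# Hedged sketch of the docker-container driver approach; names are placeholders.
docker buildx create --name cachebuilder --driver docker-container --use

docker buildx build \
  --cache-from type=registry,ref=registry.example.com/myimage:buildcache \
  --cache-to type=registry,ref=registry.example.com/myimage:buildcache,mode=max \
  --tag registry.example.com/myimage:latest \
  --push .
```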