Docker BuildKit caching w/ --cache-from fails every second time, except when using docker-container
#2274
Comments
Two issues I'm noticing with using the docker-container driver to work around the caching issue:

With the default driver, rebuilds of code-only changes take ~1 minute (when I get proper caching of the expensive layers in my image). The export/import steps seem to add an extra minute to the build. I'm working with large images (~3.5 GB from various scientific Python libraries), which I'm guessing exacerbates this issue.
I opened #1981, and I can confirm that my reproducible example also still does not work.
Same problem when building from inside of docker:20.10.8-dind.
Same issue here.
Same issue here on ...
Any update on this? This seems like a major issue, and the alternative of using ...
Can someone test with the master version of dockerd? 20.10 is a couple of BuildKit releases old, and it has been confirmed that this indeed works with BuildKit directly.
* Consolidate makefiles
  - Move docker building stuff to the main makefile
  - Drop internal makefiles
  - Allow building lotus from source
  - Update readme
* Fix caching of the lotus-test docker build
  It turned out that using `DOCKER_BUILDKIT=1` has a problem with caching: moby/buildkit#2274. Using `docker buildx` would fix it, but it may not be installed on every machine. For now, turned on BuildKit only for the boost image.
Same issue here (Debian Bullseye)
We are facing the same issue in Bitbucket Pipelines.
This ended up being enough of a drag on my team's productivity that we came up with a workaround that we've been using for about a month, and it has been working really well for us so far. We split out a "base" Docker image which installs all our dependencies, and then we have a "final" Docker image which just copies the code on top of the base image as a final layer. The important part is that these are distinct images and not just separate layers, which is how we work around the inconsistent layer caching behavior. Our "final" Dockerfile just looks like:

```dockerfile
FROM container-host.com/your-project/your-base-image:latest-version
COPY . /app
```

Downside: This setup makes it harder to test changes to the base image. Instead of just updating a single Dockerfile and building+pushing, you need to (1) change the "base" Dockerfile/dependencies, (2) build and push the base image to your container host with a new tag for testing, and (3) edit the "final" Dockerfile to reference the new testing tag. I wrote a Python script to do 2+3 (see the sketch below), so testing changes to our base image is still pretty streamlined. Overall, this has definitely been worth it for us, especially since our base image is huge (3GB of Python ML dependencies) and takes a long time to build, so cache misses were extremely painful.
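A minimal sketch of what scripting steps (2) and (3) could look like, assuming hypothetical registry, image, and Dockerfile names (the poster's actual Python script is not shown in the thread):

```sh
#!/usr/bin/env sh
# Hedged sketch: automate steps (2) and (3) of the base/final image split.
# Registry path, image names, and Dockerfile names are placeholders.
set -eu

REGISTRY=container-host.com/your-project
TAG="test-$(date +%Y%m%d%H%M%S)"

# (2) Build and push the base image under a fresh tag for testing.
docker build -f Dockerfile.base -t "$REGISTRY/your-base-image:$TAG" .
docker push "$REGISTRY/your-base-image:$TAG"

# (3) Point the final Dockerfile at the new base tag, then build the final image.
sed -i.bak "s|^FROM .*|FROM $REGISTRY/your-base-image:$TAG|" Dockerfile.final
docker build -f Dockerfile.final -t "$REGISTRY/your-final-image:$TAG" .
```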
Based on my limited testing, using ... works. This was noted as a workaround in #1981, but it may not always work, based on the comment above. We're using Bitbucket Pipelines (a regular runner, not the self-hosted ones), which means no access to ... As a side note, ...
Same issue with ...
Our team has recently been facing this same issue in GitHub Actions, since its latest runner image updated the Docker version to v23+, which uses BuildKit as the default build engine. Our original cache flow is: ...

And with this flow we have the exact same issue that ... Tried pulling all image tags beforehand, but it is not helpful. Based on my observations, it seems that ...

So our current workaround is to add a specific CI step ...
I can confirm that I have exactly the same setup as @tomlau10, and it started to fail every other time in GitHub Actions.
I had a look at my team's deploy log and everything seems fine. Have you tried pushing ...?

I encountered this about two months ago. I first noticed it on 27/8, and upon investigation, I found that the Docker version in GitHub Actions' runner image had been updated. Later, on 25/9, I pushed ...

I suspect that if the Docker version used to build the cache image does not match the one in use when building with `--cache-from`, the cache will not be reused.

This aligns with my hypothesis: ...

Side note: ...
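One hedged way to check this version-mismatch hypothesis in CI is to log the client, daemon, and buildx versions on every run, and compare the run that pushed the cache image against a run that missed the cache (standard Docker CLI commands; none of this is from the thread):

```sh
# Hedged sketch: record the versions involved in each CI run so cache-miss runs
# can be compared against the run that pushed the cache image.
echo "Docker client: $(docker version --format '{{.Client.Version}}')"
echo "Docker daemon: $(docker version --format '{{.Server.Version}}')"
docker buildx version   # prints the buildx plugin version, if installed
```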
Did anybody else observe that, after they gave up and split their requirements installation into a separate build, the new requirements build step always cached properly?
I am working around this by using the legacy builder. It's deprecated, but still works as of Docker v25.

```sh
# Enable legacy builder
export DOCKER_BUILDKIT=0

docker pull $MY_IMAGE:latest || true
docker build --cache-from $MY_IMAGE:latest --tag $MY_IMAGE:latest .
docker push $MY_IMAGE:latest
```
@tonistiigi Not to rush or anything, but what is the estimate for a new patch release? Thanks in advance.
Still having this issue:
Reliably fails to use the cache after pushing a build that used the cache. @tonistiigi
This is a closed issue. If you have another issue, open a new one with full runnable reproduction steps and version info.
Thanks for the update. While troubleshooting and creating a minimal reproducible example (MRE), we made several changes that seemed to resolve the issue: ...

After applying these changes, caching appeared to work as expected. Hope this helps others experiencing similar problems!
Similar to #1981, but it's still happening with 20.10.7, and I have a minimal reproduction case.
Version information
Steps to reproduce
Have this Dockerfile:
Run this script:
(also here: https://github.com/jli/docker-cache-issue-20210722 )
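The exact Dockerfile and script live in the linked repository. As a rough, hedged reconstruction of the kind of repro described here (the base image, registry path, and third build step are guesses, not the original files):

```sh
#!/usr/bin/env sh
# Hedged reconstruction of the repro; the real files are in the linked repo and
# may differ. IMAGE is a placeholder registry path.
set -eu
IMAGE=registry.example.com/cache-issue-test

cat > Dockerfile <<'EOF'
FROM alpine
RUN yes | head -20 | tee /yes.txt
COPY Dockerfile /Dockerfile
EOF

export DOCKER_BUILDKIT=1
docker pull "$IMAGE:latest" || true
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from "$IMAGE:latest" \
  --tag "$IMAGE:latest" .
docker push "$IMAGE:latest"
```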
What I see: When I run the above script multiple times, it alternates every time whether the `RUN yes | head -20 | tee /yes.txt` step is cached or not. The `docker build` output alternates between:

```
=> [2/3] RUN yes | head -20 | tee /yes.txt
```

and

```
=> CACHED [2/3] RUN yes | head -20 | tee /yes.txt
```
With docker-container driver
This comment by @tonistiigi suggested using the "container driver". This does seem to work! I tried replacing the `docker build` command from above with a `docker buildx build` command using the docker-container driver (see the sketch below), and this consistently results in the `RUN yes ...` step being cached!

The problem is that `docker buildx` doesn't appear to be a subcommand in the https://hub.docker.com/_/docker image, which is what we use in CI. Is there a way to use the container driver when using that image? Could you help me understand why this is needed?
Will this be fixed with a future release?
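For reference, a hedged sketch of what a docker-container-driver setup with registry cache export typically looks like (builder name, registry, and image names are placeholders, not the command from the original report):

```sh
# Hedged sketch of the docker-container driver approach; names are placeholders.
docker buildx create --name cachebuilder --driver docker-container --use

docker buildx build \
  --cache-from type=registry,ref=registry.example.com/myimage:buildcache \
  --cache-to type=registry,ref=registry.example.com/myimage:buildcache,mode=max \
  --tag registry.example.com/myimage:latest \
  --push .
```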