Failed to export cache - invalid incomplete links #1509

Closed
bgaillard opened this issue May 27, 2020 · 6 comments · Fixed by #1659

@bgaillard commented May 27, 2020

Hi, I opened this issue following the discussion in aws/containers-roadmap#505.

I'm currently trying to build a multi-stage Dockerfile with 9 stages using the following command (secret information has been replaced with XXXXXXX):

        docker run \
            --rm \
            --privileged \
            -v /tmp/verdaccio_auth_token:/tmp/verdaccio_auth_token \
            -v $(realpath ./):/tmp/src \
            -v ${DOCKER_BUILD_DIR}:/tmp/dockerfile \
            -v $HOME/.docker:/root/.docker \
            --env DOCKER_BUILDKIT=1 \
            --entrypoint buildctl-daemonless.sh \
            moby/buildkit:master \
            build \
            --frontend dockerfile.v0 \
            --local context=/tmp/src \
            --local dockerfile=/tmp/dockerfile \
            --opt build-arg:BUILD_COMMIT_ID=551f997186da5c748bc16e6b4337691669416e4b \
            --opt build-arg:BUILD_COMMIT_DATE=2020-05-20T06:57:44Z \
            --opt build-arg:BUILD_DATE=2020-05-25T06:53:02Z \
            --opt build-arg:BUILD_ENV=${ENVIRONMENT} \
            --secret id=verdaccio_token,src=/tmp/verdaccio_auth_token \
            --output type=image,name=XXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/gdc/myimage:commit.id.XXXXXXX,push=true \
            --export-cache type=registry,ref=XXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/gdc/myimage:buildcache,mod=max,push=true \
            --import-cache type=registry,ref=XXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/gdc/myimage:buildcache

This command seems to build our Docker image correctly. Also, when the --import-cache and --export-cache options are not used, the command correctly pushes the image to our Amazon ECR repository.

But when the --import-cache and --export-cache options are used, I get the following error:

...
#67 exporting cache
#67 sha256:2700d4ef94dee473593c5c614b55b2dedcca7893909811a8f2b48291a1f581e4
#67 preparing build cache for export
#67 preparing build cache for export 7.4s done
#67 ERROR: invalid incomplete links
------
 > importing cache manifest from XXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/myimage:buildcache:
------
------
 > exporting cache:
------
error: failed to solve: rpc error: code = Unknown desc = invalid incomplete links
Failed to build 'myimage' Docker image !

I initially thought the error was due to the registry export cache type, but I also tried other types (inline, and local as below) and got the same error.

--export-cache type=local,dest=/tmp/website-buildcache,mod=max

The Docker version I'm using is the following (insecure registry IPs have been hidden with X.X.X.X):

Client:
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.4.1)

Server:
 Containers: 14
  Running: 1
  Paused: 0
  Stopped: 13
 Images: 20
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-31-generic
 Operating System: Ubuntu 20.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.19GiB
 Name: baptiste
 ID: UVGW:ZJRY:LGPE:BXPL:ADVI:IXUK:HQLR:VFLD:RVBC:XPL2:MKAM:ZGAW
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: gensdeconfiance
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  X.X.X.X:80
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

I searched the internet for other users with the same problem but found nothing. The only reference to invalid incomplete links I found is in the source code, in cache/remotecache/v1/utils.go#L151.

I'm not used to developing in Go and don't understand what this error means.

What are these "links"? Why are they "incomplete"/"invalid"? How can I identify the source of the error, for example by enabling debugging somewhere?

@tonistiigi (Member)

Can you make a runnable reproducer for this issue, please? The error message indicates that the cache chain is corrupt/invalid when moving (linking) from one key to the next.

@bgaillard (Author)

Hi @tonistiigi, and thanks for the quick response.

> Can you make a runnable reproducer for this issue please. The error message indicates that the cache chain is corrupt/invalid when moving (linking) from one key to the next.

Sure, I'll try to build one if my additional tests don't turn up anything (see below).

I finally isolated the problem: it comes from a single COPY instruction in our Dockerfile.

With this COPY, I get the invalid incomplete links error:

COPY --from=build-php /var/www/gdc/vendor /var/www/gdc/vendor

When I change the COPY by adding --chown=root:root OR --chown=0:0, the error goes away:

COPY --from=build-php --chown=root:root /var/www/gdc/vendor /var/www/gdc/vendor
# OR
COPY --from=build-php --chown=0:0 /var/www/gdc/vendor /var/www/gdc/vendor

With or without --chown=root:root / --chown=0:0, the deployed container ends up with the same ownership and permissions on disk:

/var/www/gdc # ls -al | grep vendor
drwxr-xr-x   75 root     root          4096 May 27 13:52 vendor

Do you have any further clue about this problem, or do you still need a runnable reproducer to debug it in more detail? (I'm worried it would be tedious to write, because our multi-stage build is large and depends on private AWS ECR repositories 😕)

Thanks

@tonistiigi (Member)

I doubt this is related to --chown, or at least I can't think how it could be. I think it is more likely that you have another command somewhere that gets the same cache key, and changing the flag just makes these commands different.
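A hypothetical sketch of the idea in the comment above: if a cache key is derived only from an operation's definition and the digest of the content it copies, two COPY instructions in different stages that copy identical files to identical paths become indistinguishable, while adding `--chown` makes them differ. The `copyOp` type, its fields and `cacheKey` are assumptions for illustration only; BuildKit's real cache keys are computed differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// copyOp is a hypothetical description of a COPY step. Note what is NOT in
// it: the name of the stage the files came from.
type copyOp struct {
	srcContentDigest string // digest of the files being copied
	srcPath          string
	destPath         string
	chown            string // "" when --chown is not set
}

// cacheKey hashes the fields of a copyOp. Fields are separated by a NUL byte
// so that, e.g., ("ab", "c") and ("a", "bc") hash differently (assuming the
// fields themselves never contain NUL).
func cacheKey(op copyOp) string {
	h := sha256.New()
	fmt.Fprintf(h, "copy\x00%s\x00%s\x00%s\x00%s",
		op.srcContentDigest, op.srcPath, op.destPath, op.chown)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	vendor := "sha256:aaaa" // hypothetical digest of /var/www/gdc/vendor
	stageA := copyOp{srcContentDigest: vendor, srcPath: "/var/www/gdc/vendor", destPath: "/var/www/gdc/vendor"}
	stageB := stageA // the same COPY written in another stage
	chowned := stageA
	chowned.chown = "0:0"

	fmt.Println(cacheKey(stageA) == cacheKey(stageB))  // true: indistinguishable
	fmt.Println(cacheKey(stageA) == cacheKey(chowned)) // false: --chown changes the key
}
```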

@bgaillard (Author)

Hi @tonistiigi, thanks for your response.

For anyone with the same problem, here is a small reproducible sample: https://github.com/bgaillard/moby-1509

Simply clone the repo and run build.sh to see the error.

> I think it is more likely that you have another command somewhere that would get the same cache key and changing the flag just makes these commands different.

As you can see in the sample, I have several COPY --from instructions that copy the same data into different stages.

Do you see an error in this file? Is this behavior normal/expected? If so, perhaps it would be good to have a much clearer error message than invalid incomplete links?

@ryanclark

I encountered this with a Dockerfile like this:

FROM ryan/base:latest as frontend-dependencies

WORKDIR /app

COPY --from=ryan/cache:latest /app/static/node_modules /app/static/node_modules

COPY static/package.json static/yarn.lock static/.yarnrc static/.yarnclean static/

RUN cd static && yarn install --production --frozen-lockfile

FROM scratch as output

WORKDIR /app

COPY --from=frontend-dependencies /app/static/node_modules /app/static/node_modules

Because this rebuilds the cache image using the previous image as the cache, the yarn install step can end up with no changes, so BuildKit seems to treat

COPY --from=frontend-dependencies /app/static/node_modules /app/static/node_modules

as the same as

COPY --from=ryan/cache:latest /app/static/node_modules /app/static/node_modules

Which is technically true in terms of caching: both operations end up with exactly the same files, even though they're different operations. The cache exports fine when package.json changes, but I would have hoped this would also work with no changes.

As a workaround, I changed the WORKDIR (and the copy paths) from /app to something different in each stage, so BuildKit treats them as different operations. This would also explain why the --chown change fixed it for you above.

The working version of the Dockerfile above:

FROM ryan/base:latest as frontend-dependencies

WORKDIR /frontend

COPY --from=ryan/cache:latest /cache/static/node_modules /frontend/static/node_modules

COPY static/package.json static/yarn.lock static/.yarnrc static/.yarnclean static/

RUN cd static && yarn install --production --frozen-lockfile

FROM scratch as output

WORKDIR /cache

COPY --from=frontend-dependencies /frontend/static/node_modules /cache/static/node_modules

This is built with buildx using moby/buildkit:master. The command is:

docker buildx build \
	--push \
	--cache-from type=local,src=/tmp/previous-cache \
	--cache-to type=local,dest=/tmp/cache \
	--tag ryan/cache:latest \
	--file Dockerfile \
	--target output \
	.

The cache is synced from S3 before the build and back to S3 after it, so otherwise the cache is used normally.

(I've shortened the repo names here, but the syntax is exactly what failed and what the workaround was.)

@vladaionescu (Contributor)

The fix is working great, BTW. Just got around to testing this. 👏
