Proposal: expose DiffOp in Dockerfile #4239

tonistiigi · 2023-09-16T00:17:31Z

This proposal was discussed in a previous maintainers meeting (@neersighted @cpuguy83). I think it has been discussed before as well, but maybe not on GitHub. If anyone finds references, please add links.

Expose the DiffOp functionality added in BuildKit v0.10 (https://github.com/moby/buildkit/releases/tag/v0.10.0) via the Dockerfile frontend. You can learn more about DiffOp from https://github.com/moby/buildkit/blob/v0.12.2/docs/dev/merge-diff.md.

This is to handle the following cases:

- People who currently copy the same files repeatedly in new Dockerfile commands can access only the files they need, without duplicating files in images.
- Access files created by a specific command even if they are not contained within a single directory.
- Make it possible to squash a stage in a multi-stage build without squashing the base image files.
- Rebase the top layers of one image on top of another image.

Proposal:

Add a new syntax, stageref1..stageref2 (two dots between stage names), that can be used in all the places where stage names can currently be referenced. These are FROM <stage> AS, COPY --from=<stage>, and RUN --mount=from=<stage>.

Such instances would internally resolve both sides and then run llb.DiffOp between them, resulting in a context that contains only the files that are in stageref2 and not in stageref1.
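To make the intended result concrete, here is a minimal sketch that models each stage as a plain mapping from path to content. It is not BuildKit code: the function name diff_snapshot and the dict model are hypothetical, treating modified files as part of the diff is an assumption of the sketch, and deletions, metadata and layers are not modeled at all.

```python
def diff_snapshot(lower, upper):
    """Return the entries of `upper` that are new or changed relative to `lower`.

    Both arguments map a file path to its content. This is a toy model,
    not how llb.DiffOp is implemented: deletions and file metadata are ignored.
    """
    return {
        path: content
        for path, content in upper.items()
        if path not in lower or lower[path] != content
    }


# Hypothetical stage contents, standing in for "alpine" and "compile".
alpine = {"/bin/sh": "sh-v1", "/etc/os-release": "alpine"}
compile_stage = {
    "/bin/sh": "sh-v1",                    # unchanged, so not part of the diff
    "/etc/os-release": "alpine-modified",  # changed in the stage
    "/usr/local/bin/tool": "tool-v1",      # added in the stage
}

gen_files = diff_snapshot(alpine, compile_stage)
```

In this model, alpine..compile would carry only /etc/os-release and /usr/local/bin/tool, while the untouched /bin/sh stays behind with the base.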

Any current stage reference may be used on either side of the diff expression. This means it can be (in order of priority) a named build context, a stage defined in the Dockerfile, or a Docker image.
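The priority order above could be sketched roughly as follows. The function resolve_ref and its return tuples are hypothetical and only illustrate the lookup order; the real frontend resolves references to LLB states rather than labels.

```python
def resolve_ref(name, build_contexts, stages):
    """Classify one side of a diff expression, following the priority order:
    named build context first, then Dockerfile stage, then Docker image.

    `build_contexts` and `stages` are collections of known names
    (hypothetical inputs for this sketch).
    """
    if name in build_contexts:
        return ("context", name)
    if name in stages:
        return ("stage", name)
    # Anything else is assumed to be an image reference to pull.
    return ("image", name)
```

With this ordering, a named build context called alpine would shadow both a stage and an image of the same name.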

How many layers such an expression creates is undefined. It does not flatten the files. In the current llb.DiffOp implementation, if the diff can be performed purely by subtracting layers, BuildKit will never pull the blobs or modify them.

If flattening is desired, it can be achieved with:

FROM scratch
COPY --from=stage1..stage2 / /

Using more than two diff sources at once (e.g. stage1..stage2..stage3) is not allowed. Chaining through named stages is allowed, however:

FROM aa..bb AS cc

FROM cc..dd AS ee
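A minimal sketch of how a reference could be split under these rules is shown below. The helper parse_stage_ref is hypothetical, and it naively assumes that ".." never appears inside a single stage name or image reference; a real parser would need to handle that case.

```python
def parse_stage_ref(ref):
    """Split 'lower..upper' into (lower, upper).

    A plain reference without '..' is returned as (None, ref).
    More than two diff sources, or an empty side, is rejected.
    """
    parts = ref.split("..")
    if len(parts) == 1:
        return (None, ref)
    if len(parts) > 2:
        raise ValueError(f"more than two diff sources are not allowed: {ref!r}")
    lower, upper = parts
    if not lower or not upper:
        raise ValueError(f"both sides of a diff expression are required: {ref!r}")
    return (lower, upper)
```

Under this sketch, aa..bb and cc..dd each parse on their own, so the two-step chain above works, while stage1..stage2..stage3 raises an error.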

Examples:

Copying over a layer from another image:

FROM alpine AS compile
RUN ./generate-files

FROM busybox
COPY --from=alpine..compile /usr/local /usr/local

Alternatively:

FROM alpine..compile AS gen-files

FROM ...
COPY --from=gen-files /usr/local /usr/local

Note that the diff files can also be accessed with --target=gen-files.

Squash over base image:

FROM alpine AS build
RUN
RUN
RUN

FROM alpine AS squashed
COPY --from=alpine..build / /

FROM build

Rebase from old base to new base:

ARG IMG=myrepo/myimage
ARG OLDBASE=alpine:3.16
ARG NEWBASE=alpine:3.17

FROM ${OLDBASE}..${IMG} AS app

FROM ${NEWBASE}
COPY --link --from=app / /

Fallbacks:

Due to bugs in the Moby implementation of DiffOp, it was disabled (moby/moby#45112) unless the containerd implementation is enabled. Ideally, these issues could be fixed.

Capability detection would detect a missing DiffOp (either Moby graphdrivers or BuildKit <0.10) and give the user an error about the missing feature.

This on its own is not ideal, as we want to provide an experience where defining #syntax with an updated version is enough to guarantee that a Dockerfile builds on all configurations of BuildKit. To achieve that, the frontend should also implement diff semantics on its own, to use as a fallback when no native DiffOp support exists. Internally, this would resolve both sides of the diff expression and run a container with both sides mounted. The container would then compare the files in both directories and write the result to a third directory, which becomes the result of the diff expression. This will obviously not have the same caching and layer semantics as native DiffOp, but it should produce the same collection of files.
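The comparison step of such a fallback might look roughly like the sketch below. It is only an illustration under stated assumptions: diff_dirs is a hypothetical name, the three directories stand in for the two mounted sides and the output mount, and symlinks, deletions, ownership, extended attributes and empty directories are not handled.

```python
import filecmp
import os
import shutil


def diff_dirs(lower, upper, out):
    """Copy into `out` every regular file under `upper` that is missing from,
    or has different content than, the same path under `lower`."""
    for root, _dirs, files in os.walk(upper):
        rel = os.path.relpath(root, upper)
        for name in files:
            src = os.path.join(root, name)
            old = os.path.join(lower, rel, name)
            # Skip files whose content is byte-for-byte identical on both sides.
            if os.path.isfile(old) and filecmp.cmp(old, src, shallow=False):
                continue
            dst = os.path.normpath(os.path.join(out, rel, name))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps and mode
```

Comparing content with shallow=False is a deliberate choice for the sketch: it is slower than comparing stat information, but it does not depend on timestamps that differ between otherwise identical files.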

As usual, this should first be tested in the labs channel and only promoted after a successful testing period.

yyb196 · 2023-09-21T06:58:05Z

Does this proposal affect only the build process, or does it also affect the push and pull processes? Does it depend on a new layer media type?

neersighted · 2023-10-18T04:43:53Z

This would only be for build; we're not discussing a new layer format here, though you are astute in realizing that e.g. a "metacopy" in the layer format might also allow some improvements. Thankfully, both (runtime/build-level and layer-level) can be addressed separately.

tonistiigi added kind/enhancement area/dockerfile labels Sep 16, 2023

tonistiigi mentioned this issue Nov 27, 2023

Feature request: On Dockerfiles, allow filepaths to be excluded from ADD and COPY commands #4439

Open

cpuguy83 mentioned this issue Dec 19, 2023

partial layer/image export moby/moby#8039

Open
