Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: On Dockerfiles, allow filepaths to be excluded from ADD and COPY commands #4439

Open
leandrosansilva opened this issue Nov 25, 2023 · 10 comments

Comments

@leandrosansilva
Copy link
Contributor

This feature will be useful for multi stage builds, where some files should be copied in some stages but not others.

I imagine it looking like it follows:

FROM myimage AS not_py
COPY --exclude=sources/*.py sources  /src

FROM myimage AS not_cpp_or_txt
COPY --exclude=sources/*.cpp --exclude=/sources/*.txt sources  /src

In such example, /sources has some files required by not_py but not by not_cpp_or_txt, and vice-versa.

As far as I am aware, such construct at the moment cannot be accomplished with .dockerignore, which seems to prevent files to be copied to the context, whereas the exclude proposed adds files to the context, but not to the commands that use it.

@tonistiigi
Copy link
Member

.dockerignore, which seems to prevent files to be copied to the context, whereas the exclude proposed adds files to the context, but not to the commands that use it.

What is the benefit of that? We should not spend time transferring files to the daemon where they would be excluded by the copy step later. This doesn't mean that copy side shouldn't filter at all (copies between stages, multiple copies with different filters) but it should happen both for local source transfer and for copy.

I'm not so sure about the proposed syntax. It looks like it can become quite verbose if used as a replacement for a dockerignore. Maybe some way to define a new stage based on another filter with an additional filter definition, where the filter has multiple rules. Not sure exactly but #4239 also goes for a way to define new stages. It is also possible that this case is different enough from the dockerignore case and there will not be too many rules.

Some other considerations:

  • Do we expect many copies to use same excludes? Then they should be defined together somehow.
  • Is it ok that there are excludes but not includes? --parents is using IncludePatterns internally I think.

@thaJeztah @AkihiroSuda @cpuguy83

@leandrosansilva
Copy link
Contributor Author

Thx @tonistiigi for reviewing this issue.

I can see that there were other proposals over time, in addition to the one you linked, such as:

And this is a very similar proposal (meaning the current issue is actually a duplicate)

I mentioned the context in the original text, but in retrospect it was probably a mistake, as I confess I don't understand in details how the context works. The main target for this feature is actually the build cache.

@crazy-max
Copy link
Member

I think a generic --filter flag would be fine so we can handle includes as well:

FROM myimage AS not_py
COPY --filter=!*.py sources /src

FROM myimage AS not_cpp_or_txt
# OR
COPY --filter=!*.cpp --filter=!*.txt sources /src

FROM myimage AS combine
# AND
COPY --filter=!*.cpp,!*.py sources /src

FROM myimage AS only_cpp
COPY --filter=*.cpp sources /src

FROM myimage AS all
# default behavior as if we don't set --filter flag
COPY --filter=** sources /src

For more control we can use a comma separated list of match object instead of a simple path glob (combine stage). If multiple filters are set, they are combined using the OR operation, while individual match rules within an object are combined using the AND operation. By incorporating ! negation, you can create intricate matching rules.

@thaJeztah
Copy link
Member

What is the benefit of that? We should not spend time transferring files to the daemon where they would be excluded by the copy step later. This doesn't mean that copy side shouldn't filter at all (copies between stages, multiple copies with different filters) but it should happen both for local source transfer and for copy.

and

Is it ok that there are excludes but not includes?

👉 This is just a first "quick" blurb, but I know there's been various ticket around this (or similar) proposals, and we may have to go through use-cases mentioned in those.

  • some use-cases evolved around multiple targets using different sets of files in the same context; having a single .dockerignore made that more complicated (or not possible)
  • use-cases where --mount is used to mount files, but COPY statement to not include those files (or directories).
  • use-cases where the context contains a large number of files and directories from which the majority of files have to be included, except from some files (these may be generated files, or other artifacts). Describing files to include through COPY and wildcards is much more complicated than to describe what files to exclude. The --parents option alleviates some use-cases here, but not all of those.

I'm not so sure about the proposed syntax. It looks like it can become quite verbose if used as a replacement for a dockerignore.

I definitely think this can be a concern. Using a --flag for this works for basic use-cases, but can get complicated fast. I know there's been some proposals to have --ignorefile=xxx to specify a .dockerignore which could alleviate some of that, but also may be opening a new can of worms.

@LinusU
Copy link

LinusU commented Dec 6, 2023

  • use-cases where --mount is used to mount files, but COPY statement to not include those files (or directories).

This is my use case 🙋‍♂️

Specifically, I'm generating some "clean" versions of my package.json and package-lock.json that doesn't contain the version property, in order to get better caching.

My Dockerfile looks something like this:

FROM public.ecr.aws/lambda/nodejs:20.2023.11.21.13

RUN dnf install -y gcc gcc-c++ git make openssl-devel tar zip
WORKDIR /var/task

# This layer can be cached even between version bumps (or any other metadata changes that doesn't affect deps)
RUN --mount=type=bind,source=clean-package.json,target=package.json --mount=type=bind,source=clean-package-lock.json,target=package-lock.json npm ci

# These two layers would be condensed to one:
COPY . .
RUN rm clean-package.json clean-package-lock.json

RUN npm run-script build

For me, it would be great with a simple --exclude flag to COPY:

COPY --exclude=clean-package.json --exclude=clean-package-lock.json . .

This is not meant to replace .dockerignore, which we already use to ignore a lot of files.

@tonistiigi
Copy link
Member

We discussed this in one of the moby maintainers meetings.

Most people prefered the exclude over filter/include to cover the most basic cases. And possibly extend this later if reusing same filters becomes a problem. But exclude should be enough to test this in dockerfile labs channel.

Regarding implementation:

  • The exclude rules need to be integrated to context transfer as well, in addition to the filter during copy. If big files are excluded and never used during build then time should not be wasted to transfer them to the daemon. This filtering already works this way for the source paths and dockerignore paths.
  • @aaronlehmann is the current contenthash cache implementation (Add IncludePatterns and ExcludePatterns options for Copy #2082) already covering all cases for this, including ** etc. so that no excluded file impacts the cache checksum? If not then a workaround would be to do a internal copy to scratch stage first and then let next copy recompute the checksum. Hopefully this is not needed but needs to be verified with integration tests as well.

@aaronlehmann
Copy link
Collaborator

is the current contenthash cache implementation (#2082) already covering all cases for this, including ** etc. so that no excluded file impacts the cache checksum?

I believe so. That is the intent.

@leandrosansilva
Copy link
Contributor Author

Regarding implementation:

  • The exclude rules need to be integrated to context transfer as well, in addition to the filter during copy. If big files are excluded and never used during build then time should not be wasted to transfer them to the daemon. This filtering already works this way for the source paths and dockerignore paths.

If I understood properly, this mean that a file would need not be added to the context if it's ignored by .dockerignore AND from stages defined. Is it correct?

In this case, a file which is excluded (via --exclude) from all but one build stage must be added to the context.

I wonder if for this specific use case simply using .dockerignore would not be a better option and how much of an edge case it'd be.

I've created a change for the second suggestion (the cache) on #4561 and would be happy to get some feedback.

Thank you in advance.

@tonistiigi
Copy link
Member

AND from stages defined. Is it correct?

and from stages in use. Dockerfile frontend will already determine if a stage is reachable for a given build.

In this case, a file which is excluded (via --exclude) from all but one build stage must be added to the context.

Yes, there are cases where we can't be super precise about it. I think if there are multiple active copies, one has exclude and another one is using the excluded files then we should still transfer with one context and ignore the excludes (let the excludes happen via copy). But for simple cases where there is only one copy or all excludes are the same it should be possible to do the filter on transfer. I'm not sure how far we should go in determining if exclude from one COPY can collide with a path from another COPY. Just handling the most basic cases is probably fine initially. If common pattern appears later that is not handled by this case it can be improved later.

@leandrosansilva
Copy link
Contributor Author

I am continuing the discussion in the proposed PR #4561

leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Jan 24, 2024
It exposes the `ExcludePatterns` to the Dockerfile frontend, adding
`--exclude=<pattern>` option in the COPY and ADD commands, which will
cause filepaths matching such patterns not to be copied.

`--exclude` can be used multiple times.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Jan 24, 2024
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Jan 24, 2024
This affects the --exclude option in the COPY and ADD commands on
Dockerfiles.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Jan 29, 2024
It exposes the `ExcludePatterns` to the Dockerfile frontend, adding
`--exclude=<pattern>` option in the COPY and ADD commands, which will
cause filepaths matching such patterns not to be copied.

`--exclude` can be used multiple times.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Jan 29, 2024
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Jan 29, 2024
This affects the --exclude option in the COPY and ADD commands on
Dockerfiles.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Feb 8, 2024
It exposes the `ExcludePatterns` to the Dockerfile frontend, adding
`--exclude=<pattern>` option in the COPY and ADD commands, which will
cause filepaths matching such patterns not to be copied.

`--exclude` can be used multiple times.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Feb 8, 2024
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Feb 8, 2024
This affects the --exclude option in the COPY and ADD commands on
Dockerfiles.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Feb 19, 2024
It exposes the `ExcludePatterns` to the Dockerfile frontend, adding
`--exclude=<pattern>` option in the COPY and ADD commands, which will
cause filepaths matching such patterns not to be copied.

`--exclude` can be used multiple times.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Feb 19, 2024
leandrosansilva added a commit to leandrosansilva/buildkit that referenced this issue Feb 19, 2024
This affects the --exclude option in the COPY and ADD commands on
Dockerfiles.

References moby#4439

Signed-off-by: Leandro Santiago <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants