Add IncludePatterns and ExcludePatterns options for Copy #2082
What about the cache? It looks to me like, at the moment, the cache would still be invalidated even if the files that match the pattern didn't change.
Good point. I see two issues with cache right now.
The cache checksum is covered by the result of this func: https://github.com/moby/buildkit/blob/master/solver/llbsolver/ops/file.go#L150 . I don't think the current
Also, add a capability for this in https://github.com/moby/buildkit/blob/master/solver/pb/caps.go and check it in the client: https://github.com/moby/buildkit/blob/master/client/llb/fileop.go#L660
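A rough sketch of the kind of client-side check being asked for here, under heavy assumptions: capSet, supports and checkCopyPatterns are hypothetical names made up for illustration and are not buildkit's apicaps API. Only the capability ID string is taken from this PR's diff to solver/pb/caps.go.

```go
package main

import "fmt"

// capSet is a hypothetical stand-in for the capabilities a daemon advertises.
type capSet map[string]bool

// Capability ID as it appears in this PR's diff to solver/pb/caps.go.
const capFileCopyIncludePatterns = "file.copy.includepatterns"

// supports returns an error when the capability is not advertised.
func (c capSet) supports(id string) error {
	if !c[id] {
		return fmt.Errorf("capability %q is not supported", id)
	}
	return nil
}

// checkCopyPatterns requires the capability only when patterns are actually
// set, so plain copies keep working against older daemons.
func checkCopyPatterns(caps capSet, include, exclude []string) error {
	if len(include) == 0 && len(exclude) == 0 {
		return nil
	}
	return caps.supports(capFileCopyIncludePatterns)
}

func main() {
	old := capSet{"file.base": true}
	fmt.Println(checkCopyPatterns(old, nil, nil))
	fmt.Println(checkCopyPatterns(old, []string{"*.go"}, nil))
}
```

The point of gating on "patterns are set" rather than on every copy is that an older daemon would otherwise silently ignore the new fields and copy everything.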
Here's my rough understanding so far of what needs to be done. I wanted to run it by you before I start doing the work:
dedupeSelectors can also detect when one selector for a path uses a filter and another doesn't; in that case the filtered one does not matter.
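A minimal sketch of that deduplication idea, not the real dedupeSelectors: the selector type and the dedupe function below are hypothetical and only model the rule stated above, namely that an unfiltered selection of a path already covers anything a filtered selection of the same path could contribute.

```go
package main

import "fmt"

// selector is a hypothetical stand-in for a copy source selector.
type selector struct {
	Path            string
	IncludePatterns []string
	ExcludePatterns []string
}

// filtered reports whether the selector carries any include/exclude pattern.
func (s selector) filtered() bool {
	return len(s.IncludePatterns) > 0 || len(s.ExcludePatterns) > 0
}

// dedupe drops filtered selectors whose path is also selected without
// filters, and collapses repeated unfiltered selectors for the same path.
func dedupe(in []selector) []selector {
	unfiltered := map[string]bool{}
	for _, s := range in {
		if !s.filtered() {
			unfiltered[s.Path] = true
		}
	}
	seen := map[string]bool{}
	var out []selector
	for _, s := range in {
		if s.filtered() {
			if !unfiltered[s.Path] {
				out = append(out, s)
			}
			continue
		}
		if !seen[s.Path] {
			seen[s.Path] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	out := dedupe([]selector{
		{Path: "/src", IncludePatterns: []string{"*.go"}},
		{Path: "/src"},
		{Path: "/docs", ExcludePatterns: []string{"*.tmp"}},
	})
	for _, s := range out {
		fmt.Println(s.Path, "filtered:", s.filtered())
	}
}
```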
I think buildkit/cache/contenthash/checksum.go Lines 464 to 472 in 8e71ac4
Yes, they can be used together. I think modifying
if there are any exclude filters.
What are the conditions where Are you saying it's reasonable to always scan the FS when there are include or exclude patterns? That would potentially be simpler, because it could reuse
f293670 to 5ea2a92
@tonistiigi: I've turned
When files are first checked from a snapshot that was not created by transferring a local source. After that, the specific path is scanned, but other paths around it on the same snapshot may not be.
Yes, CI does not seem to be running because of vendoring issues. Did you cover the case where parent paths of the included patterns need to contribute the checksum of the dir's own metadata but not its contents? Is the pattern digest the checksum of the top path that matched the pattern, or are all files (e.g. subdirs inside it) matched individually and merged later with some logic?
Are you talking about the situation where we copy If so, I don't think I handled this. It should be an easy fix to include that parent directory metadata in the cache key.
I think it's closer to the latter. Take a look at the logic in
5ea2a92 to cf80609
Looks like a missing
8947fc4 to bb9c8eb
Added this (see latest commit)
e118757 to 57ddd68
client/client_test.go
Outdated
for _, name := range []string{"myfile", "sub/bar"} {
	_, err = os.Stat(filepath.Join(destDir, name))
	require.Equal(t, true, errors.Is(err, os.ErrNotExist))
I think there's a native function for this:
https://github.com/stretchr/testify/blob/6990a05d54c2287be8787a411d77815643347339/require/require.go#L313-L315
@tonistiigi: No rush, but this is ready for another pass.
Could this be used to implement moby/moby#37333 / moby/moby#15771?
Yes, I believe it could.
Is it true that with this I can copy while excluding all Go files with a **/*.go exclude pattern, but there is no way to copy only the Go files?
cache/contenthash/checksum.go
Outdated
txn := cc.tree.Txn()
root = txn.Root()
var updated bool

lastIncludedDir := ""
iter := root.Seek([]byte{})
The !wildcard case should seek to p?
cache/contenthash/checksum.go
Outdated
		}
	}
	if !matched {
		return false, partial, nil
false, false, nil
cache/contenthash/checksum.go
Outdated
dirSlash := fn + "/"
for _, pat := range excludePatternMatcher.Patterns() {
	patStr := pat.String() + "/"
	if strings.HasPrefix(patStr, dirSlash) {
Isn't this pattern a regexp? How can a prefix be checked? I think at least an Exclusion check is missing, but I can't find docs for that either.
As currently implemented, this is best-effort: if we have a pattern here rather than a prefix, we might end up including some paths in the checksum that would be excluded by the filters. This would never cause false cache hits, but it might lead to false negatives (spurious cache misses).
The alternative would be to never use a dir's content hash when there is at least one exclude pattern, and instead walk all included dirs and only incorporate non-excluded files in the final checksum. Basically, this code would be replaced with:
if cr.Type == CacheRecordTypeDir {
	lastIncludedDir = fn
	if excludePatternMatcher != nil {
		// don't use the optimization where we add a whole dir's contents
		// to the checksum when there is an exclude pattern, because some
		// files in the dir may be excluded
		continue treeWalk
	}
}
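For contrast with the alternative above, the best-effort prefix check from the outdated diff can be modeled in isolation. This is only a sketch: mayExcludeInside is a hypothetical helper, patterns are plain strings rather than the matcher's pattern objects, and it assumes a true result means the directory's whole-content hash is not used.

```go
package main

import (
	"fmt"
	"strings"
)

// mayExcludeInside reports whether some exclude pattern, read as a literal
// path, names dir itself or something beneath it. Wildcards are not
// interpreted, which is what makes the check best-effort.
func mayExcludeInside(dir string, excludePatterns []string) bool {
	dirSlash := dir + "/"
	for _, pat := range excludePatterns {
		if strings.HasPrefix(pat+"/", dirSlash) {
			return true
		}
	}
	return false
}

func main() {
	// Literal path under d0: detected.
	fmt.Println(mayExcludeInside("d0", []string{"d0/secret"}))
	// Wildcard that would also exclude d0/secret: missed by the prefix check.
	fmt.Println(mayExcludeInside("d0", []string{"d*/secret"}))
	// Sibling dir with a similar name: correctly ignored.
	fmt.Println(mayExcludeInside("d0", []string{"d00/secret"}))
}
```

The missed wildcard case is the one described in the comment: an excluded file still feeds the checksum, so a change to it causes a cache miss that was not strictly needed, but never a wrong cache hit.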
for {
	k, _, ok := iter.Next()
	if !ok {
		break
	}
	fn := string(convertKeyToPath(k))
	dirHeader := false
	if len(k) > 0 && k[len(k)-1] == byte(0) {
I don't understand where the non-null-byte dir is excluded from checksum for the partial case.
The continue a few lines below does this. It causes us to skip the call to checksum when partialMatch is true but the path does not end in a null byte.
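The skip rule described in this reply can be sketched as a small predicate. Only the trailing-null-byte test and the "skip when partialMatch is true and the key has no trailing null byte" rule come from the thread; isDirHeader, contributes and the sample keys are hypothetical, and the real key layout may differ.

```go
package main

import "fmt"

// isDirHeader mirrors the check quoted in the diff: a key whose last byte is
// zero is treated as a directory header record.
func isDirHeader(key []byte) bool {
	return len(key) > 0 && key[len(key)-1] == 0
}

// contributes reports whether a record would be fed into the checksum. For a
// partially matched dir, only the header record (the dir's own metadata)
// counts; the record without the trailing null byte is skipped.
func contributes(key []byte, partialMatch bool) bool {
	if partialMatch && !isDirHeader(key) {
		return false
	}
	return true
}

func main() {
	header := []byte("d0\x00") // hypothetical header key for dir d0
	content := []byte("d0")    // hypothetical non-header key for dir d0
	fmt.Println(contributes(header, true))
	fmt.Println(contributes(content, true))
	fmt.Println(contributes(content, false))
}
```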
cc, err := newCacheContext(ref.Metadata(), nil)
require.NoError(t, err)

dgst, err := cc.Checksum(context.TODO(), ref, "foo", ChecksumOpts{IncludePatterns: []string{"foo"}}, nil)
Is this a valid case? I thought only dirs can have subpatterns.
What do you think the behavior should be in this case? Should we exclude /foo because the specified path is not a dir? Return an error?
Same as copying a directory where IncludePatterns are set but none match.
If I understand correctly, you think it should copy no files in this scenario?
I don't think that matches the behavior implemented in fsutil. (*copier).copy only checks IncludePatterns on recursive calls, so calling Copy with /foo would copy /foo regardless of IncludePatterns. We would need a PR to fsutil to keep the behavior in sync.
So a dir with no matching subdirs will produce an empty dir? That seems ok.
Not sure about the file case. What do you think is best? Using it like this is always wrong and a sign that the user made a mistake. Do you see cases where getting an error here would seem wrong to the user?
It seems reasonable to me to make it an error to try to do a Copy op with IncludePatterns or ExcludePatterns when the source is a file. If we do that, we don't need any special handling at the checksum level, since it wouldn't be a valid scenario.
I could add a Stat in docopy if you think that makes sense. Or we could add this check at the fsutil level.
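A minimal sketch of the Stat-based check proposed here, using only the standard library. validateCopySource, errPatternsNeedDir and demo are hypothetical names; this does not reflect how docopy or fsutil are actually structured.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errPatternsNeedDir is a hypothetical sentinel error for this sketch.
var errPatternsNeedDir = errors.New("include/exclude patterns require a directory source")

// validateCopySource rejects a copy whose source is not a directory when any
// include or exclude pattern is set. Copies without patterns are not checked.
func validateCopySource(src string, include, exclude []string) error {
	if len(include) == 0 && len(exclude) == 0 {
		return nil
	}
	fi, err := os.Stat(src)
	if err != nil {
		return err
	}
	if !fi.IsDir() {
		return fmt.Errorf("%s: %w", src, errPatternsNeedDir)
	}
	return nil
}

// demo builds a temp dir containing one file and validates both as sources.
func demo() (fileErr, dirErr error) {
	dir, err := os.MkdirTemp("", "copysrc")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)
	file := filepath.Join(dir, "foo")
	if err := os.WriteFile(file, []byte("data"), 0o600); err != nil {
		return err, err
	}
	include := []string{"foo"}
	return validateCopySource(file, include, nil), validateCopySource(dir, include, nil)
}

func main() {
	fileErr, dirErr := demo()
	fmt.Println("file source:", fileErr)
	fmt.Println("dir source:", dirErr)
}
```

With a check like this in front of the copy, the checksum code would never see the "file source plus patterns" combination, which is the simplification argued for above.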
require.Equal(t, dgstFileData0, dgst)

dgstFoo, err := cc.Checksum(context.TODO(), ref, "", ChecksumOpts{IncludePatterns: []string{"foo"}}, nil)
require.NoError(t, err)
There is no dgstFileData0 check.
I don't think this will match dgstFileData0, because the record for / is included in the checksum (in case other files that match the pattern are added in the future).
If source ends with / (same for / ) but then the dir checksum should not apply, correct? As the dir itself is not copied.
dgstD0Star2, err := cc.Checksum(context.TODO(), ref, "", ChecksumOpts{IncludePatterns: []string{"d0/*"}}, nil)
require.NoError(t, err)
require.NotEqual(t, dgstD0Star, dgstD0Star2)
do/abc should be equal to dgstD0. Not sure if this covers the non-nil-byte case described above, or whether a case with an intermediate dir is needed.
When we have IncludePatterns = []string{"d0"}, we include the content hash of the dir d0 in the final checksum.
When we have IncludePatterns = []string{"d0/*"}, we only include the metadata hash for dir d0 in the final checksum (plus the files or dirs inside d0 which match the pattern).
So I believe these currently end up different, but we could make them match by removing the optimization that includes the "non-null byte" digest in the final checksum when a dir matches an include pattern exactly.
The d0 vs d0/* differences should be ok. But we should check the case where the path and include patterns are the same, but one side adds an extra file under the path that doesn't match the include patterns. That should still always give the same checksum.
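The stability property asked for here can be illustrated with a deliberately simplified model. This is not buildkit's checksum algorithm (which, per the thread, also folds in directory metadata records): filteredChecksum is a hypothetical function that hashes only the files matching an include pattern, with patterns assumed to follow Go's path.Match syntax.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"path"
	"sort"
)

// filteredChecksum hashes the names and contents of only those files that
// match at least one include pattern, in sorted order so the result does not
// depend on map iteration order.
func filteredChecksum(files map[string]string, include []string) string {
	var names []string
	for name := range files {
		for _, pat := range include {
			if ok, _ := path.Match(pat, name); ok {
				names = append(names, name)
				break
			}
		}
	}
	sort.Strings(names)
	h := sha256.New()
	for _, name := range names {
		fmt.Fprintf(h, "%s\x00%s\x00", name, files[name])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	include := []string{"d0/*.txt"}
	before := filteredChecksum(map[string]string{"d0/a.txt": "a"}, include)
	extra := filteredChecksum(map[string]string{"d0/a.txt": "a", "d0/b.log": "b"}, include)
	changed := filteredChecksum(map[string]string{"d0/a.txt": "A"}, include)
	fmt.Println("unchanged by non-matching file:", before == extra)
	fmt.Println("unchanged by matching file edit:", before == changed)
}
```

A test along these lines in the real code would compare two Checksum calls with identical path and patterns, before and after adding a file the patterns do not match.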
	fstest.CreateFile("sub/foo", []byte("foo0"), 0600),
	fstest.CreateFile("sub/bar", []byte("bar0"), 0600),
)
require.NoError(t, err)
We should add some checks for cache stability, showing that adding files that do not match the copy patterns does not invalidate the cache.
I like the idea, but how can this client test check whether the cache was invalidated or not?
This can be done with an Exec that adds a file with random contents, then checking that after a repeated run the extra file's contents either change or do not change.
solver/pb/caps.go
Outdated
CapFileRmWildcard apicaps.CapID = "file.rm.wildcard"
CapFileBase apicaps.CapID = "file.base"
CapFileRmWildcard apicaps.CapID = "file.rm.wildcard"
CapFileCopyIncludePatterns apicaps.CapID = "file.copy.includepatterns"
Not very important but could be one cap as added together.
I don't think it supports doublestar; for that we need tonistiigi/fsutil#79
ExcludePatterns does support
I think you're right then. Long term, we should add
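To illustrate why "copy only the Go files" is awkward without doublestar support, here is what a `**/*.go` pattern does under Go's standard path.Match, where `*` never crosses a `/`. That include patterns behave like path.Match is an assumption for this sketch; the thread only states that doublestar is not supported for them. The matches helper is hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// matches wraps path.Match and treats a malformed pattern as "no match".
func matches(pattern, name string) bool {
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

func main() {
	// With path.Match, "**" behaves like a single "*": it matches exactly one
	// path component, so only files one directory deep are selected.
	for _, name := range []string{"main.go", "cmd/main.go", "cmd/tool/main.go"} {
		fmt.Printf("%-18s %v\n", name, matches("**/*.go", name))
	}
}
```

Only cmd/main.go matches in this model; the top-level and the deeper file do not, which is why recursive include patterns need dedicated doublestar handling.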
@tonistiigi: Do you know why the vendor check is failing? Also, is there anything you think still needs to be addressed here?
require.NoError(t, err)
require.Equal(t, digest.FromBytes(append([]byte("d0"), []byte(dgstDirD0)...)), dgst)
require.Equal(t, dgstDirD0FileByFile, dgst)
This checksum changed because Checksum used to return the content hash for the dir that matched the pattern, but we now go file-by-file in case any files are matched by exclude patterns (or exclusions from an include pattern).
We could reintroduce the old behavior, conditional on not having any include/exclude patterns, if you think that makes sense.
5ccb4c7 to 1d4a783
Allow include and exclude patterns to be specified for the "copy" op, similarly to "local". Depends on tonistiigi/fsutil#101 Signed-off-by: Aaron Lehmann <[email protected]>
Consider IncludePatterns and ExcludePattern when calculating content hashes. Signed-off-by: Aaron Lehmann <[email protected]>
…rator Signed-off-by: Aaron Lehmann <[email protected]>
… ExcludePatterns Signed-off-by: Aaron Lehmann <[email protected]>
1d4a783 to 6f5ea71
@tonistiigi @hinshun: Any feedback on the latest version of this PR?
Happy to see logic converging for IncludePaths and ExcludePaths. This PR looks good to me. cc @tonistiigi
As a follow-up, would be nice if we could have a page of docs that describes the algorithm for checksums based on file content, with examples for the different cases (file, dir, wildcard, includepath).
@aaronlehmann I just found this MR; it is the exact functionality I needed. Thank you! Is this available in a stable version of Docker? I'm not too familiar with how moby and docker-ce relate and release.
What I added here is low-level buildkit functionality that I'm using in a buildkit-based project. I don't think it's available through Docker, unfortunately.
@aaronlehmann Just found this MR too. Thank you! I checked the release notes, which say this PR is included, and 1.3-labs came after 0.9.0. I am curious whether this functionality is already available in current buildkit, say, in 1.3-labs, since I failed to find documentation on how to copy with exclude in a Dockerfile. Thank you in advance!
Allow include and exclude patterns to be specified for the "copy" op,
similarly to "local".
Depends on tonistiigi/fsutil#101
cc @hinshun