blobinfocache: cache compression types #1089
Conversation
Thanks for working on this!
(Not at all a review, just a minimal first skim. Making most of this ~unavoidably visible through the public API is the main worry.)
Force-pushed from af75978 to 2f89fba.
Force-pushed from fd2134a to 2767594.
Dropped the "MediaType is always correct" goal, because handling a switch to zstd compression while updating a schema2 manifest didn't look doable without restructuring how the …
Force-pushed from 1f26e4b to 54b1d26.
The "docker" transport's …
Force-pushed from 54b1d26 to 89621de.
Force-pushed from 89621de to 9d239cb.
Rebased.
docker/docker_image_dest.go (outdated):

```go
var compressorName string
if info.CompressionAlgorithm != nil {
	compressorName = info.CompressionAlgorithm.Name()
}
```
I'm not completely sure we end up recording the right value here.
Changed the `TryReusingBlob()` case where the registry indicates that it already has the blob, so that we record the compression type as `UnknownCompression`. The MIME type we're passed from `copy` comes from the source image's manifest, so it should already be correct for the blob with the same digest without needing any fixup from us.
So far basically just looking at the API, I’ll continue reading the actual implementation tomorrow.
types/types.go (outdated):

```go
// BICReplacementCandidate2 is an item returned by BlobInfoCache2.CandidateLocations2.
type BICReplacementCandidate2 struct {
	Digest         digest.Digest
	CompressorName string
```
Can this be “uncompressed” or “unknown”? If so, that should be documented and distinct.
And do we just soldier on when getting “unknown”, or prefer correctness and don’t reuse? Just by asking that question I’m somewhat suggesting to prefer correctness, OTOH that would make third-party cache implementations half-ignored. Something for me to think about while I continue reading tomorrow.
Yes, I think it makes sense to mark cases where we don't know because we retrieved the data from a cache that was populated using the original methods. Added constants for uncompressed and unknown to avoid ambiguity.
Distinguishing cases where the compression type isn't known from uncompressed cases would keep us from actively breaking things, so I'm leaning that way as well. Updated `dockerImageDestination.TryReusingBlob()` to skip candidate locations where the compression type isn't known.
Okay, I think we don't need to skip these any more.
What happens with unknown compression now? If that happens (i.e. an old BoltDB cache that knows a location but no compression, or a third-party `BlobInfoCache` implementation), don't we fall back to the old behavior of using that blob but not updating the MIME type to match?
`OperationAndAlgorithmForCompressor()` gets passed `UnknownCompression` and returns `PreserveOriginal` with no specified compressor, and `dockerImageDestination.TryReusingBlob()` returns them in a `BlobInfo` with the original media type and no compression changes. When that eventually gets to `manifest.updatedMIMEType()`, it doesn't make any changes. So until we actually read that blob again, we will continue to behave as before when reusing that blob.
Isn’t “operate as before” #733? Some version of this PR skipped the blobs with `UnknownCompression`: #1089 (comment). That might force an extra unnecessary upload for schema1 (where the MIME type does not matter), but with schema1 disabled by docker/distribution by default, fixing #733 more reliably seems preferable.
```diff
@@ -103,6 +116,14 @@ func assertCandidatesMatch(t *testing.T, scopeName string, expected []candidate,
 	assert.Equal(t, e, actual)
 }
 
+func assertCandidatesMatch2(t *testing.T, scopeName string, expected []candidate, actual []types.BICReplacementCandidate2) {
```
If the set of new tests is the same as the old tests (is it?), maybe instead move the `cache.CandidateLocations{,2}` calls inside the assert… helper, have a single helper call test both methods, and we can avoid the duplication.
(This can clearly wait until after the interface is settled.)
Force-pushed from d97bf66 to f8c84ef.
Rebased, and fixed cases where we incorrectly used the …
Force-pushed from fa20fe8 to 8113d28.
Just some half-baked thoughts, hopefully useful a bit.
directory/directory_dest.go (outdated):

```go
	return false, types.BlobInfo{}, err
}

compressionOperation, compressionAlgorithm, _, _, err := blobinfocache.DetectCompressionFormat(f)
```
Both in this and the OCI/layout case, the code only looks for exactly the specified digest and no substitution happens. So this should never cause a change in the MIME type, and the two calls to `DetectCompressionFormat` seem unnecessary.

OTOH what about the `canSubstitute` case in `storageImageDestination.TryReusingBlob`? That one might need a MIME type update.
`storage` is a bit of a weird beast here. It uses `LayersInfosForCopy()`, so it's never on the hook for being able to reproduce the blobs that it's given. It's also in the business of trying to avoid causing changes to the manifest even when they're allowed, so it wants to only ever behave as though `canSubstitute` is false.
Yes, OTOH we already need to provide an uncompressed size so that the generic copy code is consistent, IIRC, so it makes sense for that rare code path to be fully consistent instead of partially.
Let’s see how the other paths pan out, at this point it’s probably just something to eventually verify.
6299052
to
32d0532
Compare
> We could change `CandidateLocations2()` to ignore entries for which the compression type isn't known: `pkg/blobinfocache/boltdb/cache.appendReplacementCandidates()` would just return if the `compressionBucket` wasn't there or if it failed to return a `compressorNameValue` for `digestKey`.
That’s actually a good point: a previous version had the `UnknownCompression` check in `dockerImageDestination.TryReusingBlob`, but it’s not conceptually the transports’ responsibility to care. Defining `CandidateLocations2` to never include `UnknownCompression` entries would get that knowledge out of the transports, at least, and simplify the 2 cache implementations + the compatibility wrapper a bit; OTOH it would be three places responsible for the same check, and probably require test changes.

(Conceptually the decision ideally belongs to the copy pipeline around `copyLayer`; OTOH that would require the `TryReusingBlob` change we’re trying to avoid, and it would be non-trivial: a callback from `TryReusingBlob` back into the copy code to determine which candidate locations are acceptable?) So now I don’t have a preference WRT locating the check in `dockerImageDestination` vs. `CandidateLocations2`.
As for having the `UnknownCompression` check or not: I’d prefer to include it, so that users can just upgrade without having to manually clear caches. Is it blocking? I guess not (it can be added in a follow-up, after all), but it’s very close.
I lean that way myself, so I can change it.
Force-pushed from e310c82 to 5e6ae45.
Yes, it did. Hopefully I got them all.
LGTM, some very last nits.
Thanks for all your work!
Force-pushed from 5e6ae45 to 705ae93.
Okay, updated and back in the CI queue.
Fix a compile warning in a test. Signed-off-by: Nalin Dahyabhai <[email protected]>
Add some tests for manifest.updatedMIMEType. Signed-off-by: Nalin Dahyabhai <[email protected]>
Extend the blob info cache to also cache the name of the type of compression used on a blob that we've seen, or specific values that indicate that we know the blob was not compressed, or that we don't know whether or not it was compressed. New methods for adding known blob-compression pairs and reading candidate locations including compression information are part of a new internal BlobInfoCache2 interface which the library's BlobInfoCache implementors also implement.

When we copy a blob, try to record the state of compression for the source blob and, if we applied any changes, for the blob we produced. Make sure that when TryReusingBlob successfully uses a blob from the blob info cache, it provides compression information in the BlobInfo that it returns, so that manifests can be updated to describe layers using the correct MIME types.

When attempting to write a manifest, if a manifest can't be written because layers were compressed using an algorithm which can't be expressed using that manifest type, continue on to trying other manifest formats.

Signed-off-by: Nalin Dahyabhai <[email protected]>
Force-pushed from 705ae93 to 5364600.
LGTM. Thanks again!
Hi, sorry for jumping into this pull request. I think I may be suffering from this issue using podman + Artifactory. I wanted to try the latest Fedora Rawhide, which apparently builds podman from the master branch every day, but it looks like I still have the issue. Checking the podman source code, the version of containers/image used there is still 5.9.0 from last December. Do you know whether the container team at Red Hat plans to update containers/image in podman in January or February, and to start pushing updated podman packages with this fix to Fedora Rawhide? Cheers!
We should be cutting a new release of containers/image soon, and then this will get vendored into podman, most likely podman 3.1 or later.
Yes, could you cut a new release, please? A pluriversum of developers is waiting for that.
Skopeo and all images mirrored to quay for OKD, the community operators, … are also dependent on that.
#1128 is ongoing.
Yes, this fix will be in podman 3.0 and in newer versions of skopeo and buildah, which will show up in RHEL 8.4.
Extend the blob info cache to also cache the name of the type of compression used on a blob in a known location. (An earlier version of this patch cached the full MIME type, but required additional logic to map the MIME types in cases where we converted images during copying, so the patch was larger.)

Pass an updated MIME type and information about compression changes to `PutBlob()` in its input `BlobInfo`, so that it can populate the blob info cache with correct compression information when it succeeds. Some of the local tables we kept in `manifest/common.go` were moved to an internal package (`internal/mediatypes`) so that they could be reused there.

Make sure that when `dockerImageDestination.TryReusingBlob()` successfully uses a blob from the blob info cache, it provides updated MIME type and compression information in the `BlobInfo` that it returns, so that converted manifests can use them to correctly update their `LayerInfo` fields.

This should fix #733.