exporter: Allow specifying the compression type for all layers of the final exported image #2057
Conversation
Force-pushed from 870753e to a7b0faa
PTAL
@tonistiigi Thanks for taking a look at this.
Can we move this forward?
I don't understand how this tracks the state of the created blobs. As far as I can see, they are not tracked by the cache manager, and I don't see any leases around writers. If I understand correctly, the tracked blob is still created with the old code, ignoring compression settings, and then that blob may be converted in a second pass.
As mentioned in #1911 (comment), the hardest part about this should be how to associate multiple blobs with a single record.
exporter/containerimage/writer.go (outdated)
}

// unlazy layers as converter uses layer contents in the content store
// TODO(ktock): un-lazy only layers whose type is different from the target, selectively.
This seems like a requirement. Otherwise this introduces a regression into exporting lazy refs.
It's also a sign that blob creation is not happening in the correct place, as `Remote` already contains a reference to the blob(s).
Force-pushed from 24c399c to ec8a782
@tonistiigi Thanks for the review. I fixed the design to track compression variant contents as follows. When a user specifies a compression type different from the blob's original (i.e. pulled) one, a conversion happens. Once the blob is converted, the resulting contents are cached in the local content store and tracked via the ref's lease and a label (`buildkit.io/compression/digest.<compression type>=<sha256 digest>`) on the original (pulled) blob.
When converting the compression type, this label is looked up first. If there is no label for the target type, or the content has been lost from the content store, the conversion is performed.
Can we move this forward?
If there are multiple blobs for a single ref, does each of them count when calculating the disk usage of a ref in the prune code? As far as I can see, because multiple blobs per ref are tracked only using leases, the cache's metadata still only stores one blob ID per ref (whichever one is created first, I suppose). This in turn means that this code will calculate the disk usage using only one of those blobs (Lines 206 to 211 in edc28d1).
mu.Lock()
mprovider.Add(layerP)
mu.Unlock()
if forceCompression {
I feel like this code that creates the blobs with the correct compression type conceptually makes more sense in the existing `computeBlobChain` code, somewhere around here maybe (Lines 57 to 58 in edc28d1):

if refInfo.Blob != "" {
	return nil, nil
Not a strong opinion though, so if that's easier said than done I don't think it's a big deal to leave it here.
@sipsma Thank you for the review.
@ktock I commented on one more corner case; otherwise LGTM.
// accumulate size of compression variant blobs
if strings.HasPrefix(k, compressionVariantDigestLabelPrefix) {
	if cdgst, err := digest.Parse(v); err == nil {
		if info, err := cr.cm.ContentStore.Info(ctx, cdgst); err == nil {
What if you import 2 images that are identical except that they use different compression types for their layers, and then try to export each one to its converted compression format? You could then end up in a situation where the blob returned by `getBlob` on this ref also has a compression variant label attached to it, meaning the size would get double counted here, right?
If so, I guess you can just check that `cdgst` doesn't equal the `getBlob` result? Or something like that.
Added a check to avoid double counting when a label points to the blob itself.
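For illustration, here is a small, self-contained sketch of that kind of guard; the helper name and structure are assumptions, not the PR's actual diff. It sums the sizes of compression-variant blobs recorded as labels on a ref's original blob, skipping a variant digest that equals the blob itself so it is not counted twice.

```go
// Hypothetical sketch, not the code in this PR.
package sketch

import (
	"context"
	"strings"

	"github.com/containerd/containerd/content"
	digest "github.com/opencontainers/go-digest"
)

const compressionVariantDigestLabelPrefix = "buildkit.io/compression/digest."

// variantBlobsSize sums the sizes of compression-variant blobs recorded as
// labels on a ref's original blob. A variant digest equal to the original
// blob itself is skipped so that blob is not counted twice.
func variantBlobsSize(ctx context.Context, cs content.Store, origBlob digest.Digest, labels map[string]string) int64 {
	var total int64
	for k, v := range labels {
		if !strings.HasPrefix(k, compressionVariantDigestLabelPrefix) {
			continue
		}
		cdgst, err := digest.Parse(v)
		if err != nil || cdgst == origBlob {
			continue // unparsable label, or the label points at the blob itself
		}
		if info, err := cs.Info(ctx, cdgst); err == nil {
			total += info.Size
		}
	}
	return total
}
```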
Signed-off-by: ktock <[email protected]>
resolves: #1911
Currently, the exporter's `compression` flag is applied only to newly created layers, so the exported image can be a mix of several compression types (uncompressed and gzip) even when a `compression` type is specified. It would be great if we could specify the compression type of all layers of the final exported image.
This commit introduces a new exporter flag, `force-compression`, which enables specifying the compression type of all exported layers. If layers don't match the type specified by the `compression` option, they are forcibly converted to the target type. In addition to guaranteeing the final compression type, this flag will in the future also help users quickly catch up with and try out new image layer formats (e.g. zstandard, estargz, zstd:chunked, etc.).
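As a rough sketch of what such a forced conversion means for a single layer (illustrative only, not the converter code in this PR), re-encoding an uncompressed layer tar stream as gzip could look like this:

```go
// Illustrative only: "forcing" a layer to gzip for one uncompressed tar stream.
package sketch

import (
	"compress/gzip"
	"io"
)

// gzipLayer re-encodes an uncompressed layer stream with gzip, producing new
// blob contents (and therefore a new digest and media type for the layer).
func gzipLayer(uncompressed io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		gw := gzip.NewWriter(pw)
		_, err := io.Copy(gw, uncompressed)
		if cerr := gw.Close(); err == nil {
			err = cerr
		}
		pw.CloseWithError(err) // a nil error closes the reader side with EOF
	}()
	return pr
}
```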
How are compression variant contents tracked?
When a user specifies a compression type different from the blob's original (i.e. pulled) one, a conversion happens. Once the blob is converted, the resulting contents are cached in the local content store.
`*immutableRef` tracks its compression variant contents using its lease and the following label on the original (pulled) content blob:

buildkit.io/compression/digest.<compression type>=<sha256 digest>

When converting the compression type, this label is looked up first. If there is no label for the target type, or the content has been lost from the content store, the conversion is performed.
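A minimal sketch of that lookup, with hypothetical helper and parameter names (only the label format above is taken from this PR):

```go
// Hypothetical helper; names assumed.
package sketch

import (
	"context"

	"github.com/containerd/containerd/content"
	digest "github.com/opencontainers/go-digest"
)

const compressionVariantDigestLabelPrefix = "buildkit.io/compression/digest."

// getCompressionVariantBlob checks whether a converted variant of the original
// (pulled) blob is already recorded via a label and still present in the
// content store; if so, it can be reused instead of converting again.
func getCompressionVariantBlob(ctx context.Context, cs content.Store, orig digest.Digest, compressionType string) (digest.Digest, bool) {
	info, err := cs.Info(ctx, orig)
	if err != nil {
		return "", false
	}
	v, ok := info.Labels[compressionVariantDigestLabelPrefix+compressionType]
	if !ok {
		return "", false // no variant recorded: conversion is needed
	}
	cdgst, err := digest.Parse(v)
	if err != nil {
		return "", false
	}
	// The labeled content may have been garbage collected; verify it still exists.
	if _, err := cs.Info(ctx, cdgst); err != nil {
		return "", false
	}
	return cdgst, true
}
```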
Test of resolving #1911
The above commands produce `gzip.tar` with gzip-compressed layers.

@AkihiroSuda @tonistiigi @fuweid