exporter: Allow specifying the compression type for all layers of the final exported image #2057
Conversation
Force-pushed from 870753e to a7b0faa
PTAL
@tonistiigi Thanks for taking a look at this.
Can we move this forward?
I don't understand how this tracks the state of the created blobs. As far as I can see, they are not tracked by the cache manager, and I don't see any leases around writers. If I understand correctly, the tracked blob is still created with the old code, ignoring compression settings, and then that blob may be converted in a second pass.
As mentioned in #1911 (comment), the hardest part about this should be how to associate multiple blobs with a single record.
exporter/containerimage/writer.go (outdated)
}

// unlazy layers as converter uses layer contents in the content store
// TODO(ktock): un-lazy only layers whose type is different from the target, selectively.
This seems like a requirement. Otherwise this introduces a regression into exporting lazy refs.
It's also a sign that blob creation is not happening in the correct place, as `Remote` already contains a reference to the blob(s).
Force-pushed from 24c399c to ec8a782
@tonistiigi Thanks for the review. I fixed the design to track compression variant contents as follows. When a user specifies a compression type different from the blob's original (i.e. pulled) one, a conversion happens. Once the blob is converted, the resulting contents are cached in the local content store and tracked via the ref's lease and a label (`buildkit.io/compression/digest.<compression type>=<sha256 digest>`) on the original (pulled) blob.
When converting the compression type, this label is looked up first. If there is no label for the target type, or the content has been lost from the content store, the conversion is performed.
Can we move this forward?
If there are multiple blobs for a single ref, does each of them count when calculating the disk usage of a ref in the prune code? As far as I can see, because multiple blobs per ref are tracked only using leases, the cache's metadata still only stores one blob ID per ref (whichever one is created first, I suppose). This in turn means that this code will calculate the disk usage using only one of those blobs (Lines 206 to 211 in edc28d1).
mu.Lock()
mprovider.Add(layerP)
mu.Unlock()
if forceCompression {
I feel like this code that creates the blobs with the correct compression type conceptually makes more sense in the existing `computeBlobChain` code, somewhere around here maybe (Lines 57 to 58 in edc28d1):

if refInfo.Blob != "" {
	return nil, nil
Not a strong opinion though, so if that's easier said than done I don't think it's a big deal to leave it here.
@sipsma Thank you for the review.
@ktock I commented on one more corner case; otherwise LGTM.
// accumulate size of compression variant blobs
if strings.HasPrefix(k, compressionVariantDigestLabelPrefix) {
	if cdgst, err := digest.Parse(v); err == nil {
		if info, err := cr.cm.ContentStore.Info(ctx, cdgst); err == nil {
What if you import 2 images that are identical except that they use different compression types for their layers, and then try to export each one to its converted compression format? You could then end up in a situation where the blob returned by `getBlob` on this ref also has a compression variant label attached to it, meaning the size would get double counted here, right?
If so, I guess you can just check that `cdgst` doesn't equal the `getBlob` result? Or something like that.
Added a check to avoid double counting when a label points to the blob itself.
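For illustration, here is a small, self-contained sketch of that kind of guard; the helper name and structure are assumptions, not the PR's actual diff. It sums the sizes of compression-variant blobs recorded as labels on a ref's original blob, skipping a variant digest that equals the blob itself so it is not counted twice.

```go
// Hypothetical sketch, not the code in this PR.
package sketch

import (
	"context"
	"strings"

	"github.com/containerd/containerd/content"
	digest "github.com/opencontainers/go-digest"
)

const compressionVariantDigestLabelPrefix = "buildkit.io/compression/digest."

// variantBlobsSize sums the sizes of compression-variant blobs recorded as
// labels on a ref's original blob. A variant digest equal to the original
// blob itself is skipped so that blob is not counted twice.
func variantBlobsSize(ctx context.Context, cs content.Store, origBlob digest.Digest, labels map[string]string) int64 {
	var total int64
	for k, v := range labels {
		if !strings.HasPrefix(k, compressionVariantDigestLabelPrefix) {
			continue
		}
		cdgst, err := digest.Parse(v)
		if err != nil || cdgst == origBlob {
			continue // unparsable label, or the label points at the blob itself
		}
		if info, err := cs.Info(ctx, cdgst); err == nil {
			total += info.Size
		}
	}
	return total
}
```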
Signed-off-by: ktock <[email protected]>
resolves: #1911
Currently, the exporter's `compression` flag is applied only to newly created layers, so the exported image can be a mix of several compression types (uncompressed and gzip) even when a `compression` type is specified. It would be great if we could specify the compression type of all layers of the final exported image.
This commit introduces a new exporter flag, `force-compression`, which enables specifying the compression type of all exported layers. If layers don't match the type specified by the `compression` option, they are forcibly converted to the target type. In addition to guaranteeing the final compression type, this flag will in the future also help users quickly catch up with and try out new image layer formats (e.g. zstandard, estargz, zstd:chunked, etc.).
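As a rough sketch of what such a forced conversion means for a single layer (illustrative only, not the converter code in this PR), re-encoding an uncompressed layer tar stream as gzip could look like this:

```go
// Illustrative only: "forcing" a layer to gzip for one uncompressed tar stream.
package sketch

import (
	"compress/gzip"
	"io"
)

// gzipLayer re-encodes an uncompressed layer stream with gzip, producing new
// blob contents (and therefore a new digest and media type for the layer).
func gzipLayer(uncompressed io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		gw := gzip.NewWriter(pw)
		_, err := io.Copy(gw, uncompressed)
		if cerr := gw.Close(); err == nil {
			err = cerr
		}
		pw.CloseWithError(err) // a nil error closes the reader side with EOF
	}()
	return pr
}
```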
How are compression variant contents tracked?
When a user specifies a compression type different from the blob's original (i.e. pulled) one, a conversion happens. Once the blob is converted, the resulting contents are cached in the local content store.
`*immutableRef` tracks its compression variant contents using its lease and the following label on the original (pulled) content blob:

buildkit.io/compression/digest.<compression type>=<sha256 digest>

When converting the compression type, this label is looked up first. If there is no label for the target type, or the content has been lost from the content store, the conversion is performed.
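A minimal sketch of that lookup, with hypothetical helper and parameter names (only the label format above is taken from this PR):

```go
// Hypothetical helper; names assumed.
package sketch

import (
	"context"

	"github.com/containerd/containerd/content"
	digest "github.com/opencontainers/go-digest"
)

const compressionVariantDigestLabelPrefix = "buildkit.io/compression/digest."

// getCompressionVariantBlob checks whether a converted variant of the original
// (pulled) blob is already recorded via a label and still present in the
// content store; if so, it can be reused instead of converting again.
func getCompressionVariantBlob(ctx context.Context, cs content.Store, orig digest.Digest, compressionType string) (digest.Digest, bool) {
	info, err := cs.Info(ctx, orig)
	if err != nil {
		return "", false
	}
	v, ok := info.Labels[compressionVariantDigestLabelPrefix+compressionType]
	if !ok {
		return "", false // no variant recorded: conversion is needed
	}
	cdgst, err := digest.Parse(v)
	if err != nil {
		return "", false
	}
	// The labeled content may have been garbage collected; verify it still exists.
	if _, err := cs.Info(ctx, cdgst); err != nil {
		return "", false
	}
	return cdgst, true
}
```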
Test of resolving #1911
The above commands produce `gzip.tar` with gzip-compressed layers.

@AkihiroSuda @tonistiigi @fuweid