blob-copy detection #611
Conversation
> Note that the previous/parent layer is currently passed through the …
Just a very quick look, by no means a detailed review of the invariants:

Passing `previousLayer` does work, but it’s also pretty specific to this issue; notably it only works because `storageImageDestination` uses `types.PreserveOriginal`, otherwise the `previousLayer` digest would not necessarily be known to the destination.

Maybe we could add an integer `layerIndexInImage` parameter to `PutBlob`/`TryReusingBlob` instead: that would still allow `storageImageDestination` to do the same synchronization on `previousLayerResult`, and it feels a bit more generally usable. OTOH, it strongly hard-codes the idea that `PutBlob`s are called in sequence for a single image, which was… to an extent… previously assumed but could be violated (e.g. calling a single `PutBlob` on a c/image/docker destination without doing anything else) — but then `previousLayer` hard-codes this as well, and many destinations don’t work without the correct `PutBlob`/`PutManifest`/`Commit` sequence.
Yes, currently we expect the original digests to be passed through, but we do the necessary bookkeeping/lookups of the layer IDs in the storage to have a mapping from blobInfo to layer.

I agree that a numerical index feels more generic. I will update the APIs accordingly. Thanks for the initial review!
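For illustration, the following is a minimal, self-contained sketch of the kind of synchronization a numeric `layerIndexInImage` makes possible: each layer waits for the result of the layer at the previous index before it is committed. All names are hypothetical; this is not the actual `storageImageDestination` code.

```go
// Hypothetical sketch: serializing layer creation on a numeric layer index.
package main

import (
	"fmt"
	"sync"
)

type layerResult struct {
	id  string // storage layer ID created for this index
	err error
}

type destination struct {
	mu      sync.Mutex
	results map[int]chan layerResult // layer index -> result of that layer
}

// resultChan returns (creating it if needed) the result channel for an index.
func (d *destination) resultChan(index int) chan layerResult {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.results == nil {
		d.results = make(map[int]chan layerResult)
	}
	ch, ok := d.results[index]
	if !ok {
		ch = make(chan layerResult, 1)
		d.results[index] = ch
	}
	return ch
}

// putLayer stores the blob for the given index, waiting for its parent first.
func (d *destination) putLayer(index int, digest string) error {
	parentID := "" // the first layer has no parent
	if index > 0 {
		prev := <-d.resultChan(index - 1) // block until the parent layer exists
		if prev.err != nil {
			err := fmt.Errorf("parent layer %d failed: %w", index-1, prev.err)
			d.resultChan(index) <- layerResult{err: err} // propagate failure to children
			return err
		}
		parentID = prev.id
	}
	// Placeholder for the real work of creating a storage layer on top of parentID.
	id := fmt.Sprintf("layer-%d(%s)-on-%q", index, digest, parentID)
	d.resultChan(index) <- layerResult{id: id}
	return nil
}

func main() {
	d := &destination{}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ { // layers may arrive concurrently and in any order
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = d.putLayer(i, fmt.Sprintf("sha256:%04d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println("all layers committed in order")
}
```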
Rebased on master.
types/types.go (outdated):

```diff
@@ -265,7 +265,7 @@ type ImageDestination interface {
 	// WARNING: The contents of stream are being verified on the fly. Until stream.Read() returns io.EOF, the contents of the data SHOULD NOT be available
 	// to any other readers for download using the supplied digest.
 	// If stream.Read() at any time, ESPECIALLY at end of input, returns an error, PutBlob MUST 1) fail, and 2) delete any data stored so far.
-	PutBlob(ctx context.Context, stream io.Reader, inputInfo BlobInfo, cache BlobInfoCache, isConfig bool) (BlobInfo, error)
+	PutBlob(ctx context.Context, stream io.Reader, inputInfo BlobInfo, layerIndexInImage int, cache BlobInfoCache, isConfig bool) (BlobInfo, error)
```
(Warning, vague:)

This is an ABI break, i.e. we should use a new major version number in the next release.

1. Do we just do it, without a second thought? Create new version with semver #609 seems to suggest so.
2. Do we somehow say that (parts of?) the transport interfaces are going to break even within a major version, and only things like `copy.Image` and `alltransports.ParseReference` (`docker.NewReference`?) are stable for now? How would the callers know which one is which? (Also, I do know that at least for reading there are third-party callers of the `ImageSource` interface.)
3. Do we want to now spend extra time thinking about `PutBlob` (or, worse, ~everything here) to make future ABI breaks less likely? Like maybe moving (`layerIndexInImage`, `cache`, `isConfig`) into an extensible `PutBlobAdditionalData` structure that will allow us to add (but not remove!) members in the future without an ABI break? (A rough sketch of that idea follows this comment.)

I don’t really know…

- The lazy me would prefer 1.
- Something like 2. would be much easier to sustain than a full ABI but awkward to stomach for some users (in the extreme, we could make the transport interfaces `internal` and break those users completely, to have a clean ABI — clearly not worth it but to an extent attractive).
- 3. at a first glance seems like a fair amount of effort that is still likely enough to fail anyway, as the surrounding assumptions change (e.g. suddenly transports having to cope with running in a user namespace), or make the interfaces notably less readable and more difficult to maintain, considering how quickly these interfaces (and `PutBlob`/`TryReusingBlob` in particular) have been changing.

(It’s been tempting to contemplate a variant of 2., splitting c/image into an ABI-stable Go module and an ABI-unstable v0 module, or something like that, to be very transparent to users about what they can/can’t expect with a particular symbol; but the ABI-stable calls to the unstable v0 would have to be tightly coupled, and I can’t see how to do this with Go modules so that an import of the ABI-stable module automatically brings in the corresponding unstable v0 version, but so that we don’t have to maintain two independent c/image sub-repos (with independent tests, PRs, tags) or something similar. It didn’t seem viable IIRC.)
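For reference, option 3 could look roughly like the sketch below. This is a hedged illustration with hypothetical names, not something that was merged; `BlobInfo` and `BlobInfoCache` are stand-ins for the existing c/image types of the same name.

```go
// Hypothetical sketch of option 3: an extensible options struct instead of a
// growing parameter list.
package sketch

import (
	"context"
	"io"
)

type BlobInfo struct { // stand-in for types.BlobInfo
	Digest string
	Size   int64
}

type BlobInfoCache interface{} // stand-in for types.BlobInfoCache

// PutBlobOptions groups the auxiliary PutBlob inputs. New fields can be added
// later without changing the method signature, although that alone cannot
// force older callers to satisfy new semantic requirements (e.g. a
// now-mandatory layer index).
type PutBlobOptions struct {
	LayerIndexInImage int  // position of the layer within the image, or -1 if unknown
	IsConfig          bool // true if the blob is the image config rather than a layer
	Cache             BlobInfoCache
}

// ImageDestination shows only the reshaped PutBlob for brevity.
type ImageDestination interface {
	PutBlob(ctx context.Context, stream io.Reader, inputInfo BlobInfo, options PutBlobOptions) (BlobInfo, error)
}
```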
(To be clear, none of this is in any way at all a reason to reject this PR; it’s just that we do need to actually set a precedent for what #609 means.)
> This is an ABI break, i.e. we should use a new major version number in the next release.

Starting with v2, we have to consider the following go-mod detail:

> If the module is version v2 or higher, the major version of the module must be included as a /vN at the end of the module paths used in go.mod files (e.g., module github.com/my/mod/v2, require github.com/my/mod/v2 v2.0.0) and in the package import path (e.g., import "github.com/my/mod/v2/mypkg"). If the module is version v0 or v1, do not include the major version in either the module path or the import path. [go wiki]

This will affect us as soon as we migrate to `go mod`. Users of containers/image who already migrated to `go mod` need to mark it as `+incompatible` and will not be interrupted any further.
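To make the quoted rule concrete, here is a hedged illustration with made-up version numbers (not a commitment to any particular release):

```
// Consumer's go.mod while c/image carries a v2+ tag but has no go.mod of its own:
require github.com/containers/image v2.0.0+incompatible

// Once c/image itself adopts modules at v2 or higher, its own go.mod declares:
module github.com/containers/image/v2

// and consumers require / import the suffixed path instead:
require github.com/containers/image/v2 v2.0.0
// import "github.com/containers/image/v2/types"
```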
> (To be clear, none of this is in any way at all a reason to reject this PR; it’s just that we do need to actually set a precedent for what #609 means.)

No worries, I really appreciate that you kicked off a discussion around this topic.

In the context of this PR, I would go on and merge it so we can progress on the blob locking. Coming up with a plan and executing it sounds time-consuming and will likely entail some changes to the API (and the visibility of packages) in any case, so I think it's ~okay to perform the change here. I have opened PRs for Skopeo and Buildah and will open some for Podman and CRI-O as well to have a minimal interruption window for others who might run into build errors when using the latest c/image.
- Personally, I don't find it to be sustainable over time. Breaking APIs is not only a bad experience for users but it can also be very time-consuming and tedious to propagate those changes across our projects; sometimes even blocking other work in case of build errors etc. Having stable APIs across the stack + `go mod` sounds so sweet (and a bit utopian) :)
- I think that moving packages into `./internal` is a core part of migrating to `go mod` (and doing semver for Go in general). We need to perform (breaking) changes at some point but also need to structure the code into packages, and we can do both with `./internal` packages. Analogous to the kernel's stable syscall interface and the unstable internal ones.
- c/image has been quite stable. However, this PR suggests that the APIs are hard to extend. Your proposal 3) is aiming at this problem. However, even if we had a dedicated structure to encapsulate the arguments, we would still have to do a major version bump as the semantics have changed (at least for the storage destination). It is still a major gain over creating build errors.

My preference is a combination of 2) and 3) as I think both are somehow mandatory for how packages in Go work conceptually and how their visibility impacts semver (and Go modules). But I did not think this through entirely, so take my reply with a grain of salt.
I suggest giving this topic a high priority so we can allocate the required time in the upcoming sprints. Parts of the tasks have a rather conceptual nature of how to do things (e.g., `./internal`), but there is also the massive technical part of analyzing which APIs need to be external and how we can re-design them to be easier to maintain and to introduce less build breakage (as in the case of this PR).

I find this a very important discussion, and once we have migrated to `go mod` it will affect other projects as well. Hence, pinging the Jedi council for future sprint planning and a general FYI (Cc @mrunalp @rhatdan).
Oh, I forgot that this is really painful only at v2. Still, having a 1.0 and breaking APIs left and right is not great; SemVer does not distinguish between 1 and 2 that way, only `go mod` does.

> In the context of this PR, I would go on and merge it so we can progress on the blob locking. Coming up with a plan and executing it sounds time-consuming and will likely entail some changes to the API (and the visibility of packages) in any case, so I think it's ~okay to perform the change here.

Looking back, I absolutely don’t understand what we did in #609. Very little is written down and both the motivation and the forward-going policy are rather unclear to me. (And I don’t even know why we needed “semver” for c/image, and why 0.x.y is not compliant. What depends on having a 1.x.y right now? Or was the goal to bring the git tags and the `c/image/version` constants in sync?)

The only thing that #609 writes down is the commitment to increase the major number. Yet, you are arguing against that now (and to an extent I agree: if we are going to increase the major number, let’s increase it when the API at least has a chance of staying stable).

(Looking at GitHub, we didn’t actually tag/release the 1.6.0 created in #609; so it seems that we still technically have the option to revert #609 and pretend that it never happened.)

(Unless there is a clear settled answer, maybe we should break this out into a separate issue?)
@mtrmac what is the issue with doing semver and incrementing the major number?

In the end, it is way easier to understand the nature of updating a dependency that is doing semver. The benefit we gained doing semver is on the user side, as `go mod` can update the dependencies automatically, while having a non-semver version scheme forces users to do the same tedious work as with `vndr`.
> (Unless there is a clear settled answer, maybe we should break this out into a separate issue?)

We can make this a PS in tomorrow's meeting and/or set up a dedicated one to discuss it. To a certain degree, the discussion conceptually affects the entire github.com/containers/* repos, and having clear answers to the questions would be beneficial for a common and consistent release engineering.
> @mtrmac what is the issue with doing semver and incrementing the major number?

I was reading the following as you not wanting to do it:

> In the context of this PR, I would go on and merge it so we can progress on the blob locking. Coming up with a plan and executing it sounds time-consuming and will likely entail some changes to the API (and the visibility of packages) in any case, so I think it's ~okay to perform the change here.

Was that incorrect?

BTW option 3., just moving the data into an extensible struct, would not really help: if `storageImageDestination` now requires the layer index to be present, we would effectively be breaking the API even without breaking compilation.
> Was that incorrect?

I am sorry, I did not realize that my statement was vague.

What I meant by "I would go on and merge it so we can progress on the blob locking" was that I prefer to merge this PR although it breaks the API and will entail a major version bump. In other words, I am okay with a major version bump for this particular case (as I don't see any other way to achieve the blob locking).
Ah, OK. Let’s tentatively plan to just bump the major version, then.
(Force-pushed from 385f7c3 to d8e9814.)
Buildah is about to turn green. Will open a CI PR for libpod and then for CRI-O.
@mtrmac, I was fighting some issues with libpod's CI today, which is now running again. Not sure if a new issue will pop up by tomorrow, but I will ping you once this PR has passed all CIs.

At least the layer subtree use in …
(Force-pushed from cdbac2d to 26fba6a.)
@mtrmac, skopeo, buildah and libpod (only known rootless test flakes) are green. Please take a look. I will think a bit more about how we can use this code in the context of the copy-detection. As you've mentioned before: ideally, …

… I ended up adding a …

I extended PutBlob() and TryReusingBlob() now, and plan to add the copy-detection on top of this PR. It will make the PR unusually big/complex, but it will shorten the time frame of breaking the API.

Yes, I think we should get this back on the schedule. Would be a big improvement.
To suit the needs of OpenShift CI [1], we need to avoid parallel pulls. While we have an open PR against containers/image [2] to detect parallel copy operations across processes, we are still far from getting this merged; it must be split into smaller chunks and we plan to tackle in the next few months but who knows how long this might take. To support that in CRI-O, however, we enjoy the benefit of running a daemon where we can easily avoid redundantly pulling the same image in parallel as we're in the same process space and can do all synchronisation in memory. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1785390 [2] containers/image#611 Signed-off-by: Valentin Rothberg <[email protected]>
Backport of commit 38ba094 (same commit message as above). Signed-off-by: Valentin Rothberg <[email protected]>
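As an aside, the in-memory synchronization described in the commit message above could, in principle, look like the following minimal sketch (hypothetical names, not the actual CRI-O code): the first caller performs the pull, and concurrent callers for the same reference wait for and reuse its result instead of starting a second pull.

```go
// Hypothetical sketch: in-process deduplication of parallel image pulls.
package main

import (
	"fmt"
	"sync"
)

type pullResult struct {
	imageID string
	err     error
}

type inFlight struct {
	done chan struct{} // closed once the pull has finished
	res  pullResult
}

type puller struct {
	mu     sync.Mutex
	active map[string]*inFlight // image reference -> in-flight pull
}

func newPuller() *puller { return &puller{active: make(map[string]*inFlight)} }

// pull runs doPull at most once per image reference at a time; concurrent
// callers for the same reference share the first caller's result.
func (p *puller) pull(ref string, doPull func(string) (string, error)) (string, error) {
	p.mu.Lock()
	if f, ok := p.active[ref]; ok {
		p.mu.Unlock()
		<-f.done // someone else is already pulling; wait for their result
		return f.res.imageID, f.res.err
	}
	f := &inFlight{done: make(chan struct{})}
	p.active[ref] = f
	p.mu.Unlock()

	id, err := doPull(ref)
	f.res = pullResult{imageID: id, err: err}

	p.mu.Lock()
	delete(p.active, ref) // allow later pulls (e.g. after a failure) to retry
	p.mu.Unlock()
	close(f.done)

	return id, err
}

func main() {
	p := newPuller()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			id, _ := p.pull("quay.io/example/img:latest", func(ref string) (string, error) {
				return "sha256:deadbeef", nil // placeholder for the real pull
			})
			fmt.Println(id)
		}()
	}
	wg.Wait()
}
```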
👍 to reviving this

Sounds good to me. As we are planning for what comes after Podman v2, this might be a good candidate. @baude PTAL

@vrothberg I believe this should be a big push going forward. I think we need to focus on performance after V2.

Finally closing this PR. I will break it into smaller pieces and submit them step by step.
Avoid redundantly copying the same blob by making use of the digest-locks from c/storage.