
Implement sync semantic between source and destination image repository #155

Closed
hustcat opened this issue Nov 7, 2016 · 25 comments

@hustcat
Contributor

hustcat commented Nov 7, 2016

With the copy semantics in the current implementation, all data (including the manifest and image layers) in the destination is overwritten by the source, without checking whether it already exists.

IMO we should implement sync semantics between the source and destination image repositories: if a layer already exists in the destination, it won't be downloaded again. This would be more efficient.

@mtrmac
Collaborator

mtrmac commented Nov 7, 2016

copy does not insist on overwriting, and in fact the docker: and atomic: transports do not re-upload existing data: https://github.com/containers/image/blob/master/docker/docker_image_dest.go#L96 .

For manifests, it seems far simpler to just upload them than to do one extra roundtrip to see whether we need to upload them. The data is small, the roundtrip latency should be more noticeable than the data transfer latency.

No, dir: does not do this; it is more or less a debugging tool. Is there anything else that is missing or important?

@hustcat
Contributor Author

hustcat commented Nov 8, 2016

I want to implement incremental download of image layers, for images in the same repository, from Docker Hub to a local dir or OCI directory.
For example, busybox:v1 owns layer1, and busybox:v2 owns layer1 and layer2. When I download busybox:v2, only layer2 should be downloaded if layer1 already exists.

Then I can incrementally import images from the dir or OCI directories into the Docker daemon. This could replace the docker pull command for the Docker daemon.

@mtrmac
Collaborator

mtrmac commented Nov 8, 2016

For example, busybox:v1 owns layer1, and busybox:v2 owns layer1 and layer2. When I download busybox:v2, only layer2 should be downloaded if layer1 already exists.

That’s not how dir: works; it can’t share a directory among multiple images. See more in containers/skopeo#245 (comment). It might be possible for oci:. But then…

Then I can incrementally import images from the dir or OCI directories into the Docker daemon. This could replace the docker pull command for the Docker daemon.

… I can’t see how not doing copies when saving into dir/oci directories helps with an incremental import from them. The limiting factor in there is the docker-daemon: write path, which always reads all layers, because the docker load input format requires that.

@hustcat
Contributor Author

hustcat commented Nov 9, 2016

… I can’t see how not doing copies when saving into dir/oci directories helps with an incremental import from them. The limiting factor in there is the docker-daemon: write path, which always reads all layers, because the docker load input format requires that.

We can achieve it by making the Docker daemon load incrementally from OCI layout directories :)

@hferentschik
Contributor

Not quite sure whether this is the right issue, but I am after sync semantics as well. I'd like to "back up" images from a running Docker daemon to disk; I was thinking of some OSTree format. Then, on request, I want to sync from disk back to a Docker daemon/registry.

To provide some more context, I want to do this in the context of minishift/minishift#143. In the Minishift case we have a Docker daemon running w/i a VM. I'd like to be able to re-create the VM and restore the exported images, in order to avoid having to download the required images over and over again.

Can this be done with containers/image?

@mtrmac
Collaborator

mtrmac commented May 10, 2017

@hferentschik Do you just need ordinary copies, or what this issue calls “sync”, i.e. sharing of the common layers among several images? Without sharing, creating a separate $backup per image (skopeo copy docker-daemon:$original dir:$backup, then skopeo copy dir:$backup docker-daemon:$restored) should already work.

If sharing of common layers is desirable, now that OCI (as of rc5) has an index, something like that should be possible, and is in fact very close to possible: the read path seems to be already implemented. The write path always overwrites the index, but reading and updating the old version should be a quite simple change to oci/layout/oci_dest.go.

@hferentschik
Contributor

Do you just need ordinary copies, or what this issue calls “sync”, i.e. sharing of the common layers among several images?

I would have expected that common layers are saved only once. If I understand this correctly, using dir: I would need to export each image into its own directory, and there would be no sharing of layers. In that case, what is really the difference from 'docker save'? I might as well just call 'docker save' for all images. I was hoping to gain something by using containers/image.

Right now I can copy an image from the registry to a directory; however, I have several issues which I have just temporarily worked around for now (mainly around the fact that there are several Linux specifics in the code preventing me from using this library in a cross-platform manner). If there is no way to save images and share layers, I might be barking up the wrong tree.

I've seen this presentation - https://fedorapeople.org/~dwalsh/Presentations/ContainersInProduction - and was hoping I could export images into a file-based OSTree structure.

Does this make sense?

@hferentschik
Contributor

Without sharing, creating a separate $backup per image (skopeo copy docker-daemon:$original dir:$backup, then skopeo copy dir:$backup docker-daemon:$restored) should already work.

Sure, but what do I gain over a plain docker save/load in this case?

@mtrmac
Collaborator

mtrmac commented May 10, 2017

Without sharing, creating a separate $backup per image (skopeo copy docker-daemon:$original dir:$backup, then skopeo copy dir:$backup docker-daemon:$restored) should already work.

Sure, but what do I gain over a plain docker save/load in this case?

Nothing for that use. The gain is flexibility in sources/destinations (e.g. you can copy from a registry to dir:, then later from dir: to the daemon).

@mtrmac
Collaborator

mtrmac commented May 10, 2017

I've seen this presentation - https://fedorapeople.org/~dwalsh/Presentations/ContainersInProduction - and was hoping I could export images into a file based ostree structure.

Right, containers/image ostree: does support multiple images in a single OSTree structure. But containers/image has currently no support for reading from OSTree.

Anyway, that would be something completely separate from the original request above, which mentions dir: (not applicable, by design) and oci: (does not exist yet, but only a little more work remains, #155 (comment)). If you want to do this using OSTree rather than OCI, please file a separate issue. (If you don’t care, OCI would be less work… though that doesn’t guarantee that it will happen earlier than OSTree.)

@hferentschik
Contributor

hferentschik commented May 10, 2017

Right, containers/image ostree: does support multiple images in a single OSTree structure. But containers/image has currently no support for reading from OSTree.

:-( Got you. I was assuming that all supported "protocols" would be two-way.

If you want to do this using OSTree rather than OCI, please file a separate issue. (If you don’t care, OCI would be less work… though that doesn’t guarantee that it will happen earlier than OSTree.)

I guess both are fine for my purposes. Can you give some ETA for this?

@hferentschik
Contributor

hferentschik commented May 10, 2017

For what it's worth, #268 and #269 are the problems I ran into when trying to use image as a dependency. Apart from being able to copy to and from OCI, I really would need solutions to these issues as well, most importantly #268.

@mtrmac
Collaborator

mtrmac commented May 11, 2017

:-( Got you. I was assuming that all supported "protocols" would be two way.

Can you give some ETA for this?

That depends on someone doing the work :) Would you be interested in helping?

@hferentschik
Contributor

@mtrmac, sorry for the long turn-around on this, but I got side-tracked. However, I would like to take up the discussion again.

TBH, I am a bit lost atm, both in terms of how the OCI format works and of what's required to make these sync semantics work. So here is some test code I've written:

srcRef1, err := alltransports.ParseImageName("docker://busybox:1.25")
if err != nil {
	atexit.ExitWithMessage(1, fmt.Sprintf("Invalid source: %v", err))
}
destRef1, err := alltransports.ParseImageName("oci:/Users/hardy/tmp-2/oci")
if err != nil {
	atexit.ExitWithMessage(1, fmt.Sprintf("Invalid destination %v", err))
}

err = copy.Image(policyContext, destRef1, srcRef1, &copy.Options{
	RemoveSignatures: false,
	SignBy:           "",
	ReportWriter:     os.Stdout,
	SourceCtx:        &types.SystemContext{},
	DestinationCtx:   &types.SystemContext{},
})
if err != nil {
	atexit.ExitWithMessage(1, fmt.Sprintf("Copy failed: %v", err))
}

srcRef2, err := alltransports.ParseImageName("docker://busybox:1.26")
if err != nil {
	atexit.ExitWithMessage(1, fmt.Sprintf("Invalid source: %v", err))
}
destRef2, err := alltransports.ParseImageName("oci:/Users/hardy/tmp-2/oci")
if err != nil {
	atexit.ExitWithMessage(1, fmt.Sprintf("Invalid destination %v", err))
}

err = copy.Image(policyContext, destRef2, srcRef2, &copy.Options{
	RemoveSignatures: false,
	SignBy:           "",
	ReportWriter:     os.Stdout,
	SourceCtx:        &types.SystemContext{},
	DestinationCtx:   &types.SystemContext{},
})
if err != nil {
	atexit.ExitWithMessage(1, fmt.Sprintf("Copy failed: %v", err))
}

So basically I am copying two versions of busybox into an OCI directory. My first question is: how do I get them out again? I see the various layers, but no meta information on which images are in the "store". Am I missing something?

And concretely about this sync functionality, what are the missing pieces?

@mtrmac
Collaborator

mtrmac commented Jul 3, 2017

(Note that with destRef1 and destRef2 equal, the semantics is that the second copy should overwrite the first one. If you want to keep both, use different references, e.g. oci:/Users/hardy/tmp-2/oci:1.25 and oci:/Users/hardy/tmp-2/oci:1.26).

AFAICS (without testing your code in practice, so I may be overlooking something) the only missing piece is that ociDestination.Commit always creates a new index.json file pointing only at the latest manifest; it should instead parse the existing index, probably (?) remove a previous image with the same reference if it exists [and clean up the layers??? unsure], and then append an entry for the just added manifest without disturbing the rest of the index.

(FWIW see also #294 , though that should not directly impact the work on .Commit.)
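The Commit behaviour described above can be sketched as a small merge over the index's manifest list. This is not the actual containers/image code, just the shape of the change on a simplified descriptor type:

```go
package main

import "fmt"

const refNameAnnotation = "org.opencontainers.image.ref.name"

// descriptor is a simplified stand-in for an OCI manifest descriptor.
type descriptor struct {
	Digest      string            `json:"digest"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// mergeIndex appends newDesc to the manifest list, first removing any
// existing entry recorded under the same ref-name annotation, so that
// other images already in the index are left untouched.
func mergeIndex(manifests []descriptor, newDesc descriptor) []descriptor {
	newRef := newDesc.Annotations[refNameAnnotation]
	out := manifests[:0:0] // fresh slice, same element type
	for _, d := range manifests {
		if newRef != "" && d.Annotations[refNameAnnotation] == newRef {
			continue // same tag: replaced by the new image
		}
		out = append(out, d)
	}
	return append(out, newDesc)
}

func main() {
	idx := []descriptor{
		{Digest: "sha256:aaa", Annotations: map[string]string{refNameAnnotation: "1.25"}},
	}
	// Re-pushing tag 1.25 replaces the old entry; adding 1.26 appends.
	idx = mergeIndex(idx, descriptor{Digest: "sha256:bbb", Annotations: map[string]string{refNameAnnotation: "1.25"}})
	idx = mergeIndex(idx, descriptor{Digest: "sha256:ccc", Annotations: map[string]string{refNameAnnotation: "1.26"}})
	for _, d := range idx {
		fmt.Println(d.Annotations[refNameAnnotation], d.Digest)
	}
	// prints:
	// 1.25 sha256:bbb
	// 1.26 sha256:ccc
}
```

Cleaning up layers that become unreferenced after a replacement is left out here, matching the "unsure" note above.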

@hferentschik
Contributor

(Note that with destRef1 and destRef2 equal, the semantics is that the second copy should overwrite the first one. If you want to keep both, use different references, e.g. oci:/Users/hardy/tmp-2/oci:1.25 and oci:/Users/hardy/tmp-2/oci:1.26).

I was kind of assuming that oci:/Users/hardy/tmp-2/oci is the base directory for multiple images and that I could store arbitrary images under the same path. But you seem to imply something like:

  • oci:/Users/hardy/tmp-2/busybox:1.25
  • oci:/Users/hardy/tmp-2/busybox:1.26
  • oci:/Users/hardy/tmp-2/alpine:latest

How would that layout be able to share layers?

AFAICS (without testing your code in practice, so I may be overlooking something) the only missing piece is that ociDestination.Commit always creates a new index.json file pointing only at the latest manifest; it should instead parse the existing index, probably (?) remove a previous image with the same reference if it exists [and clean up the layers??? unsure], and then append an entry for the just added manifest without disturbing the rest of the index.

So right, looking at the busybox example again: if I copy two versions of busybox into the same OCI directory, I get all the layers, but the index only contains one reference. So really the oci: destination only supports single-image copies atm. It is not creating an actual OCI "repository". Correct?

@mtrmac
Collaborator

mtrmac commented Jul 4, 2017

I was kind of assuming that oci:/Users/hardy/tmp-2/oci is the base directory for multiple images and that I could store arbitrary images under the same path.

Sure; but copy.Image deals with ImageReference objects for individual images; it does not understand the structure of the reference, that’s up to the individual transport to define.

How would that layout be able to share layers?

See oci/layout.ParseReference: they would share the same directory but differ in the tag. (Note that my example shares the part before the colon, i.e. the directory; your syntax would be currently interpreted (again, see #294) as separate directories which indeed wouldn’t share layers.)
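To make the syntax concrete: in an oci: reference the layout directory and the tag are separated by the last colon, so these references share one layout directory (paths taken from the example above):

```
oci:/Users/hardy/tmp-2/oci:1.25       # directory /Users/hardy/tmp-2/oci, tag 1.25
oci:/Users/hardy/tmp-2/oci:1.26       # same directory, different tag: layers can be shared
oci:/Users/hardy/tmp-2/busybox:1.25   # a different directory (.../busybox), so no sharing with the above
```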

If I copy two versions of busybox into the same OCI directory, I get all the layers, but the index only contains one reference. So really the oci: destination only supports single-image copies atm. It is not creating an actual OCI "repository"

Yes. (Strictly speaking, it is creating a single-image repository.) Storing multiple images is currently unimplemented, and contributions would be welcome.

@hferentschik
Contributor

Yes. (Strictly speaking, it is creating a single-image repository.)

Got you.

Storing multiple images is currently unimplemented, and contributions would be welcome.

I really would like to see this feature, since we would dearly need it for minishift/minishift#952. You seem to think that it is not so hard, but I fear it would take me quite some time to even understand where and how to plug this in, and obviously it would need to fit the overall strategy/plan for containers/image. At the very least I think I would need some concrete pointers on what to do.

On a different note, while digging around I found https://github.com/sgotti/oci-store which seems a bit like what I am after. However, it is a single-commit repo and dependencies are not part of the repo. Is this repo in any way related to what you guys are doing?

@mtrmac
Collaborator

mtrmac commented Jul 6, 2017

At the very least I think I would need some concrete pointers on what to do.

Is #155 (comment) enough? The minimal version, without cleaning up unreferenced layers, considering the size of ociReference.getManifestDescriptor, should be on the order of 50 lines of code, perhaps + tests.

@hferentschik
Contributor

Is #155 (comment) enough? The minimal version, without cleaning up unreferenced layers, considering the size of ociReference.getManifestDescriptor, should be on the order of 50 lines of code, perhaps + tests.

Ok, I'll see what I can do. I'll try to work on this this week...

@cyphar
Contributor

cyphar commented Jul 16, 2017

@hferentschik If you just want an OCI CAS implementation, take a look at https://github.com/openSUSE/umoci (specifically, oci/cas and oci/casext). It doesn't have any of the conversion code that containers/image has, but its job is to implement full OCI image support with a sane interface.

hferentschik added a commit to hferentschik/image that referenced this issue Aug 11, 2017
@hferentschik
Contributor

@mtrmac can you have a look at pull request #326. Am I on the right track with this?

hferentschik added a commit to hferentschik/image that referenced this issue Aug 12, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Aug 14, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Oct 16, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Oct 19, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Oct 24, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Oct 25, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Oct 26, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Oct 27, 2017
hferentschik added a commit to hferentschik/image that referenced this issue Oct 31, 2017
runcom added a commit that referenced this issue Oct 31, 2017
Issue #155 Making sure existing OCI index gets not overwritten
@rhatdan
Member

rhatdan commented Apr 25, 2019

@hustcat @mtrmac @vrothberg I believe there is still ongoing work on this, or at least there is a sync PR. Are we continuing to work on it?

@vrothberg
Member

This is being tracked in containers/skopeo#524.

@mtrmac
Collaborator

mtrmac commented Apr 25, 2019

For the record, if you actually read the issue, this is only tangentially related to containers/skopeo#524, and was actually fixed in #326.
