Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - support multi-image docker archives #975

Closed
wants to merge 1 commit into from

Conversation

vrothberg
Copy link
Member

Add a MultiImageArchive{Reader,Writer} to docker/archive to support
docker archives with more than one image.

To allow the new archive reader/writer to be used for copying images,
add an Image{Destination,Source} to copy.Options. When set, the
destination/source referenced will be ignored and the specified
Image{Destination,Source} will be used instead.

Signed-off-by: Valentin Rothberg [email protected]

@vrothberg vrothberg changed the title support multi-image docker archives WIP - support multi-image docker archives Jun 29, 2020
@vrothberg
Copy link
Member Author

libpod PR to illustrate how I imagine it to be used -> containers/podman#6811

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very quick first pass for now.

copy/copy.go Outdated Show resolved Hide resolved
docker/archive/multi-reader.go Outdated Show resolved Hide resolved
docker/archive/multi-reader.go Outdated Show resolved Hide resolved
docker/archive/multi-reader.go Outdated Show resolved Hide resolved
docker/archive/multi-writer.go Outdated Show resolved Hide resolved
docker/tarfile/dest.go Outdated Show resolved Hide resolved
Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Copy link
Member Author

@mtrmac, mind taking another look? I've been starring at it too long.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look, this still seems to lean rather towards inheritance, almost towards monkey-patching rather than restructuring; most importantly I think we should not need two more ImageReference implementations.

I was thinking:

  • For sources:
    • Split the part of tarfile.Source that contains archive-wide state (tarPath, removeTarPathOnClose, maybe some of the layer info) into a docker/internal/tarfile.ReadableArchive or so (no need to make this public? I might have to think about the public/private structure more. The name can certainly be better.)
    • Keep the existing public tarfile.Source working, on top of that, but also add a way to create it with a caller-supplied ReadableArchive (in which case tarfile.Source does not drive deleting temporary files).
    • Add a *ReadableArchive to archiveReference; in archive.newImageSource, if ref.readableArchive is set, use it to create a tarfile.Source; if not, use the old code path.
    • Add support for lookup by tag and ID/index/something to archiveReference; it automatically applies both to the single-image and multi-image case.
    • Then make MultiImageSource, which creates a ReadableArchive and provides an API to enumerate / create references.

And similarly for destinations.


// Reference returns an ImageReference embedding the MultiImageDestination.
func (m *MultiImageDestination) Reference() types.ImageReference {
ref := &archiveReference{path: m.path}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not include the tag

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but not on the reference = not in copy.Image error messages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH there is still the case of saving untagged images, so references don’t always have any extra data anyway.

I’ll read all of this more carefully a bit later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much appreciated, thanks!

I am on PTO tomorrow but would will go back to this on Monday morning. Ideally, we need to get the feature in next week.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtrmac, ideas how to proceed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtrmac @rhatdan ... can we get this moving or are we blocked on something? I am getting increasing pressure to get this done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please proceed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m sorry for not getting back, I did promise I would.

Still, #975 (review) was I think fairly clear to the general direction. See #991 for an unfinished prototype of the read side. Yes, the PR is larger, but it already includes the ability to read any (even untagged) image in an archive using a textual reference, and a lot of the “new” code is actually only moved — tarfile.Source was just split into two, the only non-trivial net new code at that layer is just chooseManifest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I am still not sure how to proceed.

Now we have two PRs and I am worried we are running out of time; in fact, we might not get it into RHEL 8.3 any more.

If you agree, I want to focus on Podman-only needs first. We can still make follow-up cards for a more generally applicable solution, in case that will buy us some time. Certainly, the API shouldn't break.

docker/archive/src.go Show resolved Hide resolved
// Reset the repoTags to prevent them from leaking into a following
// image/manifest.
d.repoTags = []reference.NamedTagged{}
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still stateful, then…

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More detailed comments after a careful read, finally. Conceptually, I’d still prefer for the code paths to be shared as much as possible, instead of a layer on top that partially overrides behavior like the destination here.

return s.loadTarManifest()
if s.manifests != nil {
return s.manifests, nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat contra to the documentation of the function.

// MultiImageSourceItem is a reference to _one_ image in a multi-image archive.
// Note that MultiImageSourceItem implements types.ImageReference. It's a
// long-lived object that can only be closed via it's parent MultiImageSource.
type MultiImageSourceItem struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually, I’m concerned about having two ImageReference implementations with different behavior but the same string syntax. ref.Transport().ParseReference(ref.StringWithinTransport()) is documented to be “equivalent to” ref; that’s easiest to do when there is just one implementation, and not the case here (with a docker-archive:$path-formatted references that successfully access images in multi-image archives, but fail when used from the CLI).

}

// Manifest returns the tarfile.ManifestItem.
func (m *MultiImageSourceItem) Manifest() (*tarfile.ManifestItem, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing but RepoTags is really usable by callers, see e.g. the use of Config in containers/podman#6811 ; so I’m a bit reluctant to commit to this as an API. OTOH, we clearly can keep this stable.

// MultiImageDestinations allows for creating and writing to docker archives
// that include more than one image.
type MultiImageDestination struct {
*archiveImageDestination
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn’t this externally expose PutBlob and everything else?


// Close is a NOP. Please use Finalize() for committing the archive and
// closing the underlying resources.
func (m *MultiImageDestination) Close() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close() is conceptually a bit different from Finalize() (or ImageDestination.Commit); it allows cleaning up the temporary files even on error.

Sure, this is a possible way to structure the API, but it’s a bit inconvenient to use: A typically caller will use something like defer multiDest.Close() and might not even check for errors if there is already a failure with a more important root cause to preserve, whereas on success the caller really wants to check that multiDest.Finalize() did succeed. Having “commit” and “deallocate” be the same operation forces every such caller to have a committed flag that’s checked inside the defer, or to have a critical part of the process in a defer.

// Reset the repoTags to prevent them from leaking into a following
// image/manifest.
d.repoTags = []reference.NamedTagged{}
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… it’s also not documented that the caller is supposed to use (only) AddRepoTags after each PutManifest AFAICS.

@vrothberg
Copy link
Member Author

Closing as @mtrmac took over. Thanks again!

@vrothberg vrothberg closed this Aug 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants