Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] load/save oci-archive multi images #8218

Closed
wants to merge 2 commits into from

Conversation

QiWang19
Copy link
Contributor

@QiWang19 QiWang19 commented Nov 2, 2020

Allow load/save oci-archive format multi images.

$ bin/podman save -o two-images.tar -m --format oci-archive gcr.io/distroless/java docker.io/library/alpine
$ bin/podman load -i two-images.tar 

close #4646

Signed-off-by: Qi Wang [email protected]

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2020
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: QiWang19
To complete the pull request process, please assign jwhonce after the PR has been reviewed.
You can assign the PR to them by writing /assign @jwhonce in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +267 to +275
img := imageMap[id]
copyOptions := getCopyOptions(sys, writer, nil, nil, SigningOptions{RemoveSignatures: removeSignatures}, "", img.tags)
copyOptions.DestinationCtx.SystemRegistriesConfPath = registries.SystemRegistriesConfPath()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need table format

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early feedback from testing:

We need to distinguish IDs from names. Currently, IDs are treated as names and stored in annotations which leads to:

$ podman load -i ...
Loaded image(s): localhost/79fd58dc7611, localhost/d6e46aa2470d

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped some comments on the c/image PR: containers/image#1072 (review)

I think there are two things we need to address:

  1. We need to support multiple names or none.
  2. We need/should be consistent with the reference syntax of docker-archive which identifies images by index: https://github.com/containers/image/blob/master/docs/containers-transports.5.md#docker-archivepathdocker-referencesource-index

@QiWang19 QiWang19 force-pushed the multi-descriptor branch 3 times, most recently from 7a24e9b to bfb9c0b Compare November 17, 2020 20:47
@QiWang19 QiWang19 force-pushed the multi-descriptor branch 6 times, most recently from 88cd693 to 4987527 Compare November 21, 2020 17:08
@QiWang19 QiWang19 removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 22, 2020
@QiWang19
Copy link
Contributor Author

@containers/podman-maintainers @vrothberg @mtrmac PTAL

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a very quick skim, this results in a very large function that is going to be very hard to maintain (just read, but primarily keep consistent). It would be nice to share more of the format-independent infrastructure.

Most importantly the namesOrIDs parsing code should be shared if at all possible, to ensure the semantics of that parameter is consistent across archive formats — even if the actual copy implementation needs to be different because one can benefit from DockerArchiveAdditionalTags. But also the things like converting imageNames to newImages would probably benefit from sharing (or, well, eliminating; the copy already has a storage reference so converting it into a name and back is pointless extra work), and I worry a bit about newImageEvent which is not mechanically essential, it might be too easy to forget to update a copy.

(Very tentative: it might be worth splitting the archive save/load code, along with the archive-specific helpers, into a separate file that can be easier to understand as a whole — if you do that, please make it a separate commit that changes nothing in the code.)

newImages, err = r.ImageRuntime().LoadAllImagesFromOCIArchive(ctx, inputFile, signaturePolicy, writer)
if err == nil {
return getImageNames(newImages), nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These “try N different formats, succeed on any one” functions have frequently caused user confusion.

  • Would it make sense to have the user specify the format? I guess auto-detection is a desirable feature…
  • The error reporting definitely needs improving. If the user submits a docker-archive format that is mostly (or completely) valid but the load fails, the docker-archive-related error must be available to the user.
  • (Extra thought, probably not for this PR: It might be interesting to have an “autodetect archive format” helper so that we can detect the format separately (or report a clear error that the format is unrecognized), and then just do one load attempt with a simple error reporting structure. Or, well, we could go even further and share a single “decompress and untar” operation across all formats — right now we would decompress the multiple gigabytes separately for each format.)
  • … actually, the fact that we have to decompress the multiple-gigabyte archive in every format until we find one that works is a pretty big cost, and IMHO a fairly strong reason to at least allow the user to specify the format, even we still wanted to support auto-detection.

@QiWang19 QiWang19 force-pushed the multi-descriptor branch 2 times, most recently from 35916a5 to 82ecd25 Compare December 2, 2020 17:25
@zhangguanzhang
Copy link
Collaborator

Conflicting files

libpod/runtime_img.go

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 3, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 3, 2020
@QiWang19 QiWang19 force-pushed the multi-descriptor branch 2 times, most recently from 541e99c to 5c23ecf Compare December 3, 2020 20:04
@QiWang19
Copy link
Contributor Author

QiWang19 commented Dec 3, 2020

@mtrmac PTAL, Added an archivehelper to decide the format according to the manifest.json to index.json, it this a way to detect the format?

@mtrmac
Copy link
Collaborator

mtrmac commented Dec 3, 2020

@mtrmac PTAL, Added an archivehelper to decide the format according to the manifest.json to index.json, it this a way to detect the format?

That’s a neat idea, for the cases where this code does work it’s cheap enough, and for the expensive case (compressed archive) we can just not do that much work to guess. But because it can fail, AFAICS there still must be the fall-back logic to try all of the formats (well, all of the archive formats or all of the directory formats, no need for both).

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m still rather unhappy about the length and duplication in Runtime.SaveImages, see earlier.

As mentioned a few times, if the guessing can fail, this still needs to try the relevant alternatives. Maybe the guessing could be a separate performance feature PR, I’m not sure whether that would help or hinder.

return getImageNames(newImages), nil
// imageFormat detects the image foramt of the input archive or directory
// returns "", if can not decide the format
func imageFormat(inputFile string) (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guessImageFormat maybe, to emphasize that it can fail to make a decision?

// imageFormat detects the image foramt of the input archive or directory
// returns "", if can not decide the format
func imageFormat(inputFile string) (string, error) {
path, err := explicitfilepath.ResolvePathToFullyExplicit(inputFile)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this do anything that matters for this function?

if err != nil {
return "", errors.Wrapf(err, "error detecting input file format")
}
if !archive.IsArchivePath(inputFile) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An os.Stat + IsDir would be a bit cheaper.

break
}
if err != nil {
return "", errors.Wrapf(err, "error detecting archive format")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the path taken on a compressed archive? That should probably return "", nil (and the caller must handle that) to indicate that it didn’t make a guess.

if header.Name == "index.json" {
return image.OCIArchive, nil
}
if header.Name == "manifest.json" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently for compatibility with some docker-archive writers, path.Clean(header.Name) must be used; I guess using it for both cases wouldn’t hurt.

format, err := imageFormat(inputFile)
if err != nil {
return "", err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if format can not be determined?

return getImageNames(newImages), nil
}
return "", err

return "", errors.Wrapf(err, "error pulling image")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect if src == nil (both “couldn’t guess” and “coding error”)

}

// LoadImageFromSingleImageDir loads image from the archive of single image that inputFile points to.
func (r *Runtime) LoadImageFromSingleImageDir(ctx context.Context, writer io.Writer, inputFile, signaturePolicy, format string) (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be package-private AFAICS.

}
return strings.Join(newImageNames, ", "), nil
}
return "", errors.Errorf("error loading multipal images from the archive")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not say much about why things failed or what the user should do.

return nil, errors.Wrapf(err, "error retrieving local image after pulling %s", name)
}
newImages = append(newImages, newImage)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

Absolutely non-blocking: If I’m checking correctly, doing the same to LoadFromArchiveReference would allow most of the Load functions involved in this PR to return []string, and two of the callers of Runtime.LoadImage would eliminate a strings.Split (and the third one would now contain the only strings.Join).

@mtrmac
Copy link
Collaborator

mtrmac commented Dec 3, 2020

@mtrmac PTAL, Added an archivehelper to decide the format according to the manifest.json to index.json, it this a way to detect the format?

(The detection is definitely useful, but not something I think must be in this PR; originally I only asked for not throwing away errors from all the attempts. If detection can fail and we still might end up guessing, the “not throwing away errors” part will continue to be necessary.)

@QiWang19 QiWang19 force-pushed the multi-descriptor branch 2 times, most recently from 1698b8f to 9d034a5 Compare December 6, 2020 16:02
@zhangguanzhang
Copy link
Collaborator

need gofmt the files

@QiWang19
Copy link
Contributor Author

QiWang19 commented Dec 8, 2020

@mtrmac Dropped the format detection from this PR. And the collections of the errors from LoadAllImageFromArchive and LoadImageFromSingleImageArchive using the appendErr, Could you take a look?

@rhatdan
Copy link
Member

rhatdan commented Dec 9, 2020

@QiWang19 needs rebase.

@zhangguanzhang
Copy link
Collaborator

still have Conflicting files

@mtrmac
Copy link
Collaborator

mtrmac commented Dec 10, 2020

still have Conflicting files

This can not be merged as-is because it vendors an unmerged c/image branch, so that’s not an urgent concern at this point.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK in broad strokes, mostly “local” suggestions.

)
appendErr := func(e error) {
if latestErr != nil {
latestErr = errors.Wrapf(latestErr, "error pulling images: %v\n", e)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will prefix some errors but not all? If I am reading this correctly,

error pulling images: error pulling images: error1
error2
error3

s/latestErr/allErrors/ maybe?


Most importantly, compare pkg/errorhandling, or maybe use it. I don’t know what are all of the coding conventions in Podman, but this does look like a problem appropriate for a common solution.

src, err := referenceFn()
if src, err = referenceFn(); err != nil {
appendErr(err)
}
if err == nil && src != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This can now be an else if to be a tiny bit clearer.)

return newImages, nil
}
logrus.Warnf("Error trying pulling multi-arch images: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • s/multi-arch images/multi-image archives/
  • For some formats like dir, LoadAllImageFromArchive will not even try, so it’s grating to output a warning and then succeed.
  • This is the only caller of LoadAllImageFromArchive as well as LoadImageFromSingleImageArchive, so it seems that there should only be one set of attempts, one list of errors (and the two archive formats should only be attempted once, not once via the multi-image reader and once via the non-reader path).

if err == nil {
return strings.Join(newImageNames, ", "), nil
}
appendErr(errors.Wrapf(err, "error trying multi docker aechive"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“archive”, and this probably could benefit from a bit of rewording (users don’t care much about the “multi” part) — I haven’t tried this to see what the eventual full error text looks like.

(In both cases).

@@ -244,48 +240,143 @@ func (ir *Runtime) SaveImages(ctx context.Context, namesOrIDs []string, format s
if isTagged {
iData.tags = append(iData.tags, refTagged)
}
} else {
iData.destNames = []string{""}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn’t throw away any previous tags. Maybe add a single imageSpecifiedByID boolean?

// extend an image with additional tags
type imageData struct {
destNames []string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If “extend an image”, the *Image should probably remain first.

I’m not 100% happy with the two duplicate name arrays, but I also can’t see much of an alternative. (Moving the docker-archive tag parsing into the docker-archive specific loop would probably add too much complexity.)

img := imageMap[id]
// For copying, we need a source reference that we can create
// from the image.
src, err := is.Transport.NewStoreReference(img.imageruntime.store, nil, id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lookup can AFAICS also happen in the shared imageQueue construction code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The src, storageReference is not exported. The imageQueue may not be able to hold a field for it.

Allow load/save oci-archive format multi images.

Signed-off-by: Qi Wang <[email protected]>
Signed-off-by: Qi Wang <[email protected]>
@QiWang19
Copy link
Contributor Author

QiWang19 commented Jan 4, 2021

@mtrmac PTAL

@github-actions
Copy link

github-actions bot commented Feb 4, 2021

A friendly reminder that this PR had no activity for 30 days.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 11, 2021
@openshift-ci-robot
Copy link
Collaborator

@QiWang19: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vrothberg
Copy link
Member

@QiWang19, do you still have time to tackle it? No worries in case not. I am going through some open PRs and issues at the moment.

@QiWang19
Copy link
Contributor Author

QiWang19 commented May 5, 2021

@QiWang19, do you still have time to tackle it? No worries in case not. I am going through some open PRs and issues at the moment.

Yes, I will find time to rebase this one and cooperate it with containers/image#1072.

@github-actions github-actions bot removed the stale-pr label May 18, 2021
@vrothberg
Copy link
Member

Closing the PR as it must now be addressed in c/common/libimage. Thanks, @QiWang19, for working on this!

@vrothberg vrothberg closed this May 31, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error loading OCI archive with multiple manifest descriptors
6 participants