Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing multiple images to a single docker-archive: file #996

Merged
merged 20 commits into from
Aug 11, 2020

Conversation

mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Jul 29, 2020

WIP, absolutely untested, missing locking, probably incorrect about legacy metadata reuse.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Jul 29, 2020

@vrothberg filing very early for API feedback and possible testing / experimenting.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Jul 29, 2020

See also #998 for read+write in one PR.

@vrothberg
Copy link
Member

Thanks, I'll test #998 and will comment there.

@mtrmac mtrmac force-pushed the tarfile-write branch 5 times, most recently from e776b8a to 29ff436 Compare August 6, 2020 23:30
@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 6, 2020

@vrothberg This should now be ready for review (although still basically untested). Changes since the early version:

  • docker/internal/tarfile.Writer.Finish is now Writer.Close, similar to archive.tar.Writer.Close. There isn’t a (Commit,Close) difference like in ImageDestination, so it seems reasonable to just use the standard Close name.
  • docker/internal/tarfile.Writer now uses a mutex to ensure the resulting stream is consistent, and in principle that allows copying images in parallel. I don’t expect much benefit from that, hence not setting HasThreadSafePutBlob either.
  • The actual format implementation was changed much more than the earlier versions, to correctly create the legacy-format metadata (for images that share layers but differ in configs).

mtrmac added 19 commits August 7, 2020 02:07
We do support the legacy format nowadays.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... so that we can add more functionality to it without making it
public API.

Keep the original docker/tarfile.Destination as a compatibility shim.

ManifestItem is moved to the private package to avoid an import cycle,
but remains visible.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…cker/archive

... instead of the forwarding shims.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…rfile

his is a private API now, so we can just drop it; rename the
...With[System]Context varianteto use the shorter name, and update
all callers.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We will be splitting uses of these paths across two objects, so
make them a private API instead of a copy&pasted convention.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... to eventually allow creating multiple Destinations from a single
Writer.

NOTE: This only splits the implementation into two, mostly at function
boundary; it does NOTHING to support writing multiple images; top-level
metadata is written independently on each Destination.PutManifest .
This will be fixed soon.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... to follow the PutManifest flow.

Should not change behavior, nothing about the code was changed.

Signed-off-by: Miloslav Trmač <[email protected]>
This still only supports one image, but we will fix that momentarily.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... only to make future commits easier to review.

This introduces a bit of inefficiency (creating an on-stack
ManifestItem only to copy it into a single-element array), but
that will be gone momentarily.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... from writeLegacyLayerMetadata to the non-legacy createManifest.
Now that we have a dedicated function for computing the path consistently,
introducing another reference to it does not hurt maintainability, and the
small efficiency gain of computing the path only once is not really worth
coupling the legacy/non-legacy code, especially when the legacy code
is going to get more complex.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
NOTE: This is not sufficient to create correct multi-image archives
yet, the legacy format is still invalid.

This will eventually allow creating multiple Destinations from a single
Writer.

Should not change behavior for current callers, except that possible
JSON failures are now reported later.

Signed-off-by: Miloslav Trmač <[email protected]>
... so that we can later only do this for layers that haven't been
created yet.

Should not change behavior, apart from timing of some reported errors.

Signed-off-by: Miloslav Trmač <[email protected]>
... because it will soon depend on the rest of layerConfig.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... so that the caller does not have to care about lastLayerID and
empty images.

This preserves the current behavior of silently ignoring tags intended
for empty images. That's probably not quite right, but also not a subject
of this PR.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Now that both the legacy and non-legacy format can incrementally add
images to a single archive, this will allow creating multiple
Destinations from a writer Writer.

Image IDs are now generated differently; that may be observable by
very old Docker versions.

Signed-off-by: Miloslav Trmač <[email protected]>
At least docker/archive will need to deal with the two
optionally separately, so such a separate constructor must exist;
and the other callers are not much more complex if they separate
it as well.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
This does not really allow _concurrency_ because we are streaming
the data to a single io.Writer, but it gives us safety against concurrent
callers (hypothetically when copying multiple images, at least).

Note that we do not currently set HasThreadSafePutBlob, although
we could, because the benefit is probably fairly small (basically it would
parallelize creating on-disk copies of streamed inputs with unknown
size or digest); that might be a wrong guess.

Also adds a sanity check against using the Writer after Finish().

Signed-off-by: Miloslav Trmač <[email protected]>
We will add another caller in the next commit.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... which allows creating ImageReferece objects that all write
to a shared tarfile.Writer.

Signed-off-by: Miloslav Trmač <[email protected]>
@mtrmac mtrmac changed the title WIP: Writing multiple images to a single docker-archive: file Writing multiple images to a single docker-archive: file Aug 7, 2020
@mtrmac mtrmac marked this pull request as ready for review August 7, 2020 00:20
@mtrmac mtrmac linked an issue Aug 8, 2020 that may be closed by this pull request
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhatdan
Copy link
Member

rhatdan commented Aug 11, 2020

Amazing work @mtrmac LGTM

@rhatdan rhatdan merged commit 26f5578 into containers:master Aug 11, 2020
@mtrmac mtrmac deleted the tarfile-write branch August 11, 2020 22:04
@henning-schild
Copy link

Can anyone point me to docs on how to actually use that with skopeo or buildah or any other tool really. I have multiple images and am trying to merge them into one repo image to deduplicate layers and save cost on storage and download.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Nov 21, 2022

Skopeo only copies one image at a time. podman save --multi-image-archive can create these archives, the underlying implementation https://github.com/containers/common/blob/9020fa1c54b8a871354ac7c5ce809ac8db860ae6/libimage/save.go#L135 could possibly serve as an example (though that has multiple levels of abstractions and the like).

The basic idea is NewWriter from this PR + repeated calls to the standard c/image copy.Image.

@henning-schild
Copy link

Thanks. I really do not want to know the details, just found this and thought it might be a good place to ask which of the many container tools actually uses that interface. I will take that pointer to podman and just hope that the version i can get can do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFE: Save multiple manifests to a single file
4 participants