Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker-archive: read+write #998

Closed
wants to merge 37 commits into from

Conversation

mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Jul 29, 2020

This combines #991 and #996.

@vrothberg
Copy link
Member

So far it's looking good. I found some time to wire this PR into the Podman one. Writing works very conveniently. Will take a look into loading and report back; presumably tomorrow.

@vrothberg
Copy link
Member

Created a first draft based on this PR (containers/podman#6811). It's looking pretty good so far!

@mtrmac mtrmac force-pushed the tarfile-integration branch from 5a67f7f to 5697640 Compare August 1, 2020 20:06
@vrothberg
Copy link
Member

@mtrmac, can you have a look at containers/podman#6811 and check if the usage of the new interfaces is as intended? Once, we're cool there, I can add tests to it and then do the vendor dance.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 3, 2020

@mtrmac, can you have a look at containers/podman#6811 and check if the usage of the new interfaces is as intended?

Yes, looks fine.

Once, we're cool there, I can add tests to it and then do the vendor dance.

Note that the current write implementation is not correct yet, at least it generates wrong legacy-format data for images that share layers but differ in configs.

@mtrmac mtrmac force-pushed the tarfile-integration branch from 5697640 to cb6958c Compare August 6, 2020 23:45
@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 6, 2020

@vrothberg This was updated based on new versions of the two constituent PRs. I assume we’ll want to review one of the PRs individually, after which I will include the extra changes from that first PR into the other one, and this integration PR will be closed without merging.

mtrmac added 21 commits August 7, 2020 02:07
We do support the legacy format nowadays.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... so that we can add more functionality to it without making it
public API.

Keep the original docker/tarfile.Destination as a compatibility shim.

ManifestItem is moved to the private package to avoid an import cycle,
but remains visible.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…cker/archive

... instead of the forwarding shims.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…rfile

his is a private API now, so we can just drop it; rename the
...With[System]Context varianteto use the shorter name, and update
all callers.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We will be splitting uses of these paths across two objects, so
make them a private API instead of a copy&pasted convention.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... to eventually allow creating multiple Destinations from a single
Writer.

NOTE: This only splits the implementation into two, mostly at function
boundary; it does NOTHING to support writing multiple images; top-level
metadata is written independently on each Destination.PutManifest .
This will be fixed soon.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... to follow the PutManifest flow.

Should not change behavior, nothing about the code was changed.

Signed-off-by: Miloslav Trmač <[email protected]>
This still only supports one image, but we will fix that momentarily.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... only to make future commits easier to review.

This introduces a bit of inefficiency (creating an on-stack
ManifestItem only to copy it into a single-element array), but
that will be gone momentarily.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... from writeLegacyLayerMetadata to the non-legacy createManifest.
Now that we have a dedicated function for computing the path consistently,
introducing another reference to it does not hurt maintainability, and the
small efficiency gain of computing the path only once is not really worth
coupling the legacy/non-legacy code, especially when the legacy code
is going to get more complex.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
NOTE: This is not sufficient to create correct multi-image archives
yet, the legacy format is still invalid.

This will eventually allow creating multiple Destinations from a single
Writer.

Should not change behavior for current callers, except that possible
JSON failures are now reported later.

Signed-off-by: Miloslav Trmač <[email protected]>
... so that we can later only do this for layers that haven't been
created yet.

Should not change behavior, apart from timing of some reported errors.

Signed-off-by: Miloslav Trmač <[email protected]>
... because it will soon depend on the rest of layerConfig.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... so that the caller does not have to care about lastLayerID and
empty images.

This preserves the current behavior of silently ignoring tags intended
for empty images. That's probably not quite right, but also not a subject
of this PR.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Now that both the legacy and non-legacy format can incrementally add
images to a single archive, this will allow creating multiple
Destinations from a writer Writer.

Image IDs are now generated differently; that may be observable by
very old Docker versions.

Signed-off-by: Miloslav Trmač <[email protected]>
At least docker/archive will need to deal with the two
optionally separately, so such a separate constructor must exist;
and the other callers are not much more complex if they separate
it as well.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
This does not really allow _concurrency_ because we are streaming
the data to a single io.Writer, but it gives us safety against concurrent
callers (hypothetically when copying multiple images, at least).

Note that we do not currently set HasThreadSafePutBlob, although
we could, because the benefit is probably fairly small (basically it would
parallelize creating on-disk copies of streamed inputs with unknown
size or digest); that might be a wrong guess.

Also adds a sanity check against using the Writer after Finish().

Signed-off-by: Miloslav Trmač <[email protected]>
We will add another caller in the next commit.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... which allows creating ImageReferece objects that all write
to a shared tarfile.Writer.

Signed-off-by: Miloslav Trmač <[email protected]>
It's a possible user error, not a supposedly-unreachable internal
error.

Signed-off-by: Miloslav Trmač <[email protected]>
mtrmac added 16 commits August 7, 2020 02:07
Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... so that we can add more functionality to it without making it
public API.

Keep the original docker/tarfile.Source as a compatibility shim.

ManifestItem is moved to the private package to avoid an import cycle,
but remains visible.

docker/tarfile/src_test.go should logically be moved as well, but we
can't do that yet due to the import cycle.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…archive

... instead of the forwarding shims.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…rfile

This is a private API now, so we can just drop it; rename the
...With[System]Context variants to use the shorter names, and update
all callers.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... to eventually allow creating multiple Sources from a single
Reader (= a temporary file containing a seekable/uncompressed
copy of the archive).

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
At least docker/archive will need to deal with the two
optionally separately, so such a separate constructor must exist;
and the other callers are not much more complex if they separate
it as well.

This will also allow us to add reference lookup without having
to duplicate the API.

As a concession to the simpler callers, add a closeArchive parameter
(currently always true) that allows them not to worry about
the lifetime of the tarfile.Reader.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Every caller will need that data; this way it can be shared across
several consumers, and we don't need to synchronize access/creation
of the parsed data structure.

Should not change behavior, except that errors are now reported
earlier.

Signed-off-by: Miloslav Trmač <[email protected]>
We can drop the unused error return value now.

Signed-off-by: Miloslav Trmač <[email protected]>
We already accept the syntax for docker-archive: references,
now implement the lookup instead of warning and ignoring the value.

Signed-off-by: Miloslav Trmač <[email protected]>
Add support for path:@index (e.g. path:@0, path:@1 ...) reference syntax
to docker-archive.

This will allow reading even untagged images from multi-image archives.

Signed-off-by: Miloslav Trmač <[email protected]>
with only two operations: Close(), and List() which
returns a set of ImageReference objects that allow accessing
the individual images.

For now, use of every reference triggers creation of a new
tarfile.Reader; that will be fixed momentarily.

Signed-off-by: Miloslav Trmač <[email protected]>
In archive.Reader, embed a reference to tarfile.Reader to
the created ImageReference objects, and use them in NewImageSource.

Signed-off-by: Miloslav Trmač <[email protected]>
Signed-off-by: Miloslav Trmač <[email protected]>
, closer to the implementation being tested.

Signed-off-by: Miloslav Trmač <[email protected]>
@mtrmac mtrmac force-pushed the tarfile-integration branch from cb6958c to d7c80f4 Compare August 7, 2020 00:19
@vrothberg
Copy link
Member

@mtrmac, I am currently combining a review of the PRs with updating the open PR in c/podman. I'll add some tests on top etc.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 10, 2020

FWIW I do plan to actually test the code modified by these PRs sometime later today. Probably not automated tests at this point, though.

@vrothberg
Copy link
Member

FWIW I do plan to actually test the code modified by these PRs sometime later today. Probably not automated tests at this point, though.

Feel free to share ideas for tests in containers/podman#6811

@mtrmac
Copy link
Collaborator Author

mtrmac commented Aug 11, 2020

All of the post-merge changes in this PR have been included in #991, closing.

@mtrmac mtrmac closed this Aug 11, 2020
@mtrmac mtrmac deleted the tarfile-integration branch August 11, 2020 23:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants