Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework how we use containers/storage #305

Merged
merged 35 commits into from
Dec 14, 2017
Merged

Conversation

nalind
Copy link
Member

@nalind nalind commented Jul 13, 2017

This is a fairly large patch, but I'm hoping the new version reads better than the old one.

HasBlob() learns to check for a blob in storage using its index-by-digest APIs, so we should be able to avoid pulling layers more often.

It reworks the storageImageDestination implementation to cache blobs in a temporary directory until Commit() is called. This lets us do a couple of useful things at Commit() time:

  • We can use the manifest's LayersInfos to guide applying layers to build up an image in storage, directly reusing layers when possible, using their contents when it isn't, or using cached blobs when there is no local copy.
  • We can use the manifest's ConfigInfo to generate an ID for the local image, avoiding duplication on pulls.

When combined, the effect of these is that pulling a given image twice results in one image record in the local storage, and the second pull downloads only the manifest, configuration, and signatures.

It reworks the storageImageSource to return an updated version of the image's manifest that presents the layers in uncompressed form, so it can be used by copy.Image() now, and uses the newer DiffOptions to avoid needlessly recompressing things. This turned up a bug in containers/storage, so we bump the version we vendor to one that includes a fix for that.

@nalind nalind force-pushed the storage-update branch 3 times, most recently from 613bb2b to 5114857 Compare July 19, 2017 14:26
@rhatdan
Copy link
Member

rhatdan commented Jul 19, 2017

@nalind Did the tests freeze on this?

@nalind nalind force-pushed the storage-update branch 2 times, most recently from 04bed1c to 0330049 Compare July 20, 2017 15:24
@nalind
Copy link
Member Author

nalind commented Jul 20, 2017

@rhatdan Travis CI passes, but the code-review/pullapprove check is manual.
I think the changes I wanted to make are ready. We make references that use digests usable at the cost of having to lie about the image reference returned by storageImageSource objects, but we can dependably read images now. (The changes to make it possible to refer to images using references that include digests require some indirection to be able to retrieve layer lists from their manifests, when those manifests may no longer match those digests.)
CI is passing with updated containers/storage and copies of this branch in skopeo, buildah, and cri-o, so I think it's ready for review.

@rhatdan
Copy link
Member

rhatdan commented Jul 20, 2017

@mtrmac @runcom PTAL

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like that ImageSource now produces consistent images; the code is, though, working way too hard to make types.Image fit for these uses, and it is IMHO not worth it—it’s not just the c/i/storage complexity, but also the fragility of c/i/image.Image now having to care about these not-quite-complete images as well.

Most of the complexity and dependencies of types.Image is due to the format conversions between different manifest formats, which this code does not need to deal with at all.

The ImageSource path could be much clearer by splitting the relevant part of UpdatedImage into a separate manifest-only (i.e. not “consistent image”) updater, something like

type containers/image/manifest.Manifest interface {
    // as in types.Image
    ConfigInfo() types.BlobInfo
    LayerInfos() []types.BlobInfo
    // as in types.ManifestUpdateOptions.LayerInfos
    UpdateLayerInfos([]types.BlobInfo)
    // as in genericManifest.serialize()
    Serialize() []byte
}
func ManifestInstanceFromBlob([]byte, mimeType string) Manifest

and moving the raw parsers/serializers from c/i/image to c/i/manifest .

This would work for ImageDestination.Commit as well, as far as extracting the layer list goes… the creation time is awkward, though. Perhaps that could be done with a specialized “getConfig” callback, which on the storage side would only have to deal with non-layer blobs and thus be much simpler than storageImageUnnamedDestination.GetBlob

Signatures []byte `json:"-"` // Signature contents, temporary
SignatureSizes []int `json:"signature-sizes"` // List of sizes of each signature slice
directory string // Temporary directory where we store blobs until Commit() time
counter int32 // A counter that we use for computing filenames to assign to blobs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this member have a more descriptive name, nextTempFileID or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Changing it.

// Check if the blob corresponds to a diff that was used to initialize any layers, either
// before or after decompression, since we don't care.
uncompressedLayers, err := s.transport.store.LayersByUncompressedDigest(info.Digest)
compressedLayers, err := s.transport.store.LayersByCompressedDigest(info.Digest)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to look up the compressed layer? newImage ensures that the manifest can only contain uncompressed digests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose that though storageUnnamedImageReader is not dealing with an updated manifest, it's not going to get called to read blobs using the digests from the not-yet-updated manifest that it would return. Reworking it to only look for layers using the index of uncompressed digests.

if decompress {
noCompression := archive.Uncompressed
diffOptions = &storage.DiffOptions{
Compression: &noCompression,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS DiffOptions.Compression is only used in layerStore.Diff’s maybeCompressReadCloser, which implies that the input to this is always uncompressed, and layerStore.Diff compresses it unless told not to, burning CPU.

Why would we ever want the compressed blobs instead of the originals here?

(For push to a remote repository, the blobs can be compressed in copy.Image, which also takes care of updating the manifests as needed.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lately, layerStore.Diff only compresses a layer diff if it was originally applied in compressed form. The files that comprise the layer are stored on disk in uncompressed form, of course, but a reader would historically ask for it using the digest from the unmodified manifest, so it was doing the best that it could by defaulting to using the same type of compression which was originally used for the layer.

A storageImageSource always asks for compression to be skipped, and since we're removing the ability to read layers using compressed digests from storageUnnamedImageReader, we don't need to bother with having it do things differently, so the GetBlob logic can be simplified.

signature, err := s.transport.store.ImageBigData(s.ID, "signatures")
if err != nil {
if !os.IsNotExist(errors.Cause(err)) {
logrus.Debugf("got error %v looking up signatures for image %q", err, s.ID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ignore the error intentionally? (This doesn’t hurt signature security, of course; it seems to be just a tradeoff between not failing on inability to read signatures images which don’t need to be signed at all vs. failing at the appropriate place with a correct error message for images which do need to be signed but the signatures can't be read.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A file-not-found error would be ignored, but I'll restructure it to not do that, and instead skip the attempt to read a file if we don't expect to find anything.

diffOptions = &storage.DiffOptions{
Compression: &noCompression,
}
if layer.UncompressedSize < 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storageUnnamedImageReader.getSize fails if it finds UncompressedSize < 0; can this happen, or is perhaps every layer guaranteed to have a valid UncompressedSize? (I don’t know, maybe it can happen: getSize is walking the image’s Parent list while this is looking up an image by digest, so the two sets of layers are not necessarily equal.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW the documentation of UncompressedSize says “If UncompressedDigest is not set, this should be treated as if it were an uninitialized value.”, but nothing in this file checks for a set UncompressedDigest before accessing UncompressedSize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Layers that have an UncompressedDigest value will have a valid UncompressedSize. Layers that don't would have it set to an uninitialized 0, but we can assume it's there for anything returned by LayersByUncompressedDigest. That's not an assumption we can make when walking the parent list, so I'll add those checks.

// parse the manifest to get a list of which blobs are filesystem layers, leaving any cached
// files that aren't filesystem layers to be saved as data items.
if s.image == nil {
img, err := image.FromSource(&storageImageUnnamedDestination{named: s})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also quite awkwkard, if the point is primarily to read the list of blobs.

if inspect, err := s.image.Inspect(); err == nil {
logrus.Debugf("setting image creation date to %s", inspect.Created)
options = &storage.ImageOptions{
CreationDate: inspect.Created,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch ouch ouch. This is the only place which requires constructing a types.Image (and actually makes storageImageUnnamedDestination.GetBlob necessary for config.json).

// GetManifest reads the manifest that we intend to store. If we haven't been given one (yet?),
// generate one.
func (s *storageImageUnnamedDestination) GetManifest() ([]byte, string, error) {
if len(s.named.manifest) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other transports just fail if the client did not call .PutManifest(); are there users of containers-storage who need the manifest generated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pathological case, I guess. If Commit() is called without a manifest having been written, we would fail, but if that's what's expected for that case, it'll probably simplify things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There have been a few cases suggested to allow adding / modifying signatures in an ImageDestination without doing anything about the blobs/manifests… but that’s not quite relevant to this case.

In theory calling PutBlob is expected to also call PutManifest; if buildah or some other caller isn’t doing that currently, I guess I’d rather have them explicitly construct a manifest (using the discussed manifest interface I guess) than to have every destination deal with constructing a manifest from its own data, and do the otherwise often unnecessary bookkeeping to track all uploaded / reapplied blobs.

@@ -30,6 +33,8 @@ func newReference(transport storageTransport, reference, id string, name referen
reference: reference,
id: id,
name: name,
tag: tag,
digest: digest,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren’t the tag and digest values stored inside the reference.Named? That would eliminate the failure paths in DockerReference which now have to be silently ignored.

Copy link
Collaborator

@mtrmac mtrmac Jul 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also reference vs. name+tag+digest seem redundant at a first glance, but I haven’t checked in detail.)

@@ -101,60 +104,124 @@ func (s *storageTransport) DefaultGIDMap() []idtools.IDMap {
// relative to the given store, and returns it in a reference object.
func (s storageTransport) ParseStoreReference(store storage.Store, ref string) (*storageReference, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME: Didn’t really review the reference changes yet.

@mtrmac
Copy link
Collaborator

mtrmac commented Aug 1, 2017

The ImageSource path could be much clearer by splitting the relevant part of UpdatedImage into a separate manifest-only (i.e. not “consistent image”) updater

FWIW I’m working on this in https://github.com/mtrmac/image/tree/manifest-editing .

@nalind
Copy link
Member Author

nalind commented Aug 8, 2017

Okay, I've just repushed with some of the smaller changes discussed above. Is the manifest-editing branch stable enough that I can try rebasing onto it?

@mtrmac
Copy link
Collaborator

mtrmac commented Aug 9, 2017

Is the manifest-editing branch stable enough that I can try rebasing onto it?

I’d call it a moderately useful proof of concept; it shows that the API basically works, but it’s missing a top-level dispatcher (image.manifestInstanceFromBlob-style), only the OCI implementation is finished, and most importantly it doesn’t have a solution for getting the creation date (which lives in the config, not in the manifest).

After grafting a stub for the dispatcher it could be useful for checking that the proposed approach handles everything needed for this PR, but it can’t be used to get a complete working implementation yet I’m afraid.

Switch from using a custom diffID type to digest.Digest, and from local
image and rootfs types to manifest.Schema2Image.

Signed-off-by: Nalin Dahyabhai <[email protected]>
Switch from using custom image, rootfs, imageHistory, and v1Image types
to using Schema2Image, Schema2RootFS, Schema2History, and Schema2V1Image
types from the manifest package.

Signed-off-by: Nalin Dahyabhai <[email protected]>
The Config field in a Schema2Image is a pointer, so we need to check
that it's not nil before attempting to read its Labels field when
populating an ImageInspectInfo structure.

Signed-off-by: Nalin Dahyabhai <[email protected]>
If the OCI image includes a DockerVersion field, attempt to preserve its
value for the ImageInspectData that we're building.

Signed-off-by: Nalin Dahyabhai <[email protected]>
Populate the layers list when we inspect a manifest.

Signed-off-by: Nalin Dahyabhai <[email protected]>
Replace imageInspectInfo() implementations in the image package with
versions that call the manifest Inspect() methods.

Signed-off-by: Nalin Dahyabhai <[email protected]>
Move most of the storageImageDestination.computeID() logic into an
ImageID() method of manifest types.

For Schema1, break it into a two-step process, so that we construct a
new configuration blob first, and digest it to produce the ID in a
wrapper.  Update image/manifestSchema1.convertToManifestSchema2() to use
that same configuration blob.

Signed-off-by: Nalin Dahyabhai <[email protected]>
When attempting to locate an image, if we have a name+digest (i.e.,
canonical reference), use the store's new ImagesByDigest() method to
search for images that match a given digest, and which have at least one
name that matches the specified name, treating the reference as an
implicit name so long as it has an explicit name that matches (give or
take a tag).

Signed-off-by: Nalin Dahyabhai <[email protected]>
Teach storageImageSource ImageSource objects to return their size, too.
A consumer that can do the type assertion can avoid some parsing that we
do when wrapping an ImageSource to create an Image if it was previously
only initializing the Image to get this one piece of information.

Signed-off-by: Nalin Dahyabhai <[email protected]>
Now that the storage library lets us set the digest of an image, set it
to the digest of a preprocessed copy of the image's digest, in case it
differs from the manifest as we store it.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@nalind
Copy link
Member Author

nalind commented Dec 11, 2017

Fixed some issues around parsing references that were possibly-truncated image IDs.

giuseppe added a commit to giuseppe/storage that referenced this pull request Dec 13, 2017
usage example:

skopeo copy docker-daemon:busybox:latest containers-storage:\[overlay2@/var/lib/containers/storage+/var/run/containers/storage:overlay2.ostree_repo=/var/lib/containers/storage/overlay2/ostree/.repo,overlay2.override_kernel_check=1\]\busybox

Skopeo needs a bugfix from:

containers/image#305

which is not merged yet.  Just apply this patch:

diff --git a/vendor/github.com/containers/image/storage/storage_image.go b/vendor/github.com/containers/image/storage/storage_image.go
index 08fa71b..5edccce 100644
--- a/vendor/github.com/containers/image/storage/storage_image.go
+++ b/vendor/github.com/containers/image/storage/storage_image.go
@@ -174,7 +174,7 @@ func (s *storageImageDestination) putBlob(stream io.Reader, blobinfo types.BlobI
                        id = ddigest.Canonical.FromBytes([]byte(parentLayer + "+" + digest.String())).Hex()
                }
                // Attempt to create the identified layer and import its contents.
-               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", true, multi)
+               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", false, multi)
                if err != nil && errors.Cause(err) != storage.ErrDuplicateID {
                        logrus.Debugf("error importing layer blob %q as %q: %v", blobinfo.Digest, id, err)
                        return errorBlobInfo, err

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/storage that referenced this pull request Dec 13, 2017
usage example:

skopeo copy docker-daemon:busybox:latest containers-storage:\[overlay2@/var/lib/containers/storage+/var/run/containers/storage:overlay2.ostree_repo=/var/lib/containers/storage/overlay2/ostree/.repo,overlay2.override_kernel_check=1\]\busybox

Skopeo needs a bugfix from:

containers/image#305

which is not merged yet.  Just apply this patch:

diff --git a/vendor/github.com/containers/image/storage/storage_image.go b/vendor/github.com/containers/image/storage/storage_image.go
index 08fa71b..5edccce 100644
--- a/vendor/github.com/containers/image/storage/storage_image.go
+++ b/vendor/github.com/containers/image/storage/storage_image.go
@@ -174,7 +174,7 @@ func (s *storageImageDestination) putBlob(stream io.Reader, blobinfo types.BlobI
                        id = ddigest.Canonical.FromBytes([]byte(parentLayer + "+" + digest.String())).Hex()
                }
                // Attempt to create the identified layer and import its contents.
-               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", true, multi)
+               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", false, multi)
                if err != nil && errors.Cause(err) != storage.ErrDuplicateID {
                        logrus.Debugf("error importing layer blob %q as %q: %v", blobinfo.Digest, id, err)
                        return errorBlobInfo, err

Signed-off-by: Giuseppe Scrivano <[email protected]>
giuseppe added a commit to giuseppe/storage that referenced this pull request Dec 14, 2017
usage example:

skopeo copy docker-daemon:busybox:latest containers-storage:\[overlay2@/var/lib/containers/storage+/var/run/containers/storage:overlay2.ostree_repo=/var/lib/containers/storage/overlay2/ostree/.repo,overlay2.override_kernel_check=1\]\busybox

Skopeo needs a bugfix from:

containers/image#305

which is not merged yet.  Just apply this patch:

diff --git a/vendor/github.com/containers/image/storage/storage_image.go b/vendor/github.com/containers/image/storage/storage_image.go
index 08fa71b..5edccce 100644
--- a/vendor/github.com/containers/image/storage/storage_image.go
+++ b/vendor/github.com/containers/image/storage/storage_image.go
@@ -174,7 +174,7 @@ func (s *storageImageDestination) putBlob(stream io.Reader, blobinfo types.BlobI
                        id = ddigest.Canonical.FromBytes([]byte(parentLayer + "+" + digest.String())).Hex()
                }
                // Attempt to create the identified layer and import its contents.
-               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", true, multi)
+               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", false, multi)
                if err != nil && errors.Cause(err) != storage.ErrDuplicateID {
                        logrus.Debugf("error importing layer blob %q as %q: %v", blobinfo.Digest, id, err)
                        return errorBlobInfo, err

Signed-off-by: Giuseppe Scrivano <[email protected]>
@runcom
Copy link
Member

runcom commented Dec 14, 2017

CI is still failing, I'll merge this once it's green

Modify storage.newImageSource to not bother attempting to decode an
image's metadata if it's empty.  At this point, it only contains the
list of signature sizes, and for unsigned images, that's empty.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@runcom
Copy link
Member

runcom commented Dec 14, 2017

Ci is green, I'm merging, @nalind please update skopeo and CRI-O (and c/storage if needed)

@runcom runcom merged commit 1744973 into containers:master Dec 14, 2017
@giuseppe
Copy link
Member

@nalind @runcom I've adjusted the ostree implementation to take advantage of LayerInfosForCopy. Pretty neat and works well:

#392

@nalind nalind deleted the storage-update branch December 14, 2017 20:17
@mtrmac mtrmac mentioned this pull request Feb 13, 2018
giuseppe added a commit to giuseppe/storage that referenced this pull request Mar 1, 2018
usage example:

skopeo copy docker-daemon:busybox:latest containers-storage:\[overlay2@/var/lib/containers/storage+/var/run/containers/storage:overlay2.ostree_repo=/var/lib/containers/storage/overlay2/ostree/.repo,overlay2.override_kernel_check=1\]\busybox

Skopeo needs a bugfix from:

containers/image#305

which is not merged yet.  Just apply this patch:

diff --git a/vendor/github.com/containers/image/storage/storage_image.go b/vendor/github.com/containers/image/storage/storage_image.go
index 08fa71b..5edccce 100644
--- a/vendor/github.com/containers/image/storage/storage_image.go
+++ b/vendor/github.com/containers/image/storage/storage_image.go
@@ -174,7 +174,7 @@ func (s *storageImageDestination) putBlob(stream io.Reader, blobinfo types.BlobI
                        id = ddigest.Canonical.FromBytes([]byte(parentLayer + "+" + digest.String())).Hex()
                }
                // Attempt to create the identified layer and import its contents.
-               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", true, multi)
+               layer, uncompressedSize, err := s.imageRef.transport.store.PutLayer(id, parentLayer, nil, "", false, multi)
                if err != nil && errors.Cause(err) != storage.ErrDuplicateID {
                        logrus.Debugf("error importing layer blob %q as %q: %v", blobinfo.Digest, id, err)
                        return errorBlobInfo, err

Signed-off-by: Giuseppe Scrivano <[email protected]>
mtrmac added a commit that referenced this pull request Mar 15, 2018
mtrmac added a commit that referenced this pull request Jun 5, 2018
chrisruffalo added a commit to chrisruffalo/buildahbox that referenced this pull request Aug 22, 2018
using buildah for push until containers/image#305 is fixed/sorted
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants