
Simplify kicbase cache handling by removing digest #11411

Closed
afbjorklund opened this issue May 14, 2021 · 9 comments
Labels
co/docker-driver (issues related to kubernetes in container)
co/podman-driver (podman driver issues)
kind/improvement (categorizes issue or PR as related to improving upon a current feature)
priority/important-longterm (important over the long term, but may not be staffed and/or may need multiple releases to complete)

Comments

afbjorklund (Collaborator) commented May 14, 2021

Currently we load the kicbase image using both tag and digest:

gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e

However, we can only load the tag from the cache and not the digest:

gcr.io/k8s-minikube/kicbase:v0.0.22

As a result, the code cannot find the image on the next run, since the loaded image carries no digest:

        "Id": "sha256:bcd131522525c9c3b8695a8d144be8d177bcd5614ec5397f188115d3be0bbc24",
        "RepoTags": [
            "gcr.io/k8s-minikube/kicbase:v0.0.22"
        ],
        "RepoDigests": [],
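The mismatch above can be sketched in Go: the reference we build includes the digest, but the daemon only knows the tag, so the lookup key has to be the tag alone. A minimal sketch (tagRef is a hypothetical helper, not the actual minikube code):

```go
package main

import (
	"fmt"
	"strings"
)

// tagRef strips a trailing "@sha256:<hex>" digest from an image
// reference, leaving the name:tag part that can actually be matched
// against the daemon's RepoTags list.
func tagRef(ref string) string {
	if i := strings.Index(ref, "@sha256:"); i >= 0 {
		return ref[:i]
	}
	return ref
}

func main() {
	full := "gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e"
	// The daemon lists only the tag, so this is the key to search for:
	fmt.Println(tagRef(full)) // gcr.io/k8s-minikube/kicbase:v0.0.22
}
```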

The current workaround is to pull the manifest down from the registry again:

$ docker pull gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e: Pulling from k8s-minikube/kicbase
Digest: sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
Status: Downloaded newer image for gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e

The layers were already downloaded, so it pulls down the manifest and fills in the digest:

        "Id": "sha256:bcd131522525c9c3b8695a8d144be8d177bcd5614ec5397f188115d3be0bbc24",
        "RepoTags": [
            "gcr.io/k8s-minikube/kicbase:v0.0.22"
        ],
        "RepoDigests": [
            "gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e"
        ],

And so the code can find the image in the list. But it doesn't work offline or with podman.

$ sudo podman pull gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
Error: invalid image reference "gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e": Docker references with both a tag and digest are currently not supported

It would be much simpler if we would only use the digest for the first download (to the cache).
After that, we would only use the (name and) tag to check if the image has already been loaded.

The only downside is that you cannot have two images with the same tag but different digests.
However, that doesn't really work anyway: when you pull the second one, the tag moves with it.

And we wouldn't have to patch go-containerregistry, if we did this...

We could still keep the digest in the filename, if we are paranoid.

kicbase_v0.0.22@sha256_7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e.tar

But then we would have to provide two parameters (or strip the digest).
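Deriving that filename, and recovering the plain tag back from it, could look roughly like this (illustrative helpers, not the actual cache code; tagFromFileName assumes the image name itself contains no underscores, which holds for kicbase):

```go
package main

import (
	"fmt"
	"strings"
)

// cacheFileName turns a short reference like
// "kicbase:v0.0.22@sha256:<hex>" into a filesystem-safe tarball name
// by replacing the colons, matching the naming seen in the cache dir.
func cacheFileName(shortRef string) string {
	return strings.ReplaceAll(shortRef, ":", "_") + ".tar"
}

// tagFromFileName recovers the name:tag half from a cache filename,
// so callers don't have to pass the digest as a second parameter.
func tagFromFileName(name string) string {
	base := strings.TrimSuffix(name, ".tar")
	if i := strings.Index(base, "@sha256_"); i >= 0 {
		base = base[:i]
	}
	return strings.Replace(base, "_", ":", 1)
}

func main() {
	ref := "kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e"
	fmt.Println(cacheFileName(ref))
	fmt.Println(tagFromFileName(cacheFileName(ref))) // kicbase:v0.0.22
}
```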

afbjorklund (Collaborator, Author) commented May 14, 2021

ping @medyagh

Do we gain anything by saving the digest in the cache and in the daemon, after we have verified the initial pull?
Shouldn't this be handled similarly to the .sha256 files for the other downloads, just as a means to verify the download?

See PR #11346 for latest code

afbjorklund (Collaborator, Author) commented May 14, 2021

Images with only digests could be handled too. The problem is when trying to have both RepoTags and RepoDigests.

$ docker save busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb > busybox.tar
$ docker load < busybox.tar
Loaded image ID: sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698

manifest.json

[{"Config":"c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json","RepoTags":null,"Layers":["a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/layer.tar"]}]

However, such an image will be listed as <none>, so it can only be referenced using the image ID (which we don't save).

[
    {
        "Id": "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698",
        "RepoTags": [],
        "RepoDigests": [],

Which makes it rather useless for the kicbase, at least with the current cache implementation.

Again, the fields are populated from the registry:

$ docker pull busybox:latest
latest: Pulling from library/busybox
Digest: sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb
Status: Downloaded newer image for busybox:latest
docker.io/library/busybox:latest
[
    {
        "Id": "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"
        ],

dinever (Contributor) commented May 14, 2021

Hi @afbjorklund, this sounds like an issue I can work on. Can I take it? Thanks!

afbjorklund (Collaborator, Author) commented May 14, 2021

@dinever : it's part of an ongoing effort to fix the kicbase and podman, see PR #11063 and PR #11346

We currently have a working solution, with a patched go-containerregistry, so this is mostly simplification...

If we remove the digest requirement, we can drop the patch and drop the current extra "daemon" code.

And always use the "kic" files in the cache, loading the kicbase image from the tarball instead of from the remote registry?

afbjorklund added the co/docker-driver and co/podman-driver labels May 14, 2021
medyagh (Member) commented May 14, 2021

@afbjorklund how about this: we first try to get the image by digest, and if we fail, we try to get it without the digest... would that be any better?

Are you saying that in the current path we are not using the digest SHA at all, giving the user no confidence of integrity at all?

I do want to provide the SHA anywhere we can; it would be nice for local development too: if you accidentally tag your own image kicbase:v0.0.20, minikube won't use it.

medyagh (Member) commented May 14, 2021

If removing the digest from the kicsha makes it easier for everyone, I will not object, but I would love to see a proposal where we at least try to verify the integrity and reduce the attack surface for our users.

afbjorklund (Collaborator, Author) commented May 16, 2021

> @afbjorklund how about this: we first try to get the image by digest, and if we fail, we try to get it without the digest... would that be any better?
>
> Are you saying that in the current path we are not using the digest SHA at all, giving the user no confidence of integrity at all?

That's not really how it works, first we pull the image by digest and use our patched library to save it.

After loading it from the cache, we ignore the digest, pull the tag again, and hope that they match.
Sometimes this fails horribly, like when we pulled the amd64 image by mistake on arm64 (#11208).

There is also no error reporting in the library, so the fetch and the load can fail silently while execution continues.
This means we could end up with half a file, with no error reporting on why the load failed.

So we would have to pull it by tag, and then we could check the digest to see if it matches before saving.


This is what the manifest looks like:

crane pull

[{"Config":"sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698","RepoTags":["index.docker.io/library/busybox:i-was-a-digest"],"Layers":["aa2a8d90b84cb2a9c422e7005cd166a008ccf22ef5d7d4f07128478585ce35ea.tar.gz"]}]

docker save

[{"Config":"c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json","RepoTags":null,"Layers":["a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/layer.tar"]}]

podman save

[{"Config":"c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json","RepoTags":[],"Layers":["36b45d63da70799f32c6e82de9be2e75167099b010def8771cf847c0e4c9c975.tar"]}]

When trying to save a regular image.

docker.io/library/busybox:latest@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb

        "Id": "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"
        ],

So they each fail, in their own special way. Without the patch, we can only save tagged images (not digests).


If you use both, then docker happily uses the digest and tags it with whatever you want. There is no check.

$ docker pull gcr.io/k8s-minikube/kicbase:1.2.3@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e: Pulling from k8s-minikube/kicbase
Digest: sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
Status: Image is up to date for gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
gcr.io/k8s-minikube/kicbase:1.2.3@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e

Without the patch, I suppose we go back to editing the manifest.json afterwards, or something?

To get rid of the legacy domain "index.docker.io" and to replace the practical joke "i-was-a-digest":

pkg/crane/pull.go:const iWasADigestTag = "i-was-a-digest"

https://github.com/google/go-containerregistry/blob/main/pkg/v1/tarball/README.md

afbjorklund (Collaborator, Author) commented May 16, 2021

To clarify, we would keep the old patch that saves the tag of the manifest also as the tag of the image.

tarball.Write
google/go-containerregistry@019cdfc

But we would drop the patch that pulls the tag from the registry after loading the image from the tarball.

daemon.Write
google/go-containerregistry@08e9498
google/go-containerregistry@761f6f9

So when pulling from the registry, we would use the name.Digest:

gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e

But when looking in the daemon/podman, we use the name.Tag:

gcr.io/k8s-minikube/kicbase:v0.0.22
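The proposed split can be sketched as follows; daemonHas and pullByDigest are hypothetical stand-ins for the real driver and registry calls:

```go
package main

import "fmt"

// ensureImage sketches the proposed flow: check the daemon for the
// plain tag first, and only on a miss pull the digest-pinned
// reference from the registry. The callbacks are hypothetical
// stand-ins for the real driver/registry code.
func ensureImage(tagRef, pinnedRef string,
	daemonHas func(string) bool,
	pullByDigest func(string) error) error {
	if daemonHas(tagRef) {
		return nil // works offline: no registry round-trip needed
	}
	return pullByDigest(pinnedRef)
}

func main() {
	tag := "gcr.io/k8s-minikube/kicbase:v0.0.22"
	pinned := tag + "@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e"
	err := ensureImage(tag, pinned,
		func(ref string) bool { fmt.Println("checking daemon for", ref); return false },
		func(ref string) error { fmt.Println("pulling", ref); return nil })
	fmt.Println(err)
}
```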


This way we don't have to always contact the registry to get the digest, but it would work offline as well...

The downside would be that it would still match the old v0.0.22 tag, even if there was a new one available.

The way to check this would be to contact the registry and move the tag, but then it doesn't work offline.

docker

docker pull gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
docker tag gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e gcr.io/k8s-minikube/kicbase:v0.0.22
        "Id": "sha256:bcd131522525c9c3b8695a8d144be8d177bcd5614ec5397f188115d3be0bbc24",
        "RepoTags": [
            "gcr.io/k8s-minikube/kicbase:v0.0.22"
        ],
        "RepoDigests": [
            "gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e"
        ],

podman

sudo podman pull gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e
sudo podman tag gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e gcr.io/k8s-minikube/kicbase:v0.0.22
        "Id": "bcd131522525c9c3b8695a8d144be8d177bcd5614ec5397f188115d3be0bbc24",
        "Digest": "sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e",
        "RepoTags": [
            "gcr.io/k8s-minikube/kicbase:v0.0.22"
        ],
        "RepoDigests": [
            "gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e",
            "gcr.io/k8s-minikube/kicbase@sha256:996fd2ec79bd471af1ca21889c8f61b021b46ff303f89b66f6b153026a9bd4e5"
        ],

Apparently podman keeps both the multi-arch manifest and the image manifest under "RepoDigests".

docker manifest inspect gcr.io/k8s-minikube/kicbase@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e

{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 7821,
         "digest": "sha256:9bea9232c23bda2ed17835918bbaf18ae4d22075b9bc5b25085415f33947e268",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 7821,
         "digest": "sha256:996fd2ec79bd471af1ca21889c8f61b021b46ff303f89b66f6b153026a9bd4e5",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      }
   ]
}

And then podman stores the digest as Digest instead, whereas docker only uses RepoDigests.

So with our workaround we would tag both images (amd64 + arm64) with the same original tag.
And then make sure to load the proper one from the proper directory, after separating by arch...

The digest is then only used when creating the cache file, but not when loading or checking.


> I would love to see a proposal where we at least try to verify the integrity and reduce the attack surface for our users.

If we wanted to support having multiple images with the same tag, we could always load from file.

It takes longer than just checking the tag, but it is still faster than having to pull the image, of course.
And since we don't compress the tarball, but compress the individual layers instead, it's fast enough?

$ time docker load -i ~/.minikube/cache/kic/kicbase_v0.0.22@sha256_7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e.tar 
Loaded image: gcr.io/k8s-minikube/kicbase:v0.0.22

real	0m0,408s
user	0m0,079s
sys	0m0,162s
$ time sudo podman load -i ~/.minikube/cache/kic/kicbase_v0.0.22@sha256_7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e.tar 
Getting image source signatures
Copying blob 4437437e0ddd skipped: already exists  
Copying blob 935f303ebf75 skipped: already exists  
Copying blob 0e64bafdc7ee skipped: already exists  
Copying blob 5f322db8c3ee skipped: already exists  
Copying blob edf0b278078b skipped: already exists  
Copying blob 346be19f13b0 skipped: already exists  
Copying blob 195300400d88 skipped: already exists  
Copying blob 85a67d2cbf14 skipped: already exists  
Copying blob 15be91f82cfe skipped: already exists  
Copying blob 560c33e647c6 skipped: already exists  
Copying blob ccb478880bd7 skipped: already exists  
Copying blob f3b95d7abc63 skipped: already exists  
Copying blob 32335a57470c skipped: already exists  
Copying blob cd9f5340c0e5 skipped: already exists  
Copying blob 6632090e389b skipped: already exists  
Copying blob c0939f4dfa94 skipped: already exists  
Copying blob 685a946c7607 skipped: already exists  
Copying blob 2f380b15409b skipped: already exists  
Copying blob 1cfdd8b1b1f9 skipped: already exists  
Copying blob 0e02a61a3a5d skipped: already exists  
Copying blob 911e8812a0d2 skipped: already exists  
Copying blob 53b5f51f3d43 skipped: already exists  
Copying blob c4d31f57d6aa skipped: already exists  
Copying blob fe9dba58efb6 skipped: already exists  
Copying blob b4086dcb79f9 skipped: already exists  
Copying blob e3253bb4d046 skipped: already exists  
Copying blob fd31c13e408f skipped: already exists  
Copying blob 6f4a99f86dce skipped: already exists  
Copying blob f496f2a782bc skipped: already exists  
Copying blob 4dcf3bc4d175 skipped: already exists  
Copying blob b5b279e4fe18 skipped: already exists  
Copying blob 6b29808dbd4d skipped: already exists  
Copying blob b5888a1b71fa skipped: already exists  
Copying blob b66c233fb020 [--------------------------------------] 0.0b / 0.0b
Copying blob 2a3e411f5809 [--------------------------------------] 0.0b / 0.0b
Copying blob 6e9696684979 [--------------------------------------] 0.0b / 0.0b
Copying config bcd1315225 done  
Writing manifest to image destination
Storing signatures
Loaded image(s): gcr.io/k8s-minikube/kicbase:v0.0.22

real	0m6,260s
user	0m6,358s
sys	0m0,178s

So if you wanted to load an image with a different digest, it would move the tag accordingly.
The actual digest is still unknown to docker/podman; the only way to get that is to pull.

REPOSITORY                    TAG       DIGEST    IMAGE ID       CREATED       SIZE
gcr.io/k8s-minikube/kicbase   v0.0.22   <none>    bcd131522525   10 days ago   1.09GB

But we don't actually verify anything here, we just allow multiple images to have the same tag.
It protects us from when the image is re-tagged, but not from when the manifest is modified.

afbjorklund added the triage/discuss label May 16, 2021
spowelljr added the kind/improvement and priority/important-longterm labels May 17, 2021
sharifelgamal (Collaborator) commented

We added a feature to fail over to pulling without the sha, which solves some of these issues.

medyagh removed the triage/discuss label Aug 11, 2021