[feat req/debate] adding support for image digests with the skopeo relay #86

Closed
nicop311 opened this issue Jan 23, 2023 · 8 comments

@nicop311

Hello @ALL, I am currently trying to figure out how to add support for image digests copy/sync in dregsy.

I think this is even simpler to implement than the tag-based mapping, which supports regular expressions to search through tags. With image digests, there is no need to search with regular expressions, since a digest is unique and identifies exactly one image.

Image digests are also more robust than tags in situations where you need to match a very precise version of an image. Image digests are also used for cryptographic signatures of container images.

What do you think about that? I must say, the dregsy source code is not really well documented. But I think I am starting to understand it a bit. I have several ideas as well.

Description of image digests

An image digest looks like this (here obtained with crane):

$ crane digest --platform="linux/amd64" library/busybox:1.29.2
sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd

skopeo & digests

A skopeo copy command using a digest rather than a tag looks like this:

skopeo copy --preserve-digests docker://docker.io/registry@sha256:cc6393207bf9d3e032c4d9277834c1695117532c9f7e8c64e7b7adcda3a85f39      docker-archive:./registry-linux-amd64-2.8.1.tar:myreg.net/mypath/registry:2.8.1

Important note: Skopeo (1.10.0) DOES NOT support using a tag and a digest in the same image reference.

dregsy configuration with digests

Below is an example of what a dregsy-config.yaml file using digests could look like:

# test/fixtures/config/skopeo-digest-valid.yaml
relay: skopeo

lister:
  maxItems: 50
  cacheDuration: 30m

tasks:
- name: test-skopeo-digest-valid
  interval: 30
  verbose: true
  source:
    registry: registry.hub.docker.com
  target:
    registry: 127.0.0.1:5000
    auth: something
    skip-tls-verify: true
  mappings:
  # Tags based mapping
  #~~~~~~~~~~~~~~~~~
  - from: library/busybox
    to: skopeo/library/busybox
    tags: ['1.29.2', '1.29.3', 'latest']
    platform: linux/arm/v6

  # Digests based mapping
  #~~~~~~~~~~~~~~~~~~~~
  - from: library/busybox
    digests:
    - sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd
    - sha256:76582bc9c59276ea11459bf5ff4f54fd5b7fd23ff622d80479156108fdd26470
    - sha256:6ba3c395f0a2941114d8dcdf80bedcc7e654252f5870dd94daff9cc3188f3eb2
    to: skopeo/library/busybox

  # Get digests either from the web GUI at https://hub.docker.com/_/busybox/tags
  # or from the CLI with the tool crane*.
  # *crane : https://github.com/google/go-containerregistry/tree/main/cmd/crane
  #
  # $ crane digest --platform="linux/amd64" library/busybox:1.29.2
  # sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd
  #
  # $ crane digest --platform="linux/arm/v6" library/busybox:1.29.2
  # sha256:6ba3c395f0a2941114d8dcdf80bedcc7e654252f5870dd94daff9cc3188f3eb2
  #
  # $ crane digest --platform="linux/arm64" library/busybox:1.29.3
  # sha256:76582bc9c59276ea11459bf5ff4f54fd5b7fd23ff622d80479156108fdd26470

It is probably better to differentiate and separate "Tags based mapping" and "Digests based mapping", since skopeo does not support both at the same time.

However, it might be possible to have a more compact version where both a tag list and a digest list are part of the same mapping object. The skopeo copy requests differ depending on whether they use a tag or a digest (see the skopeo digest example above).

Also, with digests you can ignore platform:, since the digest already uniquely identifies a precise image.

  mappings:
  # Tags based mapping
  #~~~~~~~~~~~~~~~~~
  - from: library/busybox
    to: skopeo/library/busybox
    tags: ['1.29.2', '1.29.3', 'latest']
    platform: linux/arm/v6
    digests:
    - sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd
    - sha256:76582bc9c59276ea11459bf5ff4f54fd5b7fd23ff622d80479156108fdd26470
    - sha256:6ba3c395f0a2941114d8dcdf80bedcc7e654252f5870dd94daff9cc3188f3eb2

But it might be overkill to dedicate a task to tags only or to digests only.

@xelalexv
Owner

xelalexv commented Feb 4, 2023

I am currently trying to figure out how to add support for image digests copy/sync in dregsy.

That's an interesting feature that would nicely complement the current functionality.

Indeed with image digests, there is no need for search with regular expression, since a digest is unique and uniquely identify one particular image ...
... It is probably better to differentiate and separate "Tags based mapping" and "Digests based mapping", since skopeo does not support both at the same time.

Keeping the two separate would actually be required. Syncing via digest and syncing via tag list, tag filtering & image matching address two fundamentally different use cases. Syncing via digest is adequate when we know exactly what to sync beforehand, and that set does not change (at least not frequently). Tag filtering and image matching, however, are indispensable when we do not know in advance what we need to sync, for example when we want to continuously sync various image streams produced by image build pipelines. For this we need to describe the set with rules, such as regular expressions or semver filters, and cannot use digests.

@nicop311
Author

nicop311 commented Feb 6, 2023

@xelalexv awesome, thank you for your comment.
I have a prototype (in a private repo), which is working relatively well. I am currently creating a public fork, from which I will open a PR.

The behavior of my dregsy fork with support for digest sync is as follows. It does not change the behavior for tags (and all your filters).


0.4.4 dregsy behavior (current)

If the tags field is completely empty in the dregsy configuration file, dregsy syncs all existing tags on the source registry for the given image.

If the tags field is NOT empty, several filters (regex, semver, etc.) are applied so that only the matching tags are synced.

New feature: add support for digest in dregsy (my fork)

I propose two new fields in mappings:

  • digests: a list of digests
  • preserve-digests: a skopeo copy command parameter
[...]
  mappings:
  - from: library/busybox
    to: skopeo/library/busybox
    digests:
    - sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd
    - sha256:76582bc9c59276ea11459bf5ff4f54fd5b7fd23ff622d80479156108fdd26470
    - sha256:6ba3c395f0a2941114d8dcdf80bedcc7e654252f5870dd94daff9cc3188f3eb2
    preserve-digests: true
    tags: ['1.29.2', '1.29.3', 'latest']
    platform: linux/arm/v6

4 cases for the sync() method:

| digests list | tags list | dregsy behavior | diff with 0.4.4 |
|---|---|---|---|
| empty | empty | pulls all tags | same |
| empty | NOT empty | pulls filtered tags only | same |
| NOT empty | NOT empty | pulls filtered tags AND pulls correct digests | different |
| NOT empty | empty | pulls correct digests only, ignores tags | different |

  1. If the digests list is empty AND the tags list is empty, then expand tags.
    This is the same behavior as dregsy v0.4.4: copy all available tags.
  2. If the digests list is not empty AND the tags list is not empty, then expand tags and get digests.
    Both expand tags and pull the images corresponding to the list of digests.
  3. If the digests list is not empty AND the tags list is empty, then do not expand tags; only pull the images corresponding to the list of digests.
  4. If the digests list is empty AND the tags list is not empty, then expand tags with the filters and pull the filtered tags only. This is the same behavior as dregsy v0.4.4.

I also implemented various checks on the digests to detect and ignore wrong digests. Wrong digests are digests that do not exist or that are not correctly formatted. This ensures that dregsy will run smoothly, even if some digests are incorrect.
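The format half of such a check could look like the sketch below. It is an illustrative assumption, not the fork's actual tests: it only accepts `sha256:` followed by 64 lowercase hex characters (the encoding the OCI image spec requires for SHA-256 digests), and it does not check whether a digest exists on the source registry, which would need a registry lookup. `validDigests` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// sha256Digest matches a well-formed SHA-256 digest: the algorithm,
// a colon, and exactly 64 lowercase hex characters.
var sha256Digest = regexp.MustCompile(`^sha256:[a-f0-9]{64}$`)

// validDigests splits the configured digests into well-formed ones,
// which would be synced, and malformed ones, which would be ignored.
func validDigests(digests []string) (valid, invalid []string) {
	for _, d := range digests {
		if sha256Digest.MatchString(d) {
			valid = append(valid, d)
		} else {
			invalid = append(invalid, d)
		}
	}
	return valid, invalid
}

func main() {
	ok, bad := validDigests([]string{
		"sha256:5e8e0509e829bb8f990249135a36e81a3ecbe94294e7a185cc14616e5fad96bd",
		"sha256:5e8e0509",                      // too short
		"md5:5e8e0509e829bb8f990249135a36e81a", // algorithm not accepted by this sketch
	})
	fmt.Println("valid:", ok)
	fmt.Println("ignored:", bad)
}
```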

@xelalexv
Owner

My thoughts about the implementation of this feature in PR #95:

I think we do not need to add the extra digests list to mappings, but can rather handle digests simply within the tags list. When a tag is verbatim, i.e. not a tag filter expression (regex: or semver:), we can parse it and check for the presence of a digest. With a minimal change to the Skopeo relay, a mapping such as this one already works:

mappings:
- from: library/busybox
  to: base-skopeo/digest/busybox
  tags:
  - sha256:acaddd9ed544f7baf3373064064a51250b14cfe3ec604d65765a53da5958e5f5

This will reduce the change set quite a bit. It would be mostly confined to the relays (it should be possible to offer this for the Docker relay as well) and the tag set. In the tag set, we need to consider whether a tag is or contains a digest during tag pruning, since digests should never be pruned.
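To illustrate how verbatim entries in the tags list could be told apart from filter expressions and digests, here is a minimal Go sketch. It assumes only the `regex:` and `semver:` prefixes mentioned above and only `sha256` digests; `classify` and the `entryKind` constants are hypothetical, not dregsy's actual tag set code. Since a tag may not contain a colon, an entry starting with `sha256:` cannot be confused with a verbatim tag.

```go
package main

import (
	"fmt"
	"strings"
)

type entryKind int

const (
	filterEntry entryKind = iota // regex: or semver: expression
	tagEntry                     // verbatim tag, e.g. 1.29.2
	digestEntry                  // verbatim entry carrying a digest
)

var kindNames = map[entryKind]string{filterEntry: "filter", tagEntry: "tag", digestEntry: "digest"}

// classify decides how a single tags-list entry would be treated.
func classify(entry string) entryKind {
	if strings.HasPrefix(entry, "regex:") || strings.HasPrefix(entry, "semver:") {
		return filterEntry
	}
	// A digest starts with its algorithm, either on its own or after
	// a tag separated by "@".
	if strings.HasPrefix(entry, "sha256:") || strings.Contains(entry, "@sha256:") {
		return digestEntry
	}
	return tagEntry
}

func main() {
	for _, e := range []string{
		"1.29.2",
		`regex:1\.29\..*`,
		"sha256:acaddd9ed544f7baf3373064064a51250b14cfe3ec604d65765a53da5958e5f5",
	} {
		fmt.Printf("%s -> %s\n", e, kindNames[classify(e)])
	}
}
```

Only entries classified as filters would be expanded against the source registry; tags and digests are used as written.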

@Nicolas-Peiffer
Copy link

Thank you @xelalexv for reviewing this PR 😄 .

  • handle digests simply within the tags: list: I see what you mean in terms of implementation and the amount of changes in the source code.
    However, I find it disturbing and wrong to refer to a digest with the word tags:, since they refer to two different mechanisms. I really prefer having tags: and digests: separated. And I don't see a well-suited word that could cover both tag and digest; maybe the word reference, but this does not mean anything specific in the container/OCI world.
    I think it is not needed to put the word digest in the image name path: I prefer to: base-skopeo/busybox over to: base-skopeo/digest/busybox.
  • offer this for the Docker relay as well: I (personally) have no particular reason to add the feature to the docker relay. I am not sure what it would require.
  • tag set & since digests should never be pruned: I apologize, I am not sure I understand what you mean.

Let me know if you think this is too much out of the scope of your roadmap and/or coding style.

Remove hardcoded token in configuration samples?

Other question: is it possible to remove the token eyJ1c2VybmFtZSI6ICJhbm9ueW1vdXMiLCAicGFzc3dvcmQiOiAiYW5vbnltb3VzIn0K from the following files?

Indeed I kept this token in my own files test/fixtures/config/skopeo-digest-*, but I am not sure if this is relevant or not.
I will remove it from my files test/fixtures/config/skopeo-digest-*, unless there is a reason to keep it :-) .

grep -ri "auth:"
[...]
test/fixtures/config/skopeo-valid.yaml:    auth: eyJ1c2VybmFtZSI6ICJhbm9ueW1vdXMiLCAicGFzc3dvcmQiOiAiYW5vbnltb3VzIn0K
test/fixtures/config/docker-valid.yaml:    auth: eyJ1c2VybmFtZSI6ICJhbm9ueW1vdXMiLCAicGFzc3dvcmQiOiAiYW5vbnltb3VzIn0K
[...]

@xelalexv
Owner

xelalexv commented Mar 27, 2023

However, I find it disturbing and wrong to refer to a digest with the word tags:, since they refer to 2 different mechanisms. I really prefer having tags: and digests: separated. And I don't see a well suited word which could cover both tag & digest, maybe the word reference but this does not mean anything in the container/OCI world.

At first glance, they may appear very different. But when looking at them a bit closer, we may find that they are more strongly related than we thought. Let's look at their definitions for a start:

From the Docker Registry HTTP API V2 spec:

The core of this design is the concept of a content addressable identifier. It uniquely identifies content by taking a collision-resistant hash of the bytes. ... To disambiguate from other concepts, we call this identifier a digest. A digest is a serialized hash result, consisting of a algorithm and hex portion. The algorithm identifies the methodology used to calculate the digest. The hex portion is the hex-encoded result of the hash.

From OCI Content Descriptors spec:

The digest property of a Descriptor acts as a content identifier, enabling content addressability. It uniquely identifies content by taking a collision-resistant hash of the bytes.

The purpose of digests is to uniquely address content. The digest itself however does not convey anything about what it identifies. For that, a digest needs to be put into context, because digests are used for many parts of container images - layers/blobs, configurations, manifests. As such, the term digest has a very broad meaning.

In the context of retrieving, i.e. pulling images from a registry, the digest we may specify in the image reference is the digest of the image manifest.
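The content addressability described in these quotes can be shown in a few lines of Go: the digest is the algorithm name plus the hex-encoded hash of the exact bytes. The manifest bytes below are a toy stand-in, not a real image manifest, and `digestOf`/`splitDigest` are illustrative helpers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// digestOf returns an OCI-style digest string for the given content:
// the algorithm, a colon, and the hex-encoded SHA-256 of the bytes.
func digestOf(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// splitDigest separates a digest into its algorithm and hex portions.
func splitDigest(d string) (algorithm, encoded string, ok bool) {
	return strings.Cut(d, ":")
}

func main() {
	manifest := []byte(`{"schemaVersion":2}`) // toy stand-in for real manifest bytes
	d := digestOf(manifest)
	alg, hexPart, _ := splitDigest(d)
	fmt.Println(d)
	fmt.Println(alg, len(hexPart))
	// Changing a single byte yields an unrelated digest.
	fmt.Println(digestOf([]byte(`{"schemaVersion":3}`)) != d)
}
```

Because the hash covers the exact bytes, re-serializing a manifest (even with the same JSON meaning) changes its digest.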

Now for tags, quoting from Open Container Initiative Distribution Specification:

Tag: a custom, human-readable manifest identifier

The Docker Registry spec only includes an indirect definition via the manifest fields relevant for pulling an image:

tag The tag for this version of the image.

Now, when we refer to an image, adding a digest or a tag to the image name serves the same purpose from a use case perspective - we specify what particular version of that image we want. The difference is that the digest uniquely addresses that version, while for the tag we rely on the registry to resolve it to the associated version.
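The difference between a tag that the registry resolves and a digest that pins content can be sketched with a toy in-memory registry. `toyRegistry` is hypothetical, and `sha256:aaa`/`sha256:bbb` are placeholders, not valid digests.

```go
package main

import "fmt"

// toyRegistry is a hypothetical in-memory stand-in for a registry:
// manifests are stored by digest, tags are mutable pointers to digests.
type toyRegistry struct {
	manifests map[string]string // digest -> manifest content
	tags      map[string]string // tag -> digest
}

// resolve returns the manifest for a reference that is either a tag or a digest.
func (r *toyRegistry) resolve(ref string) (string, bool) {
	if d, ok := r.tags[ref]; ok {
		ref = d // a tag is resolved by the registry at pull time
	}
	m, ok := r.manifests[ref]
	return m, ok
}

func main() {
	r := &toyRegistry{
		manifests: map[string]string{"sha256:aaa": "v1 image", "sha256:bbb": "v2 image"},
		tags:      map[string]string{"latest": "sha256:aaa"},
	}
	before, _ := r.resolve("latest")
	r.tags["latest"] = "sha256:bbb" // the tag is moved to a new version
	after, _ := r.resolve("latest")
	pinned, _ := r.resolve("sha256:aaa")
	fmt.Println(before, "->", after, "| pinned:", pinned)
}
```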

So, when we look at our mappings section in a sync task, is it really disturbing to maintain tags and digests in the same list? After all, the tags list is there to specify what versions of the image for the given mapping we want to be included in the sync. This currently allows verbatim tags, regular expressions, semver expressions, and pruning expressions. We would now add digests as a further option. The name tags may not be ideal, but it's also not totally off and a manageable amount of ambivalence. We also regularly talk about files in a directory, even though some of them may be symlinks, or enter host names and IP addresses into input fields labelled IPs, even though IP doesn't mean either one...

But there are also reasons why we would want digests and tags to be maintained in the same collection:

  • With most container runtimes and registries, it is legal to say foo/bar:v1.1.0@sha256:.... In fact, it is quite common to do that, for example for image references in Dockerfiles or Kubernetes manifests. It documents which tag the digest stands for. While the usefulness of this is debated (e.g. here and here), it is in widespread use, and allowing this in a mapping config would therefore be desirable. This would however not be possible unless we allow tags to be mentioned in the digests list as prefixes to digests.

  • We may want to add a tagging feature for sync by digest, meaning that once synced, the image should be tagged in the target registry. This could for example be indicated by specifying sha256:...@{tag} as the digest. For this, we would again have to allow tags to be mentioned in digests.

Bottom line: I currently see handling digests in the tags list as the adequate choice, both from a use case and implementation perspective.
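For the foo/bar:v1.1.0@sha256:... form from the first bullet above, splitting a tags-list entry into its tag and digest parts might look like this. It is a sketch under the assumption that the tag always comes before the `@`; the sha256:...@{tag} form from the second bullet is only an idea and is not handled here. `splitTagDigest` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTagDigest splits a tags-list entry of the form "tag@digest",
// "digest", or "tag" into its tag and digest parts.
func splitTagDigest(entry string) (tag, digest string) {
	if before, after, found := strings.Cut(entry, "@"); found {
		return before, after
	}
	if strings.HasPrefix(entry, "sha256:") {
		return "", entry
	}
	return entry, ""
}

func main() {
	for _, e := range []string{
		"v1.1.0@sha256:acaddd9ed544f7baf3373064064a51250b14cfe3ec604d65765a53da5958e5f5",
		"sha256:acaddd9ed544f7baf3373064064a51250b14cfe3ec604d65765a53da5958e5f5",
		"1.29.2",
	} {
		tag, digest := splitTagDigest(e)
		fmt.Printf("tag=%q digest=%q\n", tag, digest)
	}
}
```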

I think it is not needed to put the word digest in the image name path: I prefer to: base-skopeo/busybox than to: base-skopeo/digest/busybox.

This is just an example, you can use any path you want in to, or completely drop it to use the same path as in the source registry (see image matching).

I (personally) have no particular reason to add the feature to the docker relay. I am not sure what it would require.

And I'm still interested in keeping feature parity between Docker and Skopeo relays, where possible without too much effort. 😄

  • tag set & since digests should never be pruned: I apologize, I am not sure I understand what you mean.

This is explained here.

Other question: is this possible to remove the token eyJ1c2VybmFtZSI6ICJhbm9ueW1vdXMiLCAicGFzc3dvcmQiOiAiYW5vbnltb3VzIn0K from the following files?
Indeed I kept this token in my own files test/fixtures/config/skopeo-digest-*, but I am not sure if this is relevant or not.

You probably copied that from the sample config in the README. Whether your test cases need authentication info depends on whether they do actual sync work. If they do, it would of course have to be meaningful auth info. Seeing that yours are config tests that don't actually intend to do a sync, you would not need that.
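For what it's worth, the sample value is not a real secret: it is base64-encoded JSON carrying anonymous credentials. A minimal Go sketch to check this, assuming the auth value is base64-encoded JSON with username and password fields (which is what the sample decodes to); `registryAuth` and `decodeAuth` are illustrative names, not dregsy's API.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// registryAuth mirrors the JSON object that the sample auth value encodes.
type registryAuth struct {
	Username string `json:"username"`
	Password string `json:"password"`
}

// decodeAuth base64-decodes the token and parses the JSON inside it.
func decodeAuth(token string) (registryAuth, error) {
	var a registryAuth
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return a, err
	}
	err = json.Unmarshal(raw, &a)
	return a, err
}

func main() {
	a, err := decodeAuth("eyJ1c2VybmFtZSI6ICJhbm9ueW1vdXMiLCAicGFzc3dvcmQiOiAiYW5vbnltb3VzIn0K")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", a) // prints {Username:anonymous Password:anonymous}
}
```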

@Nicolas-Peiffer

Thank you for this detailed answer 👍 🙂 .

I just want to highlight that skopeo handles tags and digests using either the colon : for tags OR the at sign @ for digests. Indeed, skopeo does not currently support Docker references with both a tag and a digest at the same time.
A couple of examples below.

skopeo --version

skopeo version 1.11.1 commit: fb1ade6d9e9b501e35b09538c9533fac5dd604b6

Below is an example of a skopeo tarball copy using the @<hash_algo>:<hex_hash> syntax:

skopeo copy --preserve-digests docker://docker.io/busybox@sha256:1dffc7fa199a5156cba6ef34db5f8aaf95d6d8593b907cd392bd813da4a04754     docker-archive:/tmp/docker.io-busybox-1.35.0-glibc-digest-pull.tar:docker.io/busybox:1.35.0-glibc
Getting image source signatures
Copying blob 880dcab25a96 done  
Copying config 12b6f68a82 done  
Writing manifest to image destination
Storing signatures

Below is an incorrectly formatted example of a skopeo tarball copy, using a colon instead of @ before <hash_algo>:<hex_hash>:

skopeo copy --preserve-digests docker://docker.io/busybox:sha256:1dffc7fa199a5156cba6ef34db5f8aaf95d6d8593b907cd392bd813da4a04754     docker-archive:/tmp/docker.io-busybox-1.35.0-glibc-digest-pull-no-at.tar:docker.io/busybox:1.35.0-glibc
FATA[0000] Invalid source name docker://docker.io/busybox:sha256:1dffc7fa199a5156cba6ef34db5f8aaf95d6d8593b907cd392bd813da4a04754: invalid reference format

Below is an unsupported example of a skopeo tarball copy, using both a colon (tag) and an @ (digest):

skopeo copy --preserve-digests docker://docker.io/busybox:1.35.0-glibc@sha256:1dffc7fa199a5156cba6ef34db5f8aaf95d6d8593b907cd392bd813da4a04754     docker-archive:/tmp/docker.io-busybox-1.35.0-glibc-tag-and-digest-pull.tar:docker.io/busybox:1.35.0-glibc
FATA[0000] Invalid source name docker://docker.io/busybox:1.35.0-glibc@sha256:1dffc7fa199a5156cba6ef34db5f8aaf95d6d8593b907cd392bd813da4a04754: Docker references with both a tag and digest are currently not supported

Below is a correct example using a colon and a tag:

skopeo copy --preserve-digests docker://docker.io/busybox:1.35.0-glibc     docker-archive:/tmp/docker.io-busybox-1.35.0-glibc-tag-pull.tar:docker.io/busybox:1.35.0-glibc
Getting image source signatures
Copying blob 880dcab25a96 done  
Copying config 12b6f68a82 done  
Writing manifest to image destination
Storing signatures

Below is an incorrect example using @ with a tag:

skopeo copy --preserve-digests docker://docker.io/busybox@1.35.0-glibc     docker-archive:/tmp/docker.io-busybox-1.35.0-glibc-tag-pull.tar:docker.io/busybox:1.35.0-glibc
FATA[0000] Invalid source name docker://docker.io/busybox@1.35.0-glibc: invalid reference format

I understand your point about keeping feature parity between the Docker and Skopeo relays; you are absolutely right. I am just saying that I personally cannot work on this.


I am not sure tag set pruning can be used with digests. You might need to add tags back manually after a skopeo copy by digest.

But I might be missing something.


Conclusion:

For the moment, I cannot put more resources into my PR. And it seems that you have a precise idea of how you would implement this feature. Maybe you can use this PR as a source of inspiration and do your own take on the implementation of the "digest feature", matching your coding style. This way, you can also implement proper tests using your Makefile and Docker, which I don't use, so it is harder for me.

If you need explanations of some of my source code, I will be happy to answer. But I will not be able to modify the current PR and source code to match what we discussed here about tags and digests.

@xelalexv
Owner

xelalexv commented Mar 29, 2023

Indeed skopeo does not currently supports Docker references with both a tag and a digest at the same time.

Yes, that's correct, and can be handled accordingly in the Skopeo relay.
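One way the Skopeo relay could handle a tag@digest entry, given that skopeo rejects references carrying both, is to drop the tag and copy by digest. This is an assumption about the approach, not the actual implementation; `skopeoSourceRef` is a hypothetical helper that only builds the docker:// source string seen in the commands above.

```go
package main

import "fmt"

// skopeoSourceRef builds a docker:// source reference for skopeo copy from
// a repository plus an optional tag and digest. Since skopeo rejects
// references with both, the digest wins when both are given (assumption):
// it pins the exact content, while the tag only documents it.
func skopeoSourceRef(repo, tag, digest string) (string, error) {
	switch {
	case digest != "":
		return "docker://" + repo + "@" + digest, nil
	case tag != "":
		return "docker://" + repo + ":" + tag, nil
	default:
		return "", fmt.Errorf("%s: neither tag nor digest given", repo)
	}
}

func main() {
	const d = "sha256:1dffc7fa199a5156cba6ef34db5f8aaf95d6d8593b907cd392bd813da4a04754"
	for _, c := range [][2]string{{"1.35.0-glibc", ""}, {"", d}, {"1.35.0-glibc", d}} {
		ref, err := skopeoSourceRef("docker.io/busybox", c[0], c[1])
		fmt.Println(ref, err)
	}
}
```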

I am not sure Tag Set Pruning can be used with digest. You might need to manually add tags back after you skopeo copy thanks to digest.

Pruning will stay away from digests. Actually, looking into this I realized that I need to adjust pruning slightly such that it stays away from any verbatim tags.
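A pruning step that stays away from verbatim entries could, for example, only trim the tags produced by filter expressions. This is a hypothetical sketch: dregsy's actual pruning expressions are not shown in this thread, so `pruneFiltered`, the `keep` count, and the assumption that filtered tags arrive sorted newest first are all illustrative.

```go
package main

import "fmt"

// pruneFiltered keeps at most keep of the tags produced by filter
// expressions (assumed sorted newest first), while verbatim tags and
// digests always pass through untouched.
func pruneFiltered(verbatim, filtered []string, keep int) []string {
	if keep < 0 {
		keep = 0
	}
	if keep < len(filtered) {
		filtered = filtered[:keep]
	}
	out := make([]string, 0, len(verbatim)+len(filtered))
	out = append(out, verbatim...)
	return append(out, filtered...)
}

func main() {
	verbatim := []string{"1.29.2", "sha256:acaddd9ed544f7baf3373064064a51250b14cfe3ec604d65765a53da5958e5f5"}
	filtered := []string{"1.36.1", "1.36.0", "1.35.0"}
	fmt.Println(pruneFiltered(verbatim, filtered, 2))
}
```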

For the moment, I can not put more resources into my PR. And it seems that you have a precise idea of how you would implement this feature. Maybe you can use this PR as a source of inspiration ...

Your issue & PR put the topic on the table in the first place, and our discussion helped in shaping the feature. So, totally worth it. Thanks for all the input! I've created an initial implementation and will push the branch shortly.

xelalexv added a commit that referenced this issue Mar 29, 2023
@xelalexv
Owner

I pushed dregsy images for this to Docker Hub: for the Alpine-based image, use tag issue86 or issue86-alpine; for the Ubuntu-based image, use issue86-ubuntu.

xelalexv added a commit that referenced this issue Mar 31, 2023
xelalexv added a commit that referenced this issue Apr 3, 2023
xelalexv closed this as completed Apr 3, 2023

3 participants