Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add standard base image annotations #822

Merged
merged 4 commits into from
Jul 30, 2021
Merged

Conversation

imjasonh
Copy link
Member

Fixes #821 and #783

As discussed in yesterday's call, the next step was to open a PR to nail down specific wording.

@imjasonh
Copy link
Member Author

After talking it over with @jonjohnsonjr, he convinced me that it would probably be better to allow multiple digests and refs.

This enables potential use cases like marking the version of the golang image used in a multi-stage build, so you can signal and automate a rebuild scenario when a newer golang is available. Specific semantics of what constitutes an image another image "derives from" is left meaningfully vague, and up to common interpretation of the annotator and any tooling they depend on.

The latest draft also references a semi-official grammar for what a "reference" is (linking to here), but it'd be great to get that more formalized in distribution-spec instead of in a comment in some Go code. When/if that happens we can update this to point there instead.

Lemme know what you think! 🙏

Copy link
Contributor

@nishakm nishakm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need some finality with this comment I think: #821 (comment)

annotations.md Outdated Show resolved Hide resolved
@samuelkarp
Copy link
Member

Specific semantics of what constitutes an image another image "derives from" is left meaningfully vague, and up to common interpretation of the annotator and any tooling they depend on.

It might be worth putting that in the spec itself. Something like "These annotations MAY contain references or digests of manifests that are shared with an ancestor or references or digests of manifests that are used in producing build artifacts." This might not be the best way to phrase it, but I think having some indication of what the data could be would be worthwhile.

@imjasonh
Copy link
Member Author

imjasonh commented Mar 4, 2021

These annotations MAY contain references or digests of manifests that are shared with an ancestor or references or digests of manifests that are used in producing build artifacts.

Some other versions:

  • "Derived from" MAY mean ancestor images sharing image layers, or images used in producing artifacts in the image.
  • An image MAY be said to be "derived from" another if it shares image layers, or if it was produced using artifacts and processes contained in another image.

Is any of that more/less clear about the intention? I really don't want to be even a little prescriptive about what "derived from" means, because it might end up being very content-dependent and new ideas might show up over time.

@nishakm
Copy link
Contributor

nishakm commented Mar 4, 2021

My concern is that once you add something to the image that says "this image has some relation with these other images" then the question becomes "what is the relation and how do I find out?". Personally, I think this lies in the realm of an SBoM not the image spec.

@SteveLasker
Copy link
Contributor

As discussed on the call, I like the idea of a base image annotation. I'd just suggest we tighten the scope of exactly what it means and how to use it;
For example: It MUST refer to the immediate image this image is based upon. Meaning the traditional docker FROM statement. It SHOULD NOT refer to any other images used to generate the content placed in the image. For instance, multi-step docker files used to fetch and compile source. These are interesting references, but not for this annotation.
The annotation MUST have a single fully qualified reference. Even if it's an internal registry, not accessible to anyone else. If the user wanted to rebuild the image, they need to know where this image was retrieved. If the image is determined to be vulnerable, it should also be known.
Suggest we also capture the tag and current digest of the tag as it was built.

For the annotation to be generally useful, it MUST have a strict definition so it can be used consistently across all uses. How can someone say their tooling around the annotation isn't working because an image isn't using the annotation properly?

@imjasonh
Copy link
Member Author

As discussed on the call, I like the idea of a base image annotation. I'd just suggest we tighten the scope of exactly what it means and how to use it;
For example: It MUST refer to the immediate image this image is based upon. Meaning the traditional docker FROM statement. It SHOULD NOT refer to any other images used to generate the content placed in the image. For instance, multi-step docker files used to fetch and compile source. These are interesting references, but not for this annotation.
The annotation MUST have a single fully qualified reference. Even if it's an internal registry, not accessible to anyone else. If the user wanted to rebuild the image, they need to know where this image was retrieved. If the image is determined to be vulnerable, it should also be known.
Suggest we also capture the tag and current digest of the tag as it was built.

For the annotation to be generally useful, it MUST have a strict definition so it can be used consistently across all uses. How can someone say their tooling around the annotation isn't working because an image isn't using the annotation properly?

Yeah, @SteveLasker I think I'm coming (back 😛 ) around to your viewpoint here.

It'd be great to have annotations that describes all the images, etc., that were used to produce this image, and for whom changes to those images might be interesting and trigger a rebuild/rebase, but for the specific use cases described in #821, we can get by with a simpler more narrowly-scoped annotation. That lets us more narrowly specify what the annotation values should be and mean. It also reduces the complexity of having to describe how two parallel lists of values should interact, which I think is useful.

If folks want a more general-purpose "this image is somehow related to [these images]" annotation, that could always be a separate annotation, or wrapped up into ongoing SBoM work somehow. Either way, that seems like a reasonable separate proposal, with fewer specified guarantees about what the annotation values mean.

If there aren't any strong objections I can change this back to the single-value forms, with more specifics about what the values should be.

@imjasonh
Copy link
Member Author

Updated the PR to reflect the latest language around single base images.

Copy link
Contributor

@SteveLasker SteveLasker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can address the feedback for examples, I would approve this narrow scope of base reference.

annotations.md Show resolved Hide resolved
annotations.md Outdated Show resolved Hide resolved
@imjasonh
Copy link
Member Author

This PR was discussed at the March 17 TOB meeting (thanks for brining it up @jonjohnsonjr and sorry again I couldn't attend). I'd like to try to summarize the discussion so not everyone has to watch the video. Basically, there seem to be two camps forming:

  1. There should be one digest and ref described, pointing to the single image this image is based on ("shares zero-indexed layers"), e.g., FROM regist.ry/some/image
  2. The annotation should be able to describe potentially many "base images", e.g., a golang image used to build me, an assets image I pulled some files from at build-time, etc., which might also include the image I'm based on by the definition assumed in (1), but also might not.

To be clear, I absolutely want both, and I think there's value in being able to easily signal "an update to golang should insinuate I need an update", or whatever the semantics of that version of the annotation mean.

I think the argument for (1) is that it's a smaller, simpler, focused change to make to the spec, and can be defined more narrowly and easily. Basically every image has a single "base image" under that definition of a "base image" (i.e., "shares zero-indexed layers") -- if it's FROM scratch; COPY /my/app . then, well, it doesn't have a base image and shouldn't use the annotation. 🤷

(2) is much more powerful and flexible, but as a result can make it hard to consume effectively. Tooling that can understand multiple "bases" (I think this term is bad when used this way) can have potentially many different actions to take when those bases change. When golang updates, what should my tooling say I need to do with my image that's based on it? More additional signal is required, from somewhere.

From a practical standpoint, (2) requires us to define "base image" very vaguely to include supporting images in a multi-stage build for instance, and I don't think that's a way the term is used in common practice. We could change the annotation to org.opencontainers.image.somethingelse.digest*s* and .tag*s* but we've already painted that bikeshed and seem generally happy with the color.

Again, to be clear, I definitely want (2) to exist, and for systems built on (2) to be defined and built, I'm just not sure I want to be the one to define them. 😄 I just want (1) to exist and to build tooling that consumes it.

cc @SteveLasker @jonjohnsonjr @samuelkarp @vbatts who were involved in the discussion and can fact-check my summary

@jonjohnsonjr
Copy link
Contributor

I think there's two orthogonal issues:

  1. Plurality
  2. Specificity

For the first, I'm okay with only allowing a single "base image" per annotation. I can hack around this.

For the second, I'm less okay with defining exactly what a "base image" means. I'm fine with us referencing the common usage of "base image" as referring to the contiguous zero-indexed layers (via MAY language), but I think this being a SHOULD is a little strong. It's encoding a lot of docker-specific implementation details that aren't really required for the usefulness of this annotation. It's also trivial to check whether or not the base image does conform to that pattern, so there is no loss of information. Any use case that only applies to Dockerfile-style base images can do a very simple check before performing the operation. In fact, we might want to add language that tools SHOULD verify that the base image matches their expectations before taking any action.

Copy link
Contributor

@SteveLasker SteveLasker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general framing I have around annotations are:

  • is the value intended for humans to read/parse and possibly do something
  • is the value intended for computers to parse, which should be very specific. We shouldn't need an AI/ML process to iterate all the possible permutations to understand the intent.

If the value is for humans, it's less important, because humans are very good about inferring meaning, and there's no automation. If it's for automation, we should be very specific and actionable for public software and any user to automate around.

If it's for specific companies then Acme Rockets can create their own annotations and use them any way they'd like.
Putting the grammar info in the ref-name seems less important than an actual example, as the example implies whether this is intended to be a fully qualified registry name (docker.io/library/ubuntu), a default docker hub reference (ubuntu - please don't as we need to kill the default docker hub concept), or a human value that says ubuntu but is only interesting for the reader to think, oh, that's the ubuntu image, but I don't know where exactly they got it from, or where I monitor if it's still this digest.

So, how do we expect users to use these annotations - automation or human interesting, but not necessarily ready for automation?

@imjasonh
Copy link
Member Author

Putting the grammar info in the ref-name seems less important than an actual example, as the example implies whether this is intended to be a fully qualified registry name (docker.io/library/ubuntu), a default docker hub reference (ubuntu - please don't as we need to kill the default docker hub concept), or a human value that says ubuntu but is only interesting for the reader to think, oh, that's the ubuntu image, but I don't know where exactly they got it from, or where I monitor if it's still this digest.

If this example is insufficient:

* This SHOULD be a fully qualified reference name, without any assumed default registry. (e.g., `index.docker.io/some/image` instead of `some/image`).

...could you suggest one that you'd find more clear?

So, how do we expect users to use these annotations - automation or human interesting, but not necessarily ready for automation?

I'd like to aim to use these annotations for automation, and have these annotations targeted at computers.

From the initial proposal, use cases include, in order of increasing complexity/utility:

  1. a registry UI showing a little badge or something to signal "FYI: this image is based on an out-of-date base image"
  2. a registry proactively notifying users their images are out-of-date (email, SMS, self-addressed stamped envelope)
  3. an automated system automatically rebuilding/rebasing images when it detects an out-of-date base image

(bonus points in all cases if the old base has a CVE that the new base doesn't)

Another annotation, or series of annotations (or maybe references?) could be useful to signal even more complex dependency trees/graphs that could signal any number of a variety of out-of-date dependencies scenarios, but I still think there's value in this annotation being narrowly scoped to one common, narrow definition of "an image you might care about being out-of-date"

@imjasonh
Copy link
Member Author

Correct me if I'm wrong, but in this week's meeting, it almost sounded like we were approaching rough consensus about this. (The recording's not up yet, you'll have to take my word for it 😆 )

If I can summarize the final sticking points:

  1. some people don't like that it's SHOULD and not MAY, but I think at least not being a MUST leaves everyone wiggle room that (I think) people were ultimately comfortable with. I'm not the annotation police, I can't stop you from putting anything you want in your own images. 😄
  2. some people don't like that this kinda sorta defines "base image" to mean "shares zero-indexed layers" but:
    • AFAIK there's no better term for what this does intend to describe than "base image"
    • loosening the definition loses useful specificity about its intention, and how it should be used (see (1))

I don't know what the process is for calling for a vote of "close enough", but in the absence of any better ideas, and in the presence of what seems like mostly positive reception, I'd like to hereby call for a vote of "close enough", and kindly ask for approval. 🙏

@samuelkarp
Copy link
Member

I don't know what the process is for calling for a vote of "close enough", but in the absence of any better ideas, and in the presence of what seems like mostly positive reception, I'd like to hereby call for a vote of "close enough", and kindly ask for approval.

While the calls and feedback on this issue from community members is valuable, ultimately that decision falls to the maintainers of the image spec.

@sudo-bmitch
Copy link
Contributor

LGTM. For those looking to write different tooling with different goals, there's always the option to add their own annotation. This optional annotation covers a large number of users and use cases that makes sense to provide a standard.

@imjasonh
Copy link
Member Author

imjasonh commented May 3, 2021

FYI I'm still hoping to get this approved and merged, it's mostly in a holding pattern while opencontainers/tob#95 is being resolved and new maintainers identified and sworn in.

My understanding is this will still only need 2+ approvers from among those maintainers once they've been identified, someone please let me know if that's not the case.

Copy link
Contributor

@jonjohnsonjr jonjohnsonjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imjasonh
Copy link
Member Author

imjasonh commented Jun 30, 2021

Based on discussion with @jonjohnsonjr I've added another point describing how the annotations should be used when the base image is an image index (i.e., a multi-platform image like ubuntu). In this case, both the image.base.ref.name and image.base.digest should point to the index's manifest ref and digest, not the matching platform-specific image digest.

This is important because tooling that wants to detect and perform rebuilds/rebases needs to be able to detect whether the thing referenced by tag and digest has diverged, and the only way to do that is to have them point to the same thing. In the case of multi-platform images, the platform-specific image within the index (ubuntu for linux/arm64 or whatever) might not even have a tagged reference, and even if it did there's no way to tell what it is if you're only given the multi-platform ubuntu reference, so that's the only one you can annotate. If the image.base.digest doesn't point to the multi-platform ubuntu index manifest, the tag and digest will always diverge and your tooling will always think it should perform a rebuild or rebase.

annotations.md Outdated Show resolved Hide resolved
@imjasonh imjasonh dismissed stale reviews from sokarax and SteveLasker via ebb32fd July 12, 2021 17:39
@imjasonh
Copy link
Member Author

Friendly ping for more reviews and approvals from image-spec maintainers 🤞

@vbatts @stevvooe @jonjohnsonjr

Copy link
Contributor

@stevvooe stevvooe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nishakm
Copy link
Contributor

nishakm commented Jul 28, 2021

I'm not a maintainer but this has my LGTM :)

@vbatts vbatts merged commit 8e42a01 into opencontainers:main Jul 30, 2021
vrothberg added a commit to vrothberg/buildah that referenced this pull request Aug 3, 2021
Since opencontainers/image-spec/pull/822/ the OCI spec supports two new
annotations to set the fully-qualified name and the digest of the base
image.

Signed-off-by: Valentin Rothberg <[email protected]>
vrothberg added a commit to vrothberg/buildah that referenced this pull request Aug 3, 2021
Since opencontainers/image-spec/pull/822/ the OCI spec supports two new
annotations to set the fully-qualified name and the digest of the base
image.

Signed-off-by: Valentin Rothberg <[email protected]>
vrothberg added a commit to vrothberg/buildah that referenced this pull request Aug 3, 2021
Since opencontainers/image-spec/pull/822/ the OCI spec supports two new
annotations to set the fully-qualified name and the digest of the base
image.

Signed-off-by: Valentin Rothberg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Standard Base Image Annotations