-
Notifications
You must be signed in to change notification settings - Fork 669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index.json: make org.opencontainers.image.ref.name unique? #581
Comments
Do we have |
On Wed, Feb 22, 2017 at 11:36:15PM -0800, Aleksa Sarai wrote:
While I understand the case where you have multiple
`application/vnd.oci.image.manifest.v1+json` descriptors with
different `platform` entries (though I think that doesn't make sense
in light of the fact that `application/vnd.oci.image.index.v1+json`
has `platform` entries too), it doesn't really make sense to me in
the general case to allow `org.opencontainers.ref.name`s to be
duplicated in `index.json`.
In #533, @stevvooe and @vbatts explicitly discussed multiple entries
with one name but different platform information [1], so you can have:
index.json → platform-appropriate-manifest
With unique names, you'd need:
index.json → multi-platform-index → platform-appropriate-manifest
Of course, without clear platform matching (or other metadata-based
matching) to break ties, things get a bit awkward. Consumers putting
a new platform-agnostic descriptor should have options for “clobber
any pre-existing, platform-agnostic refs” or “add a new
platform-agnostic ref, even if there is a pre-existing,
platform-agnostic ref”. Consumers getting a descriptor can error out
with “you asked for $NAME, but there are multiple descriptors matching
that name with equivalently appropriate platform information”.
To @qianzhangxa's question, here's platform in the index JSON [2], so
you can declare a platform for everything your reference from the
index.
[1]: #533 (comment)
[2]: https://github.com/opencontainers/image-spec/blame/v1.0.0-rc5/image-index.md#L41
|
There should be no validation of annotation data. Either way, the platform dispatch case is why you would want manifests with different names. @cyphar I may be misunderstanding something. Could you provide an example of the data structures you think should be invalid? |
@stevvooe Can you explain how a tool like
When a user asks to do something with the Even if you say "well, that's not defined by the spec" then how is something like this meant to be handled -- which is something I'm quite concerned about because it starts combining the
And even if you have multiple tags that are all unique |
What about we just remove image index from image layout and only have manifest descriptors in |
@qianzhangxa Two levels of indirection give you many benefits -- the most obvious of which is that you can have multiple tags that reference the same multi-architecture image. And the multi-architecture image is part of the CAS, which gives you a lot of other benefits. In my opinion, Not to mention that you've added all of this ambiguity and complicated logic to the reference resolver -- which doesn't make sense from my point of view. It seems to me to be quite a rampant layer violation, and from the The current scheme means that |
@cyphar Can you please elaborate a bit more about "multiple tags that reference the same multi-architecture image."? Did you mean the case that two tags (e.g., |
@qianzhangxa Sorry, I misspoke. What I meant was (and if you look at my example you'll see what I mean) was
Basically, the problem with the current state of |
@cyphar In general, the proposal to validate the contents of annotation field is a non-starter. Annotations are functionally meaningless from the point of the view of the specification. It is up to the consumer decide how to present the data. If the tool doesn't reflect the properties of the data structure, then the tool is broken. umoci can make its own determination about how to handle this case. One approach is to linearize the descriptors that match You'll actually find that the algorithm proposed above is both sufficient and deterministic in resolving multi-platform images, even cases in where duplicate or ambiguous tags are present. |
@stevvooe But we do define what are valid values for annotations. Annotations in the
What is the "standard platform matching algorithm" -- and how am I meant to run said algorithm if But all of this is besides the point -- why are we allowing multiple tags in this instance when you can just create a single tag which points to an
It might be both sufficient and deterministic, but that doesn't make it sensible when you could just drop this ambiguity in the first place. Also, the algorithm you propose is not defined in the spec. So how is a user meant to know how their image will be understood by an |
And to make the point more clear, the current CAS interface for // Engine is an interface that provides methods for accessing and modifying an
// OCI image, namely allowing access to reference descriptors and blobs.
type Engine interface {
// PutReference adds a new reference descriptor blob to the image. This is
// idempotent; a nil error means that "the descriptor is stored at NAME"
// without implying "because of this PutReference() call". ErrClobber is
// returned if there is already a descriptor stored at NAME, but does not
// match the descriptor requested to be stored.
PutReference(ctx context.Context, name string, descriptor ispec.Descriptor) (err error)
// GetReference returns a reference from the image. Returns os.ErrNotExist
// if the name was not found.
GetReference(ctx context.Context, name string) (descriptor ispec.Descriptor, err error)
// DeleteReference removes a reference from the image. This is idempotent;
// a nil error means "the content is not in the store" without implying
// "because of this DeleteReference() call".
DeleteReference(ctx context.Context, name string) (err error)
// ListReferences returns the set of reference names stored in the image.
ListReferences(ctx context.Context) (names []string, err error)
// ... other methods omitted for brevity ... How do you propose that I change the |
@cyphar But why is this a benefit? Won't cause ambiguity? E.g., if there are two same tags in the |
@qianzhangxa It isn't a benefit, my point is that the current system allows this and it shouldn't. That's what this issue is about, removing this ambiguity. |
@cyphar I agree. I guess I misunderstood your previous comment.
|
@qianzhangxa I misspoke in that comment, and I corrected it here. Sorry for any confusion. |
From what I said:
In general, it looks like the interface doesn't represent the underlying data structure. For your interface, it expects unique references. Upon encountering duplicates, it should error out. If you want it to be more flexible, return multiple descriptors for each name and don't error out. This is your choice as an application developer.
It is somewhat implied by the given fields. Look at the platform. If you can process it or run it, do so. They are properties of the targeted thing that try to provide meaning for that thing without traversing further. To be honest, I find this so straightforward that I am not understanding the confusion.
Where are you getting this requirement from? Collect the references and re-process them. Or, use the visitor pattern and pass in a function that does what you need.
Avoid confounding the definition the contents of annotations vs the relationships of annotations. The relationships for the fields of a given type belong with the type, not field of a type, thus annotations' relationships cannot be defined by the annotation itself without creating serious issues.
I think the problem here is that you have constructed an interface based on your expectation of how the data is structured. Now that you have studied the data structure, that interface seems to be insufficient (or, perhaps, may impose requirements). In general, I think this approach to a monolithic tag style interface is broken. You need to approach this more like a tree of references that you visit, collect and process.
This type split is indeed unfortunate. There were never meant to be two descriptor types (it would be like having 16-bit and 64-bit pointers in the same hardware) and complicates things. We could move the platform description to annotations to address this. |
My problem with this is that now you're effectively punting reference resolution to the caller of the CAS, because of this ambiguity and mixing of different restrictions of whether it is valid to return a particular descriptor (such as the Specifications should not intentionally be introducing ambiguity like this, because it makes implementing something to match the specification difficult. "The algorithm is trivial" doesn't really justify the fact that currently there is no unambiguous standard-compliant way of parsing
As above, this effectively means that it is simply not possible for a user of a generic OCI library to be able to know what So now a user has to go find out how |
I agree with @cyphar, currently I am working on OCI image support in Mesos, I need to parse and pull OCI image, basically I am not sure what is the best/standard way to handle the case that there are duplicated |
@cyphar Calling this "punting" or "incompatible" is absolutely ridiculous. You are making up issues where there are none. It is not a free-for-all and the incompatibilities you are speaking to are based on assumption over fact. As I have said, the ability to have the same tag pointing to multiple things is a feature, not a bug. Please avoid the histrionics and actually try the approach I suggested. There are literally two choices here:
Have a little imagination and figure it out. |
On Tue, Feb 28, 2017 at 01:35:10PM -0800, Stephen Day wrote:
There are literally two choices here:
1. Either return multiple descriptors.
2. Error out on duplicates.
Have a little imagination and figure it out.
While this is fine for an API, @cyphar made a good point about compat
issues [1] that you are not addressing here. Say @cyphar's
imagination takes him down (1) for his index API and umoci, but the
Docker devs' imagination takes them down (2). Now an innocent user
builds an image with umoci that has multiple descriptors with the same
org.opencontainers.ref.name and equivalent platform information in
their index JSON. They feed that OCI-compliant image into Docker, and
the OCI-compliant Docker ingestor chokes and dies on the repeated
name. Who's fault is the breakage? The spec is not clear.
If multiple descriptors with the same name/platform are allowed, I
think the spec should either:
a. Unambiguously require some MUST level support for them.
b. SHOULD users away from using them, on the grounds that
OCI-compliant handlers are not required to support them.
My current feeling is that (1) and (b) are the the conservative
courses, but that isn't a healthy ecosystem for promoting the “same
tag pointing to multiple things” feature. If you want this feature to
be portable (and I don't have an opinion on that myself), I'd
recommend the spec do something along the lines of (a). If you don't
want the feature to be portable, then dropping the feature (like #582)
makes the most sense to me.
[1]: #581 (comment)
|
How is this even a concern of the spec? If ref name was important to the spec it wouldn't be in annotations. I thought across the board, annotations are opaque to the specs( runtime and image ) so why would you even encode anything in there for general consumption? Whats the point in having a type safe spec and scheme if you are just going to add things in a generic object? |
@stevvooe I still don't see why this dereference walk:
Is a good feature when this would be possible if
Maybe you can explain to me what feature you get out of the first walk that you don't get in the second one? I'm sure there was a good reason for this, but
I already have a branch in umoci (not pushed yet) with the approach you suggested. I have tried it. I don't agree with it, and that's why I'm trying to have a meaningful discussion on the issue. Sure, I can implement whatever reference resolution algorithm I like and just force users to deal with it. But without also reading how To be clear tools that I've seen (skopeo, umoci and the current state of moby/moby#26369) don't implement platform handling at all (effectively Now this concern we had is now extended into reference resolution, with the additional kick that now reference resolution also has to implement some form of platform handling in order for things to be done automatically -- or we have to (as you suggested) make some parts of the UX require the user to clarify what tag they want. So again, the issue I have with your approach is that
I don't know what your definition of a "free-for-all" is, but in my book the following line from the spec indicates to me that it is indeed a free-for-all:
|
While that's all well and good, users need to be able to reference things inside an OCI image. It's just silly to say that users are on their own if they want to be able to talk about what thing inside an OCI image they are referring to.
We have restrictions on other annotations (since they're JSON strings but the content is meant to be a different type). I'm not sure I understand the argument that we are not allowed to validate or otherwise touch annotations that we define in the spec. |
https://github.com/opencontainers/image-spec/blob/master/annotations.md That makes it clear to me that they are arbitrary fields and values and its up to the consumers to handle the keys/values appropriately. If there is a possibility that users can have duplicate values then just have the consuming code handle that and don't make a naive assumption that the values of the field will be unique. It only states that keys are unique. Starting to add all these rules around "arbitrary" values is a bad position to be in. |
Maybe this seems like a dumb idea and I hope it's ok if I jump in, but
what's the reason that the name couldn't be made first-class?
…On Wed, Mar 1, 2017 at 10:56 AM, Michael Crosby ***@***.***> wrote:
https://github.com/opencontainers/image-spec/blob/master/annotations.md
That makes it clear to me that they are arbitrary fields and values and
its up to the consumers to handle the keys/values appropriately.
If there is a possibility that users can have duplicate values then just
have the consuming code handle that and don't make a naive assumption that
the values of the field will be unique. It only states that keys are
unique. Starting to add all these rules around "arbitrary" values is a bad
position to be in.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#581 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABJ66f5vwhx3MS0HQVGsV9UXKq_7b6Lks5rhb9xgaJpZM4MJo3y>
.
|
This was a mistake made in Docker schema1 and we'd like to avoid repeating history. This really isn't as complicated as @cyphar is making it out to be. The problem is that his code doesn't match the properties of the data structure. This problem is going to come up whether or not we make the |
Per discussion on the OCI call, we are going to follow this up with further implementation notes detailing the degrees of freedom in the datastructure and implications of UX design on compatibility between tools. @cyphar Please open an issue, assigning it to me, for what you are looking for and we'll close this when we have agreed on the body of work. |
Partially related, as per -rc5 (after #561) the following are all valid refs:
If an "implementation notes" doc is in the work, this may get some words covering the processing and compatibility side of using such things as an index. |
As of rc6, the correct annotation key is "org.opencontainers.image.ref.name"! |
When talking to @stevvooe in person basically the main issue is that the restriction will make generation/modification of images more complicated. My counter-point to that is that not doing it makes consumption of images more complicated (and I find the latter issue to be more annoying because it requires more co-ordination between implementations than the former case).
We still haven't got any normative language in the spec for how consumers should handle references. Even a paragraph about "when dereferencing things we recommend you do a pruned walk from the root to find all Manifests and then handle it that way". |
This is no longer the case. The validity of references in the image layout is restricted to This really is a UX problem. In the same way that this particular annotation is not unique, other annotations may not be unique as well. Even if we make this one unique, you'll still have to provide ways to handle duplicates of other kinds of annotations. This is the fundamental problem with annotations. Any attribute used to select a particular component may not result in a unique selection. Ultimately, a manifest index provides a list of descriptors. The UX needs to provide a way to display and select them. Should we put something to that effect in the specification? |
Sorry, I meant "pruned from the
Yes please. |
I am currently working OCI image support in Mesos, basically what I am doing is a full walk (see the patch for details):
I think this is what I can do with the current state of the spec. |
@qianzhangxa I am little confused as to why you have code that involves a remote URI and unpacks an image layout. That doesn't seem like a very efficient use case. |
I assume because distribution is still not a solved problem in OCI land and it's a usecase people need, fetching indexes remotely is quite useful if you have enough metadata to figure out how to fetch individual blobs by digest. Personally I've started kicking around some ideas (based on ACI discovery) in https://github.com/cyphar/parcel. |
Yeah, that algorithm looks about right. In |
@cyphar I'm not really sure why anyone would want to depart from the existing registry model for centralized image storage, other than to create confusion and divide the market. OCI images map directly to the protocol with little effort. I could see some changes around how naming/tagging is handled but that is about it. The problem with parcel (and other wku or dns-based approaches) is that it requires the user to run a service that has well-known urls, ending with many of same complaints that we have today. Such approaches tend to favor service providers and those who sell registry software or solutions over those running small-scale infrastructure. Most of the success I have had in decoupling the protocol has been around opening up client-side configuration. Specifically, through namespace-configuration matching. This places image distribution in the hands of the users and operators, where it really belongs. Yes, this includes both delegating authority and location but it can all be done in the client. I understand the goal of parcel, but the impetus starts with outright falsehoods:
Untrue. You can distribute images with a dumb, static-file registry. Cases where this is not true should be considered bugs. We do this in production and it is fairly straightforward to setup.
Again, not entirely true and this is more of a product of the implementation than anything else. In fact, the protocol can support distributing blobs through bittorrent or other p2p means. I have demonstrated this with both bittorrent and a hand-rolled p2p protocol. The only reason that we haven't supported this is because docker does not store the artifacts unchange. This changes with containerd 1.0 and will be easier to implement.
Again, this is more of a matter of implementation than a limitation of the protocol. By deferring to a client-side mapping of configuration to namespaces or authorities, this problem goes away. While I understand that is easy to make conclusions based on existing implementations, it would be good to understand that actual problems before putting forward a proposal. Most of the problems are limitations of the existing implementation rather than the protocol itself. Adding a new protocol to the mix will be unlikely to help that situation. |
@stevvooe This is not the right place to have this discussion, I was just mentioning what might be a reason for pulling
This is also known as "choice".
It would also be good to not denigrate people for working on an idea as "not understanding the actual problems". Parcel is just an idea I've been working on, it's not a proposal (at least, not yet). If you disagree with the introduction, PRs are welcome... |
@cyphar I think you misread my sentence and probably most of my response and I apologize for that. What I said was that for centralized image storage, the registry is a very good approach. This does not mean that you can't supplement it or distribute images in another way. In no way was I advocating for anything that would limit choice, other than ensuring that the choices available are good ones. Let me rephrase the key part of my response: The actual way forward here is a client-side configuration system that puts the choice in the hands of users and operators, rather than software vendors and service providers. I want true choice. This means the ability to control naming authority and distribution, separately and local to the implementation. This means to be free of the desires of software vendors and service providers while not increasing the number of services you have to run to get things to work (ie wku or dns). I am not sure how this position can be taken to be "supporting a monopoly". It is way less centralized than anything I've seen proposed. Furthermore, no one is saying or has said that you can't play around with extensions to the OCI format. However, I do take issue with the propagation of FUD about the image distribution protocol. I would hope that debunking such FUD with actual facts would not be considered "denigrating". |
I am going to take a look at what you said and revise the introduction as necessary (I don't necessarily agree with everything you said, but I can ask you for clarification out-of-band). However, claiming that I "don't understand the actual problems" because you feel I misrepresented a project you work on is not debunking anything. As for the rest of your message, we can discuss this another time / somewhere else. The |
Please keep the discussions technical and try not to mistake misunderstandings for anything with ill intent. |
ohman, y'all. ... @qianzhangxa that looks about right. |
Sorry for resurrecting an old issue. My own take is the OCI Layout can be treated as an on disk equivalent to a repository, and you can't have two images with the same tag in either. In the case of an Index with two entries that are indistinguishable by the tool consuming it, the first matching entry should be selected per the current index docs:
But I also feel like the tooling that created the |
@sudo-bmitch My understanding of @stevvooe's view is that Given that runtimes and image tools haven't switched to such an interface in the past 7 years, I'm not convinced this design makes much sense from a UAPI perspective. I get it from a pure CS perspective, but it would require reworking how every tool that pulls images works. Yeah, in the unambiguous case it would "just work" but the unambiguous case is where (In retrospect, I do regret how off the rails this issue got. In my defense, I was still a teenager at the time 😅.)
While this does solve the issue by somewhat punting on it, I'm not sure how this would work with nested manifests (which are not generated by anything AFAIK but umoci supports them and implementations SHOULD support them according to the spec). Is "first" depth-first or breadth-first in that case? |
I've implemented this as not so much a depth-first search, which to me implies a recursion to try additional entries if a child manifest doesn't have any matches, but a simple loop that fails fast, the logic looking a bit like: for manifest.mediaType == oci.Index {
descriptor, err := manifest.GetFirstMatch(platform)
if err != nil {
return fmt.Errorf("match not found in index")
}
manifest, err = registry.Get(repo, descriptor)
if err != nil {
return fmt.Errorf("failed to get manifest")
}
} This does mean the search can fail if an Index is created that claims to contain images for the local platform, but then they aren't found. I'm not sure there's enough added value to supporting a more complex implementation that backtracks on errors, particularly since the multi-platform image scenario is probably better solved by using a flat index. With the OCI Layout, I'd like to only see a single descriptor in the index.json for each multi-platform image, pointing to the Index that is the image one would otherwise pull from a registry. E.g. {
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.index.v1+json",
"digest": "sha256:b684340140294ef999736ad60ff790389477f87c911ad150ec994eb119be6fe6",
"size": 2275,
"annotations": {
"org.opencontainers.image.ref.name": "scratch"
}
},
{
"mediaType": "application/vnd.oci.image.index.v1+json",
"digest": "sha256:0438daa1d058a178920ea105af06dfc4308cb693b889754409c014923d06535a",
"size": 2435,
"annotations": {
"org.opencontainers.image.ref.name": "alpine"
}
}
]
} Then the search through the index.json would be performed like a tag listing, and the returned image would be handled like any other multi-platform image. The big advantage of that is portability, we aren't recreating the manifest list or index for a multi-platform image when the OCI Layout is used as a transport between two environments. The digest of the content will then match. It also helps the logic in client code easily map between working with an OCI Layout and a registry. Each has a blob store, and a tag listing that returns the manifests. |
While trying to implement
index.json
parsing inumoci
I've come to a bit of a worrying realisation. There isn't really any explanation of how cases whereorg.opencontainers.ref.name
is not unique should be handled. While I understand the case where you have multipleapplication/vnd.oci.image.manifest.v1+json
descriptors with differentplatform
entries (though I think that doesn't make sense in light of the fact thatapplication/vnd.oci.image.index.v1+json
hasplatform
entries too), it doesn't really make sense to me in the general case to alloworg.opencontainers.ref.name
s to be duplicated inindex.json
.So here's two things that would be nice if we could make
org.opencontainers.ref.name
s unique. This includes the multiple-platform
case I mention above because it starts to get a bit hairy -- and honestly you should just be usingapplication/vnd.oci.image.index.v1+json
for that.Sorry to stir this pot again, I've only just hit this issue trying to fix up the
oci/cas
implementation inumoci
.#533
The text was updated successfully, but these errors were encountered: