-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Map and Object type problems #261
Comments
@jroper I agree. I get confused by this, too. I also wonder, how someone receiving a CloudEvent in JSON format is supposed to distinguish, if the event data is String or Binary. For Binary I would need to base64 decode, before passing it on in HTTP binary mode. Just deriving this from the content type, might be very error prone. |
ping @clemensv |
@jroper see: #184 (comment) |
We've meanwhile renamed Object to Any, which at least takes some of the conceptual confusion out of the model. The "data" example above is compliant with the JSON mapping, because it falls under the rules of 3.1. which effectively allows free-form JSON. For all other attributes, including their structured contents, it is correct that a plain Boolean isn't supported. We don't have that type. |
@jroper any thoughts on this? |
@jroper ^^ |
I'm having lots of problems in this area trying to decode a JSON structured event so I can forward it as a binary event on another transport:
Details:Spec: "CloudEvents type can be determined by inference using the rules from the mapping table, whereby the only potentially ambiguous JSON data type is string"The ambiguity is fatal. Given a JSON event { ..., "data": "buzzkill } is the data:
There's no safe way to know. Spec: "whether a Base64-encoded string in the data attribute is treated as Binary or as a String is also determined by the contenttype value."Fine if content type starts with "text", but not if it starts "application". My event bridge only needs to know whether or not to base64 decode the data in order to forward a binary event. How does it know that application/zip is binary but application/javascript is text? Does it need a table of every media type? Suggestion:
The originator of the message sets the binary flag - they provide the data so are in a position to know whether it is binary or text. Intermediaries don't need to interpret the contenttype to function. |
@jroper @alanconway is this still an issue? |
I think this is still an issue. So firstly, while I think the best way to deal with this is the same way everything else does. Make data just a single type, by itself it's an opaque blob that only the domain can understand, and when combined with the So, effectively, the type is Note that currently there are two different sources of information for the type of the data, there is the cloud events type information, and there's |
@jroper the spec, at some point, was changed for the definition of maps:
Do we not do this today by saying |
I don't think so, because here is the current definition of
So, booleans, arrays and null can't be expressed in that definition of Any or Map. Or put it this way, I'm not sure what programming languages you're familiar with, but I assume you're familiar with Go. Imagine modelling the Map datatype with Go. It might look something like this (excuse my Go, I'm very new to the language): type CloudEventsAny interface {
typeName() string
}
type CloudEventsMap struct {
map[string]CloudEventsAny
}
func (m CloudEventsMap) typeName() string {
return "Map"
}
type CloudEventsString struct {
string
}
func (s CloudEventsString) typeName() string {
return "String"
}
type CloudEventsNumber struct {
Decimal
}
func (n CloudEventsNumber) typeName() string {
return "Number"
}
type CloudEventsBinary struct {
Buffer
}
func (b CloudEventsBinary) typeName() string {
return "Binary"
} If I've got my go right, the above could be one way to represent the CloudEvents {
"data": {
"foo": true,
"bar": [1, 2, 3]
}
} And you want to parse that data attribute into the above format, so that you're working with it in a way that is agnostic from the wire binding. How do you do it? You can't. It's impossible. Unless you represent the whole JSON as a String or Binary. then it works. But then what is the point of saying that |
See: https://github.com/cloudevents/spec/pull/437/files All of this makes me wonder why we're bothering to say what "Any" is at all. Why not just say it's "any data type" or "unspecified" - and as long as the sender can serialize it in the chosen encoding format then why should the CE spec care? |
That's essentially what I'm saying. It's a blob, opaque to the spec. Implementations can use content type to gather more meaning from it. BUT it is actually more complex than that because then you get to the JSON encoding spec, and it does have some very concrete things to say about its type. And that's a good thing, because if the event is JSON, then it makes sense to have it just as an embedded JSON object, rather than a base64 encoded representation of the JSON. But the most important thing I think is that when you round trip from one encoding and back to the abstract, to another encoding and back, you always end up with the same data. And to do that, I think it needs to be opaque, a blob. The spec as it currently stands you could easily end up with the following payload: {
"data": {
"foo": [1, 2, 3]
}
} Ending up being encoded as: {
"data": "{\"foo\":[1,2,3]}"
} ie, the actual payload has changed, and is now corrupt, and I think this could easily happen with different implementations of different encodings of the current spec after a few round trips between them. So, the specific changes that I would say is that the type of data should be left unspecified, with a note saying many implementations may choose to represent this as a byte buffer, or decode it into higher level formats based on content type. Then, the JSON encoding spec needs to be updated with precise wording on exactly when the data should be a JSON value, or a String, or a base64 encoded binary. Additionally, the JSON encoding needs to be updated to say exactly how the type information it contains should be handled when converting to an abstract representation of teh cloud event, eg, if you have a data that is a String, and no |
@duglin I don't think the type system is the problem. If I understand @jroper correctly, it is applying this type system to the data attribute. I would even go further and state, that data should not be regarded as attribute at all. It becomes one in the JSON format, but even in the JSON format a whole lot of extra rules have been necessary to cover data.
I haven't checked all SDKs, but for the go-sdk you have exactly this:
|
TBH, I don't use the go sdk, I just have something like this in my code:
The extra "context" wrapper is a valid choice, but it is just a choice. But I do agree that it seems like we should just stop trying to over-specify |
+1 for "data is not an attribute, it's a byte string to be interpreted
according to datacontenttype."
De facto it is treated differently from every other attribute in most SDKs,
by definition it is different in all binary encoded transport mappings,
and even the JSON part of the spec has to define special rules for it.
FYI: The cloudevents sdk-go models the data attribute as Go type
interface{} which means it can be any Go data type (interface{} is an
interface that does not require any methods of it's underlying type, so any
type satisfies it - including primitives like int or bool.)
A lot of the time it is of type []byte - binary, and get interpreted
according to datacontenttype. The CE libraries have hard-wired support for
marshalling/unmarshalling Go objects for JSON and XML. The default rules
are useless (JSON numerics turn into floats, XML turns everything into a
string) but Go allows you to provide typed targets (e.g. structs) to coerce
a desired type mapping from a vague source.
All of which to say: make data just a binary blob and let the
implementation deal with it via datacontenttype. There are well known media
types for JSON etc. so we have interop for them without limiting the
possibilities.
…On Thu, Jun 13, 2019 at 9:36 AM Doug Davis ***@***.***> wrote:
TBH, I don't use the go sdk, I just have something like this in my code:
type Event struct {
SpecVersion string `json:"specversion,omitempty"`
Type string `json:"type,omitempty"`
Source string `json:"source,omitempty"`
Subject string `json:"subject,omitempty"`
ID string `json:"id,omitempty"`
Time string `json:"time,omitempty"`
SchemaURL string `json:"schemaurl,omitempty"`
DataContentType string `json:"datacontenttype,omitempty"`
Extensions map[string]string `json:"extensions,omitempty"`
Data json.RawMessage `json:"data,omitempty"`
}
The extra "context" wrapper is a valid choice, but it is just a choice.
But I do agree that it seems like we should just stop trying to
over-specify data. And, if we drop the structured format (only 1/2
joking) we wouldn't need to have data at all - we just define the extra
metadata to add to an already existing event/serialization.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#261?email_source=notifications&email_token=AB3LUXTQZHGDRQ5SFGSMPZLP2JEN7A5CNFSM4FH2L5RKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXTWY5A#issuecomment-501705844>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB3LUXRMDZGRMZBNZSCCG33P2JEN7ANCNFSM4FH2L5RA>
.
|
Anyone want to volunteer to create a PR? |
@jroper do you think you could create a PR with what you're thinking? It would be great to have something to consider for next week's call :-) |
I'll take a stab on Monday. |
Starting to look at this now. First thing I think to do is work out what to call the data, if it's not going to be called an attribute (which I think is a good idea to not call it an attribute). I went through the spec (I did skim over a few of the binding specs and there was nothing too notable there) to work out all the different names that we currently use to refer to the data. Here they are:
Just to be clear, I'm not saying that all of the above need to be changed to all use the same name, sometimes using a different name does make sense, some of the names above are referring are intentionally referring to the data in a slightly different context and the different name is necessary to convey that. But I do think some of the names above need to be consolidated. The challenge with the term data is that of course, everything is data. Each attribute by itself is data, the attributes an aggregate are data, there is data that is not part of the event, etc. So I think this rules out using the term "data" by itself, because it is not precise enough, there's too much chance for ambiguity, and I think we should free up the spec (and bindings etc) to be able to use the data term in a generic sense without the chance of it being confused for the data payload. So one option is to use Data (ie, with a capital D). That already appears in the definitions, and I think it makes it unambiguous. The problem with Data is that it's not actually a proper noun, it's not like we've created a trademark term for this, it is just ordinary data, and we've used the term data because that's what it is. So it doesn't really work grammatically. Also, I don't think it would read that well, it's just a bit odd having capital D data all throughout the spec. That said, I think it works in the definitions section, so I think we should leave that as is, and when referring specifically to the definitions (as the Data Attribute section currently does), we can use Data. I think though the definitions section should define what term will be used to refer to the Data in the rest of the spec. So, I'm torn between two names. The first is "data payload". In most contexts, payload works really well, it's a commonly used term in the context of metadata and data (used in many programmatic APIs as well as protocols). The definition of Data even currently says "(i.e. the payload)". The other is "data content". The reason I like this is because of the naming of the attributes "datacontenttype" and "datacontentencoding". Since they call it content, perhaps the entire spec should call it content. I like the name payload better, but I think content is more consistent. I'll start with payload, and we can bikeshed the names in this issue, PR and other calls until we come to a resolution. |
With this - can we remove the Now that data is not an attribute, the The biggest problem with maps comes from the HTTP encoding - HTTP encoding requires the |
Lots going on here :-)
|
Fixes cloudevents#261 This change causes data to not be thought of as an attribute, rather to be considered an opaque blob described by the content type. The name given is data payload. The wording in the JSON encoding has been modified to reflect this.
Fixes cloudevents#261 This change causes data to not be thought of as an attribute, rather to be considered an opaque blob described by the content type. The name given is data payload. The wording in the JSON encoding has been modified to reflect this. Signed-off-by: James Roper <[email protected]>
Fixes cloudevents#261 This change causes data to not be thought of as an attribute, rather to be considered an opaque blob described by the content type. The name given is data payload. The wording in the JSON encoding has been modified to reflect this. Signed-off-by: James Roper <[email protected]>
@duglin on your type system comment: The point of having it is that you have one reference to constraint. There needs to be a rule for how to express dates that's valid for the one instance we have in the core, but also for all extensions. Otherwise you end up with the CloudEventsTimeAttributeType, because each attribute can be different. Maps are well-defined as JSON objects today. For 'Any': If we allow for polymorphic maps, we arguably need Any for consistency. That said, since Maps don't appear in the abstract, I don't see this being a practical issue. |
Regarding the original objection of the issue: The content of "data" is specifically addressed in the JSON encoding (section 3.3) and any valid JSON is valid for that attribute. If the datacontenttype value is either "application/json" or any media type with a structured +json suffix, the implementation MUST translate the data attribute value into a JSON value, and set the data attribute of the envelope JSON object to this JSON value. |
With #484 merged I believe we can close this - if I'm mistaken please let me know and we can reopen it. |
This is related to #184. Currently the spec introduces a lot of confusion around Maps and Objects when it comes to JSON.
So, the data attribute is an Object, which means it can be Map, String or Binary. Map is a map of name to Object pairs. So, if I have a data attribute of:
It's not a Map. Why? Because
true
is not a valid Object according to the CloudEvenvts spec (a valid object must be a Map, String or Binary), and the CloudEvents spec says that Maps are mappings of Strings to Objects. So the above is actually invalid, if you wanted to pass the above, you would have to encode it like this:Now the data is a String, and so it's a valid CloudEvents Object, and it can be decoded to JSON. However this contradicts the JSON encoding spec, which says that a JSON object can be directly embedded into the Event json, which would effectively make the above event unencodable as soon as you tried to convert it from the JSON encoding to another encoding.
I think there are a number of points in the spec where JSON and the Map type are confused, particularly in the way JSON events are encoded. Either, the CloudEvents spec should support JSON as a top level concept, replacing Map with JSON, or, the Object type should be redefined to be either String or Binary, but not Map, and force JSON to be encoded as a String. If this doesn't happen, the CloudEvents implementations I think will forever be plagued with issues arising from ambiguities of when to treat something as JSON and when to treat it as a String, the result being things like something going in as JSON, but coming out as a String, or vice versa, especially when events go through different encodings, for example, through JSON encoding, then HTTP, then MQTT, etc.
The text was updated successfully, but these errors were encountered: