How to determine if data is binary? #520
Comments
The latest CE spec has removed ContentEncoding from the Event (which I finally agree with), and base64 encoding is only of concern for JSON-encoded events. There's an extra attribute in the latest JSON format spec to deal with this.
For other bindings we must disambiguate the two uses of the word "binary" in the CE spec. A binary mode message is a style of event message used by some bindings; it has nothing whatsoever to do with base64 encoding. The phrase "binary event data" doesn't mean anything for an abstract Event. The abstract Data attribute is either a CE type or a byte sequence described by a MIME type. How CE types and byte sequences are represented concretely is determined by the format/binding specs. Again, JSON is the only one that uses base64, and the JSON format spec deals with that.
So for an HTTP message:
If the ContentType = "application/cloudevents+X" then you have a structured message. You are done with the HTTP decoding, and should pass the HTTP body bytes directly to an X event decoder. X is usually json; the JSON format spec says how to deal with base64.
If the ContentType = X, where X doesn't start with "application/cloudevents+", then you should handle the HTTP body directly as a byte sequence of MIME type X. The definition of the MIME type is all you need. For example, if ContentType = "application/json", it's a JSON value. JSON doesn't distinguish string and binary types, so it is a bad choice if you want to send binary data; use "application/octet-stream" instead.
If the content-type is missing then the HTTP spec should say what to do, but frankly that's just a Bad Idea. Use a real MIME type if you want any hope of interoperability.
There are 2 perfectly legal ways to encode a JSON event with base64 data in HTTP:
I would do 1., as it is less likely to cause interop headaches. I think 7-bit modems are behind us. AMQP could do the same in principle (it has a content-encoding message property) but I definitely wouldn't do that - AMQP apps are likely to ignore it and assume a normal byte stream in the body. I want to start a CE implementation called CIJ: "CloudEvents Isn't JSON" |
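The Content-Type rule described in the comment above can be sketched roughly as follows. This is a minimal illustration, not code from any CloudEvents SDK: the function name `classify_http_message` and the returned tuples are assumptions made up for this example, and it covers only the prefix rule as stated in the comment.

```python
# Hypothetical helper: classify an HTTP message by its Content-Type header,
# following the prefix rule from the comment above.
STRUCTURED_PREFIX = "application/cloudevents+"


def classify_http_message(content_type):
    """Return (mode, detail) for a Content-Type header value.

    ("structured", X)    -> hand the whole body to the X event decoder
    ("binary", mime)     -> the body is the event data, a byte sequence of type mime
    ("unknown", None)    -> header missing; the comment advises against sending this
    """
    if not content_type:
        return ("unknown", None)
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith(STRUCTURED_PREFIX):
        return ("structured", media_type[len(STRUCTURED_PREFIX):])
    return ("binary", media_type)
```

Note that the decision uses only the header; the body is never inspected, which matches the point that binary mode has nothing to do with base64.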
Let's assume an intermediary needs to create an event in JSON format from this, e.g. to forward the event via MQTT 3.1. How does it determine from any of the MIME types whether the body goes into data or needs to be base64 encoded and go into data_base64? |
If you're going to JSON then it is always legal and safe to base64 encode the data; the receiver will base64 decode. For a more friendly encoding you can examine the MIME type: if it is one that is known to be "text safe", e.g. "text/...", "application/json", "application/xml", ..., then you can put it unencoded in the data attribute. If it's anything you aren't sure of, you have to base64 encode to be safe. I couldn't find any "official" list of textish MIME types, here's what I used in a previous project:
|
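As an illustration of the whitelist-plus-fallback approach described above, here is a minimal sketch. The set of "text safe" types below is a hypothetical example, not the list from the comment (which did not survive in this copy), and the function names are invented. It also assumes UTF-8 for text bodies and that JSON-typed bodies are embedded as JSON values rather than strings.

```python
import base64
import json

# Hypothetical whitelist; real implementations may choose differently.
TEXT_SAFE_TYPES = {"application/xml"}
TEXT_SAFE_SUFFIXES = ("+xml",)
JSON_TYPES = {"application/json"}
JSON_SUFFIXES = ("+json",)


def _base64_member(body):
    return {"data_base64": base64.b64encode(body).decode("ascii")}


def to_json_data_member(body, content_type):
    """Choose between `data` and `data_base64` for a JSON-format event."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    try:
        if media_type in JSON_TYPES or media_type.endswith(JSON_SUFFIXES):
            # JSON content: embed the parsed value.
            return {"data": json.loads(body)}
        if (media_type.startswith("text/")
                or media_type in TEXT_SAFE_TYPES
                or media_type.endswith(TEXT_SAFE_SUFFIXES)):
            # Text content: embed as a string (UTF-8 assumed).
            return {"data": body.decode("utf-8")}
    except ValueError:
        # Body is not valid JSON or UTF-8 despite the declared type.
        pass
    # Unknown, missing, or misleading type: base64 is always safe.
    return _base64_member(body)
```

Because the whitelist is a local choice, two implementations can legitimately disagree on borderline types; the base64 fallback keeps either result decodable by the receiver.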
Yes, I was expecting some mixture of whitelist and heuristic. I just wanted to be sure that this understanding is correct. It basically means that there might always be edge cases where SDKs will behave differently. |
On Thu, Oct 3, 2019 at 9:34 AM Klaus Deißner ***@***.***> wrote:
Yes, I was expecting some mixture of whitelist and heuristic. I just
wanted to be sure that this understanding is correct. It basically means
that there might always be edge cases where SDKs will behave differently.
It's the best I could come up with, it's a while back but I don't think
anything has changed that would invalidate that approach.
|
Took the bull by the horns and wrote a proposed clarification to the spec. I'm pretty happy with this interpretation (no more mention of CE types!); hopefully others are too. |
Clarify the meaning of missing datacontentype, when it is allowed, and what that means when translating events between formats/protocols. Also replaced "transport" with "protocol" everywhere, I believe that is now the preferred term. "Transport" is not defined in the terminology. Fixes cloudevents#520
Clarify the meaning of missing datacontentype, when it is allowed, and what that means when translating events between formats/protocols. Also replaced "transport" with "protocol" everywhere, I believe that is now the preferred term. "Transport" is not defined in the terminology. Fixes cloudevents#520 Signed-off-by: Alan Conway <[email protected]>
MQTT 3.1 doesn't have support for content type negotiation, so for that protocol the rule is simply that the consumer needs to understand what's coming by some out of band convention. Sometimes that's done via the topic name. Generally, all entity bodies and payloads are opaque and to be forwarded as-is until they hit a consumer that can interpret the media type. ASCII is only text once you start looking at a byte sequence that way. Whether you ought to look at the byte sequence that way is indicated by the content type. |
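One way to picture the topic-name convention mentioned above is a lookup from topic prefix to media type. This is only a sketch under assumed names: the prefixes, the table `TOPIC_CONTENT_TYPES`, and the function `content_type_for_topic` are hypothetical and would have to be agreed between producer and consumer out of band.

```python
# Hypothetical out-of-band convention: topic prefix -> media type of the payload.
TOPIC_CONTENT_TYPES = {
    "events/structured/": "application/cloudevents+json",
    "sensors/raw/": "application/octet-stream",
}


def content_type_for_topic(topic):
    """Return the agreed media type for a topic, or None if no convention applies."""
    for prefix, media_type in TOPIC_CONTENT_TYPES.items():
        if topic.startswith(prefix):
            return media_type
    # No convention known: treat the payload as opaque and forward it as-is.
    return None
```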
@clemensv We actually have this case that events are produced in MQTT and later consumed in AMQP. There we use exactly that out-of-band convention you mentioned. |
+1 |
Several event formats now contain a section about the handling of data, which specifies that implementations need to determine if data is binary. In #371 @alanconway pointed out that MIME types do not provide this information.
If an event bridge receives events in HTTP binary mode and is supposed to convert them to JSON format, how does it determine, in a reproducible way, whether it needs to base64 encode the data?
So what would implementations do?