How to determine if data is binary? #520

Closed
deissnerk opened this issue Oct 2, 2019 · 9 comments · Fixed by #521

@deissnerk
Contributor

Several event formats now contain a section about the handling of data, which specifies that implementations need to determine if data is binary. In #371 @alanconway pointed out that MIME types do not provide this information.
If an event bridge receives events in HTTP binary mode and is supposed to convert them to JSON format, how does it determine, in a reproducible way, whether it needs to base64 encode the data?
So what would implementations do?

@alanconway
Contributor

The latest CE spec has removed ContentEncoding from the Event (which finally I agree with) and base64 encoding is only of concern for JSON-encoded events. There's an extra attribute in the latest JSON format spec to deal with this.

For other bindings we must disambiguate the two uses of the word "binary" in the CE spec. A binary mode message is a style of event message used by some bindings - it has nothing whatsoever to do with base64 encoding.

The phrase "binary event data" doesn't mean anything for an abstract Event. The abstract Data attribute is either a CE type or a byte sequence described by a MIME type. How CE types and byte sequences are represented concretely is determined by the format/binding specs. Again, JSON is the only one that uses base64, and the JSON format spec deals with that.

So for an HTTP message: if ContentType = "application/cloudevents+X" then you have a structured message. You are done with the HTTP decoding and should pass the HTTP body bytes directly to an X event decoder. X is usually json; the JSON format spec says how to deal with base64.

If ContentType = X, where X doesn't start with "application/cloudevents+", then you should handle the HTTP body directly as a byte sequence of MIME type X. The definition of the MIME type is all you need. For example, if ContentType = "application/json", it's a JSON value. JSON doesn't distinguish string and binary types, so it is a bad choice if you want to send binary data; use "application/octet-stream" instead.

If the content-type is missing then the HTTP spec should say what to do but frankly that's just a Bad Idea. Use a real MIME type if you want any hope of interoperability.
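A minimal Go sketch of the Content-Type rule described above. The names `Mode`, `classify` and `structuredPrefix` are made up for illustration and are not taken from any SDK; a missing or unparsable header is simply reported as unknown rather than guessed at.

```go
package main

import (
	"mime"
	"strings"
)

// Mode describes how a CloudEvent is carried in an HTTP message.
type Mode int

const (
	ModeUnknown    Mode = iota // Content-Type missing or unparsable
	ModeStructured             // body is a whole event in some event format
	ModeBinary                 // body is the event data as a byte sequence
)

// structuredPrefix marks a structured-mode message, as described in the thread.
const structuredPrefix = "application/cloudevents+"

// classify inspects a Content-Type header value. For structured mode it
// returns the event format name (for example "json"); for binary mode it
// returns the media type that describes the body bytes.
func classify(contentType string) (Mode, string) {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return ModeUnknown, ""
	}
	if strings.HasPrefix(mediaType, structuredPrefix) {
		return ModeStructured, strings.TrimPrefix(mediaType, structuredPrefix)
	}
	return ModeBinary, mediaType
}
```

With this sketch, `classify("application/cloudevents+json; charset=utf-8")` reports structured mode with format `json`, while `classify("application/octet-stream")` reports binary mode and hands back the media type unchanged.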

There are 2 perfectly legal ways to encode a JSON event with base64 data in HTTP:

  1. ContentType="application/octet-stream"; ContentEncoding="base64"; Body:
  2. ContentType="application/octet-stream"; Body:

I would do 1. as less likely to cause interop headaches. I think 7-bit modems are behind us.

AMQP could do the same in principle (it has a content-encoding message property) but I definitely wouldn't do that - AMQP apps are likely to ignore it and assume normal byte stream in the body.

I want to start a CE implementation called CIJ: "CloudEvents Isn't JSON"
JSON is important but we need to separate its foibles and weaknesses from CE if we want CE to take off.

@deissnerk
Contributor Author

If ContentType = X, where X doesn't start with "application/cloudevents+", then you should handle the HTTP body directly as a byte sequence of MIME type X. The definition of the MIME type is all you need.

Let's assume an intermediary needs to create an event in JSON format from this, e.g. to forward the event via MQTT 3.1. How does it determine from any of the MIME types whether the body goes into data or needs to be base64 encoded and go into data_base64?

@alanconway
Contributor

alanconway commented Oct 3, 2019

If you're going to JSON then it is always legal and safe to base64 encode the data; the receiver will base64 decode it.

For a more friendly encoding you can examine the MIME type: if it is one that is known to be "text safe", e.g. "text/...", "application/json", "application/xml", etc., then you can put it unencoded in the data attribute. If it's anything you aren't sure of, you have to base64 encode to be safe. I couldn't find any "official" list of textish MIME types; here's what I used in a previous project:

var textish = regexp.MustCompile("(^$)|(^text$)|(^text/)|(^application.*[/+.-](xml|json|yaml|javascript|html|text|wbxml)([/+.-]|$))")
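A sketch of how an intermediary might combine the two rules above when building a JSON-format event: base64 is always the safe fallback, and the textish pattern only enables the friendlier form. The helper names `dataMember` and `isJSON` are hypothetical, and the extra UTF-8 and JSON validity checks are an assumption added here, not something stated in the thread. The pattern is the one quoted above with the hyphen moved to the end of the character class so it is read literally.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"mime"
	"regexp"
	"strings"
	"unicode/utf8"
)

// textish is the heuristic from the comment above.
var textish = regexp.MustCompile(`(^$)|(^text$)|(^text/)|(^application.*[/+.-](xml|json|yaml|javascript|html|text|wbxml)([/+.-]|$))`)

// isJSON reports whether a media type declares JSON content.
func isJSON(mediaType string) bool {
	return mediaType == "application/json" || strings.HasSuffix(mediaType, "+json")
}

// dataMember picks the JSON member name ("data" or "data_base64") and the
// value to store under it for a body received with the given content type.
func dataMember(contentType string, body []byte) (string, any) {
	encoded := base64.StdEncoding.EncodeToString(body)
	mediaType, _, err := mime.ParseMediaType(contentType)
	if contentType != "" && err != nil {
		// Unparsable type: base64 is always legal, so use it.
		return "data_base64", encoded
	}
	switch {
	case isJSON(mediaType) && json.Valid(body):
		// JSON content is embedded as a JSON value, not as a string.
		return "data", json.RawMessage(body)
	case textish.MatchString(mediaType) && utf8.Valid(body):
		// Text-safe type with valid UTF-8: embed as a JSON string.
		return "data", string(body)
	default:
		return "data_base64", encoded
	}
}
```

Because the fallback is always base64, two intermediaries with different text lists still produce events that a receiver decodes to the same bytes; they only differ in which member carries them.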

@deissnerk
Contributor Author

Yes, I was expecting some mixture of whitelist and heuristic. I just wanted to be sure that this understanding is correct. It basically means that there might always be edge cases where SDKs will behave differently.

@alanconway
Contributor

alanconway commented Oct 3, 2019 via email

@alanconway
Contributor

Took the bull by the horns and wrote a proposed clarification to the spec. I'm pretty happy with this interpretation (no more mention of CE types!); hopefully others are too.

alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Oct 7, 2019
Clarify the meaning of missing datacontentype, when it is allowed, and
what that means when translating events between formats/protocols.

Also replaced "transport" with "protocol" everywhere, I believe that
is now the preferred term. "Transport" is not defined in the terminology.

Fixes cloudevents#520
@duglin duglin added the v1.0 label Oct 7, 2019
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Oct 8, 2019
@clemensv
Contributor

clemensv commented Oct 8, 2019

MQTT 3.1 doesn't have support for content type negotiation, so for that protocol the rule is simply that the consumer needs to understand what's coming by some out of band convention. Sometimes that's done via the topic name.

Generally, all entity bodies and payloads are opaque and to be forwarded as-is until they hit a consumer that can interpret the media type.

ASCII is only text once you start looking at a byte sequence that way. Whether you ought to look at the byte sequence that way is indicated by the content type.
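As an illustration of an out-of-band convention based on topic names, here is a small Go sketch. The topic prefixes and the names `topicTypes` and `mediaTypeForTopic` are entirely hypothetical; a real deployment would use whatever producers and consumers have agreed on.

```go
package main

import "strings"

// topicTypes is a hypothetical agreement between producers and consumers:
// everything published under a prefix carries the listed media type.
var topicTypes = []struct{ prefix, mediaType string }{
	{"events/json/", "application/cloudevents+json"},
	{"telemetry/raw/", "application/octet-stream"},
}

// mediaTypeForTopic returns the agreed media type for a topic, or false if
// no convention covers it, in which case the payload stays opaque.
func mediaTypeForTopic(topic string) (string, bool) {
	for _, t := range topicTypes {
		if strings.HasPrefix(topic, t.prefix) {
			return t.mediaType, true
		}
	}
	return "", false
}
```

A consumer that gets `false` back has no basis for interpreting the bytes and should forward them as-is, which matches the point about opaque payloads.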

@deissnerk
Contributor Author

@clemensv We actually have this case that events are produced in MQTT and later consumed in AMQP. There we use exactly that out-of-band convention you mentioned.

@alanconway
Contributor

+1
MQTT 3.1 cloudevents mapping is always JSON structured, so for CE over MQTT 3.1 the out of band convention is "expect a JSON-format cloudevent". CE over later MQTT versions look more like AMQP with binary/structured mode choices etc.
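On the receiving side, a consumer that expects a JSON-format event only has to look at which of the two data members is present. The following Go sketch assumes the member names data and data_base64 mentioned earlier in this thread; the struct `jsonEvent` and the function `payload` are illustrative names, and only a few attributes are modeled.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
)

// jsonEvent models only the members needed for this sketch.
type jsonEvent struct {
	ID              string          `json:"id"`
	Source          string          `json:"source"`
	Type            string          `json:"type"`
	SpecVersion     string          `json:"specversion"`
	DataContentType string          `json:"datacontenttype,omitempty"`
	Data            json.RawMessage `json:"data,omitempty"`
	DataBase64      *string         `json:"data_base64,omitempty"`
}

// payload decodes a JSON-format event and returns its declared content type
// and data bytes. A data_base64 member is base64 decoded; a data member is
// returned as the raw JSON value (a JSON string keeps its quotes).
func payload(raw []byte) (string, []byte, error) {
	var ev jsonEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", nil, err
	}
	if ev.Data != nil && ev.DataBase64 != nil {
		return "", nil, errors.New("event has both data and data_base64")
	}
	if ev.DataBase64 != nil {
		body, err := base64.StdEncoding.DecodeString(*ev.DataBase64)
		return ev.DataContentType, body, err
	}
	return ev.DataContentType, ev.Data, nil
}
```

The caller never needs a text/binary heuristic here: the sender already made that choice, and the member name records it.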

alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Oct 9, 2019
alanconway added a commit to alanconway/cloudevents-spec that referenced this issue Oct 10, 2019