-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP Transport Binding for batching JSON #370
Conversation
travis isn't happy :-( I wonder if it's the case of the "B" |
6a6196a
to
2cb54a1
Compare
Yep, that was it. Thanks, fixed! :) |
Signed-off-by: Christoph Neijenhuis <[email protected]>
2cb54a1
to
2d460c2
Compare
Signed-off-by: Christoph Neijenhuis <[email protected]>
2d460c2
to
d7216e2
Compare
Signed-off-by: Christoph Neijenhuis <[email protected]>
3012413
to
a1717e5
Compare
LGTM Thoughts? |
@deissnerk @cneijenhuis batching is about structuring the message payload (one or multiple). The WebHook spec is about how you deliver the payload, independent of how it's structured.
|
http-transport-binding.md
Outdated
|
||
In the *batched* content mode several events are batched into a single HTTP | ||
request or response body. The chosen [event format](#14-event-formats) MUST | ||
define how a batch is represented. Currently, the only format supporting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to change "currently, the only" if/when that changes. It makes sense to make an absolute statement like "The JSON event format that MUST be supported by any compliant implementation supports batching."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't implement your suggestion exactly, because it seems to imply that batching MUST be supported by any compliant implementation. In the current proposal, it is however optional, and I think it should stay optional, as I wrote in the PR description:
I also did not include the batching mode as one of the modes the receiver SHOULD support. Take a Function as a Service: Usually, each event should be processed in on an own instance of a function. This is trivial to implement when a HTTP Request always contains a single event. A batch doesn't map nicely onto the core FaaS abstractions.
Is what I added in 1191dd2 fine?
http-transport-binding.md
Outdated
The batch of events is then rendered in accordance with the event format | ||
specification and the resulting data becomes the HTTP message body. | ||
|
||
The batch MAY be empty (typically used in a HTTP response). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strike "(typically used in a HTTP response)". There might be reasons to push an empty (zero elements) batch.
json-format.md
Outdated
This section defines how a batch of CloudEvents is mapped to JSON. | ||
|
||
The outermost JSON element is a [JSON Array][JSON-array], which contains | ||
CloudEvents rendered in accordance with the [JSON event format][JSON-format] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... as elements ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach with the use of a JSON array, however I'd like to see an alternate JSON event format that supports a 2D array.
Time series events are most efficiently contained within a grid (tabular) structure which can be represented by a two-dimensional "array" as an "object". A CloudEvents "message-level" Batch object can comprise this 2D array.
The column values within the 2D array can correspond to CloudEvent "event-level" attributes (e.g., id, time, type, data), which can include attributes from one or more extensions (e.g., sequence, object, attribute).
Another "message-level" Schema attribute (type: array as object) can map the column indices to the names of "event-level" attributes:
0, "time"
1, "type"
2, "object"
3, "attribute"
4, "data"
The Schema array could also include constraint columns (e.g. Required) allowing flexibility among producer/consumer ecosystems.
This design has 2 benefits:
-
It compresses a multi-event payload by removing repetitive "names" in name (attribute)-value pairs.
-
It streamlines a consumer's parsing a multi-event payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are proposing a different data layout model. That's not a batch, it's a wholly different encoding. I discuss this here as "Record Sequence with Metadata Preamble"
https://vasters.com/blog/data-encodings-and-layout/
If you care about compact time series data transfers, we should make an Apache Avro encoding, because Avro is really good at that.
Signed-off-by: Christoph Neijenhuis <[email protected]>
@clemensv I understand that distinction between transport and web hook. That's why I was suggesting to define the header fields separate from the core web hook spec. Technically I think it makes sense, that the delivery target can specify the (maximum) number of events each message is supposed to contain in addition to the request rate, |
I do agree that there is an interesting interaction between request rate and batch size. I even wonder if specifying both statically is useful. E.g. if the request rate is 50/minute, and the batch size is 50:
Maybe it is better to interpret the request rate as event rate. E.g. for a request rate of 10, you can send me 10 events in 10 requests, or you can send me 10 events in a single batch.
My take is that there is an initial registration step (which is not standardized), and - if the sender offers a choice - the delivery target (administrator) should choose the format and mode. |
http-transport-binding.md
Outdated
In the *batched* content mode several events are batched into a single HTTP | ||
request or response body using an [event format](#14-event-formats) that | ||
supports batching. | ||
|
||
### 1.4. Event Formats | ||
|
||
Event formats, used with the *structured* content mode, define how an event is | ||
expressed in a particular data format. All implementations of this | ||
specification MUST support the [JSON event format][JSON-format], but MAY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to make this clearer w.r.t. which json format we mean? single vs batched - or do they have to support both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cneijenhuis what do you think about making this:
... MUST support the non-batching [JSON event format][JSON-format], but MAY...
so just add non-batching
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it! d3f2edb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
http-transport-binding.md
Outdated
|
||
The batch MAY be empty. | ||
All batched CloudEvents MUST have the same `specversion` attribute. Other | ||
attributes MAY differ, including the `contenttype` attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datacontenttype
Although the *JSON Batch Format* builds ontop of the *JSON Format*, it is | ||
considered as a separate format: a valid implementation of the *JSON Format* | ||
doesn't need to support it. The *JSON Batch Format* MUST NOT be used when only | ||
support for the *JSON Format* is indicated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this last sentence is appropriate since we don't get into how things like this are "indicated".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with dropping it - the sentence tries to re-inforce/clarify the difference between the two formats, but the point has already been made in the sentence before.
json-format.md
Outdated
"comexampleextension2" : { | ||
"otherValue": 5 | ||
}, | ||
"contenttype" : "application/vnd.apache.thrift.binary", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datacontenttype
json-format.md
Outdated
"comexampleextension2" : { | ||
"otherValue": 5 | ||
}, | ||
"contenttype" : "application/json", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dct
I have an implementation of this here now: https://github.com/clemensv/cloudevents-sdk-csharp/tree/batching The only little pitfall I've found is that a |
Signed-off-by: Christoph Neijenhuis <[email protected]>
Signed-off-by: Christoph Neijenhuis <[email protected]>
This PR was approved on last week's call with the agreement to make the minor editorial changes that @cneijenhuis has now made. So, LGTM from me, however, I'd like more one LGTM just to make sure one more set of eyes has verified that the latest round of edits didn't introduce some minor bug by mistake. |
@dmigliori we discussed your comment ( #370 (comment) ) on last week's call. The general consensus was that your proposal is independent of the "batching" feature. Would you be willing to open an issue or even better, a PR, so the group could discuss your proposal as a stand-alone feature? |
My original intention was that with a maximum batch size you could limit memory usage on the receiver side. I agree that the overall event rate might be interesting, too.
Fair enough. |
Ah, I see. Yes, both limits make sense in their own right. |
@cneijenhuis @deissnerk anything we need to modify here before we merge? |
@duglin No, I don't think so. This would be a separate issue or PR. I don't really know, where to put it. We wanted to keep the web hook spec CloudEvent agnostic. |
@deissnerk ok thanks. Anyone wanna give it one more look-over and an LGTM? |
@duglin LGTM |
@deissnerk thanks! merging |
In the discussion from last week about #360 we agreed that a batching option for the HTTP Transport Binding is useful. This is an attempt to introduce it.
During the call, @clemensv suggested to build the batching into the structured mode. I couldn't figure out how to do this generically for any event format. Instead, I did it for the JSON format only.
I also did not include the batching mode as one of the modes the receiver SHOULD support. Take a Function as a Service: Usually, each event should be processed in on an own instance of a function. This is trivial to implement when a HTTP Request always contains a single event. A batch doesn't map nicely onto the core FaaS abstractions.
The HTTP Content-Type is a bit tricky - we want that the receiver can distinguish between the events based on the header, so I've created
application/cloudevents-batch+json
. However, I'm not sure if this clutters the namespace too much.Signed-off-by: Christoph Neijenhuis [email protected]