Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP Transport Binding for batching JSON #370

Merged
merged 6 commits into from
Feb 5, 2019

Conversation

cneijenhuis
Copy link
Contributor

In the discussion from last week about #360 we agreed that a batching option for the HTTP Transport Binding is useful. This is an attempt to introduce it.

During the call, @clemensv suggested to build the batching into the structured mode. I couldn't figure out how to do this generically for any event format. Instead, I did it for the JSON format only.

I also did not include the batching mode as one of the modes the receiver SHOULD support. Take a Function as a Service: Usually, each event should be processed in on an own instance of a function. This is trivial to implement when a HTTP Request always contains a single event. A batch doesn't map nicely onto the core FaaS abstractions.

The HTTP Content-Type is a bit tricky - we want that the receiver can distinguish between the events based on the header, so I've created application/cloudevents-batch+json. However, I'm not sure if this clutters the namespace too much.

Signed-off-by: Christoph Neijenhuis [email protected]

@duglin
Copy link
Collaborator

duglin commented Jan 15, 2019

travis isn't happy :-( I wonder if it's the case of the "B"

@cneijenhuis
Copy link
Contributor Author

Yep, that was it. Thanks, fixed! :)

Signed-off-by: Christoph Neijenhuis <[email protected]>
@deissnerk
Copy link
Contributor

deissnerk commented Jan 24, 2019

LGTM
How about adding support for batching to the web hook spec? In addition to Webhook-Request-Rate one could think of Webhook-Batch-Size or Webhook-CE-Batch-Size. I know that @clemensv wanted to keep the web hook spec independent from CloudEvents. So we might add this somehow on top of the basic web hook spec.

Thoughts?

@clemensv
Copy link
Contributor

@deissnerk @cneijenhuis batching is about structuring the message payload (one or multiple). The WebHook spec is about how you deliver the payload, independent of how it's structured.

  • Event format: How to encode one event or a batch of events
  • Transport: How to bind the encoded message/batch to a transport message
  • WebHook (etc): Application contract to deliver the transport message to an endpoint and handle related errors.


In the *batched* content mode several events are batched into a single HTTP
request or response body. The chosen [event format](#14-event-formats) MUST
define how a batch is represented. Currently, the only format supporting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to change "currently, the only" if/when that changes. It makes sense to make an absolute statement like "The JSON event format that MUST be supported by any compliant implementation supports batching."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't implement your suggestion exactly, because it seems to imply that batching MUST be supported by any compliant implementation. In the current proposal, it is however optional, and I think it should stay optional, as I wrote in the PR description:

I also did not include the batching mode as one of the modes the receiver SHOULD support. Take a Function as a Service: Usually, each event should be processed in on an own instance of a function. This is trivial to implement when a HTTP Request always contains a single event. A batch doesn't map nicely onto the core FaaS abstractions.

Is what I added in 1191dd2 fine?

The batch of events is then rendered in accordance with the event format
specification and the resulting data becomes the HTTP message body.

The batch MAY be empty (typically used in a HTTP response).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strike "(typically used in a HTTP response)". There might be reasons to push an empty (zero elements) batch.

json-format.md Outdated
This section defines how a batch of CloudEvents is mapped to JSON.

The outermost JSON element is a [JSON Array][JSON-array], which contains
CloudEvents rendered in accordance with the [JSON event format][JSON-format]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... as elements ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach with the use of a JSON array, however I'd like to see an alternate JSON event format that supports a 2D array.

Time series events are most efficiently contained within a grid (tabular) structure which can be represented by a two-dimensional "array" as an "object". A CloudEvents "message-level" Batch object can comprise this 2D array.

The column values within the 2D array can correspond to CloudEvent "event-level" attributes (e.g., id, time, type, data), which can include attributes from one or more extensions (e.g., sequence, object, attribute).

Another "message-level" Schema attribute (type: array as object) can map the column indices to the names of "event-level" attributes:

0, "time"
1, "type"
2, "object"
3, "attribute"
4, "data"

The Schema array could also include constraint columns (e.g. Required) allowing flexibility among producer/consumer ecosystems.

This design has 2 benefits:

  1. It compresses a multi-event payload by removing repetitive "names" in name (attribute)-value pairs.

  2. It streamlines a consumer's parsing a multi-event payload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are proposing a different data layout model. That's not a batch, it's a wholly different encoding. I discuss this here as "Record Sequence with Metadata Preamble"

https://vasters.com/blog/data-encodings-and-layout/

If you care about compact time series data transfers, we should make an Apache Avro encoding, because Avro is really good at that.

Signed-off-by: Christoph Neijenhuis <[email protected]>
@deissnerk
Copy link
Contributor

Event format: How to encode one event or a batch of events
Transport: How to bind the encoded message/batch to a transport message
WebHook (etc): Application contract to deliver the transport message to an endpoint and handle related errors.

@clemensv I understand that distinction between transport and web hook. That's why I was suggesting to define the header fields separate from the core web hook spec. Technically I think it makes sense, that the delivery target can specify the (maximum) number of events each message is supposed to contain in addition to the request rate,
One more thing that might be interesting, is to have additional fields like "Accept: application/cloudevents-batch+json" in the validation response to signal the support of batching.

@cneijenhuis
Copy link
Contributor Author

I do agree that there is an interesting interaction between request rate and batch size. I even wonder if specifying both statically is useful. E.g. if the request rate is 50/minute, and the batch size is 50:

  • I send 1 msg per sec. After 50 sec, I've send 50 msgs and have to stop.
  • I send 50 msgs every two sec. After 50 sec, I've send 1250 msgs, and can send 1250 more messages.

Maybe it is better to interpret the request rate as event rate. E.g. for a request rate of 10, you can send me 10 events in 10 requests, or you can send me 10 events in a single batch.

One more thing that might be interesting, is to have additional fields like "Accept: application/cloudevents-batch+json" in the validation response to signal the support of batching.

My take is that there is an initial registration step (which is not standardized), and - if the sender offers a choice - the delivery target (administrator) should choose the format and mode.

In the *batched* content mode several events are batched into a single HTTP
request or response body using an [event format](#14-event-formats) that
supports batching.

### 1.4. Event Formats

Event formats, used with the *structured* content mode, define how an event is
expressed in a particular data format. All implementations of this
specification MUST support the [JSON event format][JSON-format], but MAY
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to make this clearer w.r.t. which json format we mean? single vs batched - or do they have to support both?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cneijenhuis what do you think about making this:

... MUST support the non-batching [JSON event format][JSON-format], but MAY...

so just add non-batching ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it! d3f2edb

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks


The batch MAY be empty.
All batched CloudEvents MUST have the same `specversion` attribute. Other
attributes MAY differ, including the `contenttype` attribute.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datacontenttype

Although the *JSON Batch Format* builds ontop of the *JSON Format*, it is
considered as a separate format: a valid implementation of the *JSON Format*
doesn't need to support it. The *JSON Batch Format* MUST NOT be used when only
support for the *JSON Format* is indicated.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this last sentence is appropriate since we don't get into how things like this are "indicated".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with dropping it - the sentence tries to re-inforce/clarify the difference between the two formats, but the point has already been made in the sentence before.

json-format.md Outdated
"comexampleextension2" : {
"otherValue": 5
},
"contenttype" : "application/vnd.apache.thrift.binary",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datacontenttype

json-format.md Outdated
"comexampleextension2" : {
"otherValue": 5
},
"contenttype" : "application/json",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dct

@clemensv
Copy link
Contributor

I have an implementation of this here now: https://github.com/clemensv/cloudevents-sdk-csharp/tree/batching

The only little pitfall I've found is that a contentType.startsWith("application/cloudevents") is now the false test for probing for a CloudEvent. You must include the plus sign before the format indicator since application/cloudevents-batch also matches that test.

@duglin
Copy link
Collaborator

duglin commented Feb 4, 2019

This PR was approved on last week's call with the agreement to make the minor editorial changes that @cneijenhuis has now made. So, LGTM from me, however, I'd like more one LGTM just to make sure one more set of eyes has verified that the latest round of edits didn't introduce some minor bug by mistake.

@duglin
Copy link
Collaborator

duglin commented Feb 4, 2019

@dmigliori we discussed your comment ( #370 (comment) ) on last week's call. The general consensus was that your proposal is independent of the "batching" feature. Would you be willing to open an issue or even better, a PR, so the group could discuss your proposal as a stand-alone feature?

@deissnerk
Copy link
Contributor

@cneijenhuis

I do agree that there is an interesting interaction between request rate and batch size. I even wonder if specifying both statically is useful. E.g. if the request rate is 50/minute, and the batch size is 50:

  • I send 1 msg per sec. After 50 sec, I've send 50 msgs and have to stop.
  • I send 50 msgs every two sec. After 50 sec, I've send 1250 msgs, and can send 1250 more messages.

My original intention was that with a maximum batch size you could limit memory usage on the receiver side. I agree that the overall event rate might be interesting, too.

One more thing that might be interesting, is to have additional fields like "Accept: application/cloudevents-batch+json" in the validation response to signal the support of batching.

My take is that there is an initial registration step (which is not standardized), and - if the sender offers a choice - the delivery target (administrator) should choose the format and mode.

Fair enough.

@cneijenhuis
Copy link
Contributor Author

My original intention was that with a maximum batch size you could limit memory usage on the receiver side.

Ah, I see. Yes, both limits make sense in their own right.

@duglin
Copy link
Collaborator

duglin commented Feb 4, 2019

@cneijenhuis @deissnerk anything we need to modify here before we merge?

@deissnerk
Copy link
Contributor

@duglin No, I don't think so. This would be a separate issue or PR. I don't really know, where to put it. We wanted to keep the web hook spec CloudEvent agnostic.

@duglin
Copy link
Collaborator

duglin commented Feb 5, 2019

@deissnerk ok thanks. Anyone wanna give it one more look-over and an LGTM?

@deissnerk
Copy link
Contributor

@duglin LGTM

@duglin
Copy link
Collaborator

duglin commented Feb 5, 2019

@deissnerk thanks! merging

@duglin duglin merged commit 70b6522 into cloudevents:master Feb 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants