HTTP Transport Binding for batching JSON #370

Merged on Feb 5, 2019 (6 commits)
Changes from 3 commits
File: http-transport-binding.md (133 changes: 122 additions & 11 deletions)
@@ -23,6 +23,7 @@ This document is a working draft.
3. [HTTP Message Mapping](#3-http-message-mapping)
- 3.1. [Binary Content Mode](#31-binary-content-mode)
- 3.2. [Structured Content Mode](#32-structured-content-mode)
- 3.3. [Batched Content Mode](#33-batched-content-mode)
4. [References](#4-references)

## 1. Introduction
@@ -58,26 +59,33 @@ which is compatible with HTTP 1.1 semantics.

### 1.3. Content Modes

This specification defines two content modes for transferring events:
*structured* and *binary*. Every compliant implementation SHOULD support both
modes.

In the *structured* content mode, event metadata attributes and event data are
placed into the HTTP request or response body using an [event
format](#14-event-formats).
This specification defines three content modes for transferring events:
*binary*, *structured* and *batched*. Every compliant implementation SHOULD
support the *structured* and *binary* modes.

In the *binary* content mode, the value of the event `data` attribute is placed
into the HTTP request or response body as-is, with the `contenttype` attribute
value declaring its media type; all other event attributes are mapped to HTTP
headers.

In the *structured* content mode, event metadata attributes and event data are
placed into the HTTP request or response body using an [event
format](#14-event-formats).

In the *batched* content mode, several events are batched into a single HTTP
request or response body using an [event format](#14-event-formats) that
supports batching.

### 1.4. Event Formats

Event formats, used with the *structured* content mode, define how an event is
expressed in a particular data format. All implementations of this
specification MUST support the [JSON event format][JSON-format], but MAY
support any additional, including proprietary, formats.

Collaborator: Do we need to make this clearer w.r.t. which JSON format we mean, single vs. batched? Or do they have to support both?

Collaborator: @cneijenhuis, what do you think about making this:

... MUST support the non-batching [JSON event format][JSON-format], but MAY...

So just add "non-batching"?

Contributor Author: I like it! d3f2edb

Collaborator: Thanks.

Event formats MAY additionally define how a batch of events is expressed. Such
batch representations can be used with the *batched* content mode.

### 1.5. Security

This specification does not introduce any new security features for HTTP, or
@@ -124,13 +132,15 @@ The event binding is identical for both HTTP request and response messages.
The content mode is chosen by the sender of the event, which is either the
requesting or the responding party. Gestures that might allow solicitation of
events using a particular mode might be defined by an application, but are not
defined here.
defined here. The *batched* mode MUST NOT be used unless solicited, and the
gesture SHOULD allow the receiver to choose the maximum size of a batch.
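A minimal Python sketch of how a sender might honour a receiver-chosen maximum batch size by splitting pending events into consecutive batches. The function `split_into_batches` and its `max_batch_size` parameter are hypothetical: this specification leaves the solicitation gesture, and therefore how the limit is communicated, to the application. Modelling events as plain dicts is likewise an assumption for illustration.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical model: one CloudEvent as a dict of attribute name -> value.
Event = Dict[str, Any]


def split_into_batches(events: List[Event], max_batch_size: int) -> Iterator[List[Event]]:
    """Yield consecutive batches containing at most max_batch_size events.

    max_batch_size stands in for a limit the receiver chose through some
    application-defined gesture (hypothetical; not defined by this spec).
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(events), max_batch_size):
        yield events[start:start + max_batch_size]


if __name__ == "__main__":
    pending = [{"specversion": "0.2", "id": str(i)} for i in range(5)]
    print([len(batch) for batch in split_into_batches(pending, 2)])  # [2, 2, 1]
```

With an empty list of pending events this yields no batches at all; whether a sender should push an explicitly empty batch instead is not decided by this sketch.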

The receiver of the event can distinguish between the two modes by inspecting
The receiver of the event can distinguish between the three modes by inspecting
the `Content-Type` header value. If the value is prefixed with the CloudEvents
media type `application/cloudevents`, indicating the use of a known [event
format](#14-event-formats), the receiver uses *structured* mode, otherwise it
defaults to *binary* mode.
format](#14-event-formats), the receiver uses *structured* mode. If the value
is prefixed with `application/cloudevents-batch`, the receiver uses the
*batched* mode. Otherwise it defaults to *binary* mode.
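The receiver-side rule above could be sketched in Python as follows; the function name `content_mode` is hypothetical. One design choice matters: the batch prefix is tested first, because `application/cloudevents-batch` itself begins with `application/cloudevents`, so testing the structured prefix first would classify every batch as *structured*. Lower-casing the value reflects that media type and subtype names are case-insensitive in HTTP.

```python
def content_mode(content_type: str) -> str:
    """Return "batched", "structured" or "binary" for a Content-Type value."""
    # Type and subtype are case-insensitive, so compare in lower case.
    value = content_type.strip().lower()
    # Check the longer batch prefix first: it also starts with the
    # structured-mode prefix "application/cloudevents".
    if value.startswith("application/cloudevents-batch"):
        return "batched"
    if value.startswith("application/cloudevents"):
        return "structured"
    # Anything else (e.g. "application/json") falls back to binary mode.
    return "binary"


if __name__ == "__main__":
    print(content_mode("application/cloudevents-batch+json; charset=UTF-8"))  # batched
    print(content_mode("application/cloudevents+json; charset=UTF-8"))  # structured
    print(content_mode("application/json"))  # binary
```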

If a receiver detects the CloudEvents media type, but with an event format that
it cannot handle, for instance `application/cloudevents+avro`, it MAY still
@@ -330,6 +340,105 @@ Content-Length: nnnn

```

### 3.3. Batched Content Mode

In the *batched* content mode, several events are batched into a single HTTP
request or response body. The chosen [event format](#14-event-formats) MUST
define how a batch is represented. Currently, the only format supporting
batching is the [JSON Batch Format][JSON-batch-format].

Contributor: You have to change "currently, the only" if/when that changes. It makes sense to make an absolute statement like "The JSON event format that MUST be supported by any compliant implementation supports batching."

Contributor Author: I didn't implement your suggestion exactly, because it seems to imply that batching MUST be supported by any compliant implementation. In the current proposal, however, it is optional, and I think it should stay optional, as I wrote in the PR description:

> I also did not include the batching mode as one of the modes the receiver SHOULD support. Take a Function as a Service: usually, each event should be processed on its own instance of a function. This is trivial to implement when an HTTP request always contains a single event. A batch doesn't map nicely onto the core FaaS abstractions.

Is what I added in 1191dd2 fine?

#### 3.3.1. HTTP Content-Type

The [HTTP `Content-Type`][Content-Type] header MUST be set to the media type of
an [event format](#14-event-formats).

Example for the [JSON Batch format][JSON-batch-format]:

``` text
Content-Type: application/cloudevents-batch+json; charset=UTF-8
```

#### 3.3.2. Event Data Encoding

The chosen [event format](#14-event-formats) defines how a batch of events and
all event attributes, including the `data` attribute, are represented.

The batch of events is then rendered in accordance with the event format
specification and the resulting data becomes the HTTP message body.
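As a sketch of this rendering step, assuming each event is already a Python dict laid out per the JSON event format, a sender could serialize the list as the message body and derive the headers from it. `render_batch` and `BATCH_MEDIA_TYPE` are hypothetical names, not part of any SDK.

```python
import json
from typing import Any, Dict, List, Tuple

BATCH_MEDIA_TYPE = "application/cloudevents-batch+json"


def render_batch(events: List[Dict[str, Any]]) -> Tuple[Dict[str, str], bytes]:
    """Serialize events into a JSON Batch body and matching HTTP headers."""
    # The JSON Batch Format renders the whole batch as a single JSON array.
    body = json.dumps(events).encode("utf-8")
    headers = {
        "Content-Type": BATCH_MEDIA_TYPE + "; charset=UTF-8",
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return headers, body


if __name__ == "__main__":
    headers, body = render_batch([])
    print(headers["Content-Type"], body)  # application/cloudevents-batch+json; charset=UTF-8 b'[]'
```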

The batch MAY be empty (typically used in a HTTP response).

Contributor: Strike "(typically used in a HTTP response)". There might be reasons to push an empty (zero-element) batch.

All batched CloudEvents MUST have the same `specversion` attribute. Other
attributes MAY differ, including the `contenttype` attribute.

Collaborator: datacontenttype


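The two rules above, that a batch may be empty and that all members share one `specversion`, can be checked before a batch is sent. The sketch below is hypothetical (`check_batch` is not an SDK function) and additionally treats a missing `specversion` as an error, since the core specification makes that attribute REQUIRED.

```python
from typing import Any, Dict, List


def check_batch(events: List[Dict[str, Any]]) -> None:
    """Raise ValueError if the batch breaks the rules of this section.

    An empty list passes. Attributes other than "specversion", such as
    "contenttype", are allowed to differ between events.
    """
    versions = {event.get("specversion") for event in events}
    if None in versions:
        raise ValueError("every batched event needs a specversion attribute")
    if len(versions) > 1:
        found = sorted(str(v) for v in versions)
        raise ValueError(f"mixed specversion values in one batch: {found}")


if __name__ == "__main__":
    check_batch([])  # an empty batch is allowed
    check_batch([
        {"specversion": "0.2", "contenttype": "application/json"},
        {"specversion": "0.2", "contenttype": "text/plain"},
    ])
    print("ok")
```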
#### 3.3.3. Examples

This example shows two batched CloudEvents, sent with a PUT request:

``` text

PUT /myresource HTTP/1.1
Host: webhook.example.com
Content-Type: application/cloudevents-batch+json; charset=utf-8
Content-Length: nnnn

[
    {
        "specversion" : "0.2",
        "type" : "com.example.someevent",

        ... further attributes omitted ...

        "data" : {
            ... application data ...
        }
    },
    {
        "specversion" : "0.2",
        "type" : "com.example.someotherevent",

        ... further attributes omitted ...

        "data" : {
            ... application data ...
        }
    }
]

```
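To make the request example concrete, here is a Python sketch that prepares (but does not send) the same kind of PUT request using the standard library's `urllib.request.Request`. The function name `build_batch_put` is hypothetical; the URL mirrors the host and path of the example above.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_batch_put(url: str, events: List[Dict[str, Any]]) -> urllib.request.Request:
    """Prepare a PUT request whose body is a JSON batch of CloudEvents."""
    body = json.dumps(events).encode("utf-8")
    # urllib adds Content-Length for a bytes body when the request is sent.
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/cloudevents-batch+json; charset=utf-8"},
    )


if __name__ == "__main__":
    request = build_batch_put(
        "https://webhook.example.com/myresource",
        [
            {"specversion": "0.2", "type": "com.example.someevent"},
            {"specversion": "0.2", "type": "com.example.someotherevent"},
        ],
    )
    print(request.get_method(), request.get_header("Content-type"))
```

Passing the prepared request to `urllib.request.urlopen` would actually send it.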

This example shows two batched CloudEvents returned in a response:

``` text

HTTP/1.1 200 OK
Content-Type: application/cloudevents-batch+json; charset=utf-8
Content-Length: nnnn

[
    {
        "specversion" : "0.2",
        "type" : "com.example.someevent",

        ... further attributes omitted ...

        "data" : {
            ... application data ...
        }
    },
    {
        "specversion" : "0.2",
        "type" : "com.example.someotherevent",

        ... further attributes omitted ...

        "data" : {
            ... application data ...
        }
    }
]

```

## 4. References

- [RFC2046][RFC2046] Multipurpose Internet Mail Extensions (MIME) Part Two:
@@ -351,8 +460,10 @@ Content-Length: nnnn

[CE]: ./spec.md
[JSON-format]: ./json-format.md
[JSON-batch-format]: ./json-format.md#4-json-batch-format
[Content-Type]: https://tools.ietf.org/html/rfc7231#section-3.1.1.5
[JSON-Value]: https://tools.ietf.org/html/rfc7159#section-3
[JSON-Array]: https://tools.ietf.org/html/rfc7159#section-5
[RFC2046]: https://tools.ietf.org/html/rfc2046
[RFC2119]: https://tools.ietf.org/html/rfc2119
[RFC2818]: https://tools.ietf.org/html/rfc2818
File: json-format.md (76 changes: 74 additions & 2 deletions)
@@ -14,7 +14,8 @@ This document is a working draft.
1. [Introduction](#1-introduction)
2. [Attributes](#2-attributes)
3. [Envelope](#3-envelope)
4. [References](#4-references)
4. [JSON Batch Format](#4-json-batch-format)
5. [References](#5-references)

## 1. Introduction

@@ -229,7 +230,77 @@ a `Map` or [JSON data](#31-special-handling-of-the-data-attribute) data:
}
```

## 4. References
## 4. JSON Batch Format

In the *JSON Batch Format*, several CloudEvents are batched into a single JSON
document. The document is a JSON array filled with CloudEvents in the
[JSON Event format][JSON-format].

Although the *JSON Batch Format* builds on top of the *JSON Format*, it is
considered a separate format: a valid implementation of the *JSON Format*
doesn't need to support it. The *JSON Batch Format* MUST NOT be used when only
support for the *JSON Format* is indicated.

Collaborator: I wonder if this last sentence is appropriate since we don't get into how things like this are "indicated".

Contributor Author: I'm okay with dropping it. The sentence tries to reinforce/clarify the difference between the two formats, but the point has already been made in the sentence before.


### 4.1. Mapping CloudEvents

This section defines how a batch of CloudEvents is mapped to JSON.

The outermost JSON element is a [JSON Array][JSON-array], which contains
CloudEvents rendered in accordance with the [JSON event format][JSON-format]
specification.

Contributor: ... as elements ...

Contributor: I like this approach with the use of a JSON array; however, I'd like to see an alternate JSON event format that supports a 2D array.

Time series events are most efficiently contained within a grid (tabular) structure, which can be represented by a two-dimensional "array" as an "object". A CloudEvents "message-level" Batch object can comprise this 2D array.

The column values within the 2D array can correspond to CloudEvent "event-level" attributes (e.g., id, time, type, data), which can include attributes from one or more extensions (e.g., sequence, object, attribute).

Another "message-level" Schema attribute (type: array as object) can map the column indices to the names of "event-level" attributes:

0, "time"
1, "type"
2, "object"
3, "attribute"
4, "data"

The Schema array could also include constraint columns (e.g. Required), allowing flexibility among producer/consumer ecosystems.

This design has 2 benefits:

1. It compresses a multi-event payload by removing repetitive "names" in name (attribute)-value pairs.
2. It streamlines a consumer's parsing of a multi-event payload.

Contributor: You are proposing a different data layout model. That's not a batch, it's a wholly different encoding. I discuss this here as "Record Sequence with Metadata Preamble":

https://vasters.com/blog/data-encodings-and-layout/

If you care about compact time series data transfers, we should make an Apache Avro encoding, because Avro is really good at that.

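On the receiving side, the mapping rule above suggests a simple structural check: decode the body, require a top-level array, and require each element to be a JSON object. This Python sketch uses a hypothetical `parse_json_batch` helper; it does not validate individual attributes, which is left to whatever already handles the single-event JSON format.

```python
import json
from typing import Any, Dict, List


def parse_json_batch(body: bytes) -> List[Dict[str, Any]]:
    """Decode a JSON Batch Format document into a list of event objects."""
    document = json.loads(body)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    # The outermost JSON element must be an array.
    if not isinstance(document, list):
        raise ValueError("a JSON batch must be a JSON array")
    # Each element should be one CloudEvent in the JSON event format, i.e. an object.
    for index, element in enumerate(document):
        if not isinstance(element, dict):
            raise ValueError(f"batch element {index} is not a JSON object")
    return document


if __name__ == "__main__":
    print(parse_json_batch(b"[]"))  # []
    events = parse_json_batch(b'[{"specversion": "0.2", "type": "com.example.someevent"}]')
    print(events[0]["type"])  # com.example.someevent
```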
### 4.2. Envelope

A JSON Batch of CloudEvents MUST use the media type
`application/cloudevents-batch+json`.

### 4.3. Examples

An example containing two CloudEvents, the first with `Binary`-valued data and
the second with JSON data:

``` JSON
[
    {
        "specversion" : "0.2",
        "type" : "com.example.someevent",
        "source" : "/mycontext/4",
        "id" : "B234-1234-1234",
        "time" : "2018-04-05T17:31:00Z",
        "comexampleextension1" : "value",
        "comexampleextension2" : {
            "otherValue": 5
        },
        "contenttype" : "application/vnd.apache.thrift.binary",
        "data" : "... base64 encoded string ..."
    },
    {
        "specversion" : "0.2",
        "type" : "com.example.someotherevent",
        "source" : "/mycontext/9",
        "id" : "C234-1234-1234",
        "time" : "2018-04-05T17:31:05Z",
        "comexampleextension1" : "value",
        "comexampleextension2" : {
            "otherValue": 5
        },
        "contenttype" : "application/json",
        "data" : {
            "appinfoA" : "abc",
            "appinfoB" : 123,
            "appinfoC" : true
        }
    }
]
```

Collaborator, on the first `"contenttype"` line: datacontenttype

Collaborator, on the second `"contenttype"` line: dct

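A Python sketch of how a consumer might recover each payload from a batch like the one above. It assumes, for illustration only, that an event whose `contenttype` is not JSON (neither `application/json` nor a `+json` suffix) carries base64-encoded binary in `data`, as the Thrift event does, and that a missing `contenttype` means JSON. Both helper names are hypothetical.

```python
import base64
from typing import Any, Dict, List


def is_json_media_type(content_type: str) -> bool:
    """True for application/json and media types with a +json suffix."""
    media = content_type.split(";")[0].strip().lower()
    return media == "application/json" or media.endswith("+json")


def decode_batch_data(events: List[Dict[str, Any]]) -> List[Any]:
    """Return each event's payload in batch order.

    Assumptions (for illustration): a missing "contenttype" is treated as
    JSON, and non-JSON data is a base64-encoded string.
    """
    payloads: List[Any] = []
    for event in events:
        data = event.get("data")
        if data is None or is_json_media_type(event.get("contenttype", "application/json")):
            payloads.append(data)  # JSON value (or no data): use as-is
        else:
            payloads.append(base64.b64decode(data))  # base64 string -> bytes
    return payloads


if __name__ == "__main__":
    batch = [
        {"specversion": "0.2", "contenttype": "application/vnd.apache.thrift.binary",
         "data": base64.b64encode(b"\x01\x02").decode("ascii")},
        {"specversion": "0.2", "contenttype": "application/json",
         "data": {"appinfoA": "abc"}},
    ]
    print(decode_batch_data(batch))  # [b'\x01\x02', {'appinfoA': 'abc'}]
```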
An example of an empty batch of CloudEvents (typically used in a response):

```JSON
[]
```

## 5. References

* [RFC2046][RFC2046] Multipurpose Internet Mail Extensions (MIME) Part Two:
Media Types
@@ -250,6 +321,7 @@ a `Map` or [JSON data](#31-special-handling-of-the-data-attribute) data:
[JSON-Number]: https://tools.ietf.org/html/rfc7159#section-6
[JSON-String]: https://tools.ietf.org/html/rfc7159#section-7
[JSON-Value]: https://tools.ietf.org/html/rfc7159#section-3
[JSON-Array]: https://tools.ietf.org/html/rfc7159#section-5
[RFC2046]: https://tools.ietf.org/html/rfc2046
[RFC2119]: https://tools.ietf.org/html/rfc2119
[RFC3986]: https://tools.ietf.org/html/rfc3986