Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define the shape of a event by encapsulating the payload in event.data. #2926

Closed
wants to merge 6 commits into from

Conversation

MSNev
Copy link
Contributor

@MSNev MSNev commented Nov 7, 2022

Changes

This PR extends the event specification to define the "shape" of an event by encapsulating the payload (content) of the event in the event.data attribute.

This defines how events with a payload should be defined, this enables support for downstream systems to handle both known known domain-specific events (identified by event.domain and event.name) and any other arbitrary new or vender specific events.

Downstream systems can safely assume that when a Log record contains the event.domain, event.name and event.data that the Log record represents an event and can then perform any validation or storage (routing) changes that the vender may require.

This provides support not only for events being defined by the RUM Sig, but also includes support for the mapping of CloudEvents, which can be defined by a later PR.
Possible CloudEvents mapping Example:

  • event.domain: "cloudevents"
  • event.name: "xxxx"
  • event.data: (The cloudEvents data)
    With Cloud Events Context Attributes (Required and optional) being mapped to top level OpenTelemetry Attributes of the same name

There was a discussion around using the "Body" of a log message as the transport for the payload instead of defining an event.data, however, while the Logs is intended to be the primary transport mechanism events the definition of an event is not limited to Logs only. This allows for an event to be transmitted as a span event (which doesn't have a "Body") and also supports direct logical mapping for CloudEvents.

@MSNev MSNev requested review from a team November 7, 2022 22:13
@MSNev MSNev marked this pull request as draft November 7, 2022 22:20
@MSNev MSNev force-pushed the MSNev/EventData branch 3 times, most recently from 0d1492e to ba9b2d0 Compare November 7, 2022 22:42
@ronyis
Copy link

ronyis commented Nov 22, 2022

I think that log events should also support representing the payload of application requests. For example, in order for a user to collect the response body of an outgoing HTTP request, saving it as part of a span is problematic since the body could be read after the span representing the request is closed (many times it's also read asynchronously).

Using a map<string, any> is not generic enough for this example, using any is preferable.

I'm also in favor of using the log Body instead of an attribute. For the case of using a Span Event, I think we should be able to support an any type attribute as well (similar to as proposed in #2888).

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 30, 2022
@scheler
Copy link
Contributor

scheler commented Nov 30, 2022

not stale

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 21, 2022
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Dec 28, 2022
@MSNev MSNev force-pushed the MSNev/EventData branch 2 times, most recently from ca15822 to 354a1f8 Compare January 17, 2023 16:14
@MSNev MSNev force-pushed the MSNev/EventData branch 3 times, most recently from 3f5f92d to 838c04f Compare February 22, 2023 03:09
@jack-berg
Copy link
Member

The problems that this solves are too abstract for me support at this time. In particular, it's not clear how downstream validation of event.data would work and why including the event payload as top level attributes prevents this. There's a chance of collision, but is this a theoretical problem or one we've seen in practice? From a validation standpoint, a receiver of events in possession of a schema should be able to validate the known fields according to the schema, ignoring unrecognized fields. It's free to drop or accept unrecognized fields needed. This appears to be true whether or not the payload is encapsulated in a single field.

Based off discussions in the 2/22/23 log sig, this change seems important to the client instrumentation working group. If this inclusion is in fact blocking progress, can a demo be given which shows concrete situations where including event data as top level attributes fails?

@MSNev
Copy link
Contributor Author

MSNev commented Feb 28, 2023

In particular, it's not clear how downstream validation of event.data would work

I have attempted to not restrict or define how (or if) downstream validation would occur as some vendors will and some wont. And others may choose to provide validation as a development (non-production) client side validation tool.

and why including the event payload as top level attributes prevents this.

Simplistically, It doesn't. If someone wants to construct a "Custom" event which only uses top-level attributes they could, as part of this specification definition I'm saying they SHOULDN'T because of the chance of collisions and to simplify how back-end systems can identify, process and store "Events". (Which can also be viewed as a specific Structured Log records). There are additional reasons why specifying that event detail is passed in a single attribute (Which I'll include below -- I explicitly left out of the spec detail). This is also the one of the primary reasons for choosing to send "events" via Logs as it IS the only signal that supports nested attributes. Otherwise, we could have just continued to hack zero-length Span's with span Event (which has additional issues)

There's a chance of collision, but is this a theoretical problem or one we've seen in practice?

Because many of the RUM events are defined by scraping and passing data based on existing W3C standards (at least for browser events), these names are fixed and as they evolve additional names WILL (and have) be added. The only way to avoid the collisions would be to prefix EVERY field which has performance, memory and payload considerations (all of which are major issues for client runtimes, whether a browser, mobile device, IOT or embedded into "SMART Tv's". (Yes, I've supported this)

Why use a single attribute

Without encapsulating the event payload into a single nested attribute, the following (or combination) would need to occur, whether to avoid collisions or not.

  • prefix every "value" with a common prefix (eg event.<name / data>.xxxx) this will cause an explosion with the payload size, especially when the event data is large like W3C resource timing fields (these are used for page load as well Ajax requests and will include the "resources" (scripts, css, images) loaded as part of page load)
  • Each "new" attribute name should be "added" to semantic conventions to avoid using the same name for different things / semantics (or be prefixed), this will slow down the event definition even further than it already is and would be unnecessary when the fields are only required for a single event. We have a specific Goal is to AVOID defining individual attribute names, we explicitly want to define the semantic conventions (collection of value / names) at the "Event" level only.
  • Generally RUM systems can have MANY (10s, 100s) of specific events, and it's not just a "Span" / "Log", each event has a specific meaning for monitor real user interactions with a system. And depending on the application (or observability vendor) there may have many/many/many custom events (beyond what we will define as part of the standard OpenTelemetry Events) and will need to be supported.

I've mentioned payload size before, but let me restate here, during the "Page/Application unload" (user navigates to another page, closes the browser, switches tabs and on mobile devices turns the device (screen) off - forcing the app to hibernation (and potentially never wake up)), so at this point any SDK should attempt to send all observability content (as it may not have any persistence capability or even a chance to re-hydrate and send (eg. classic examples of this are crashes).

During this "unload" state modern browsers have a restricted set of API's to allow the "delivery" of any network requests, both of these have a MAXIMUM payload of 64kb (of total outstanding (unsent) requests -- you can't just send multiple 64kb payloads you can send 1 total or several which add up to 64kb), and while there is currently an experimental compression API (it's not supported on Firefox) and it appears that it can't be supported by the primary sending API (because we can't include headers informing the receiver that the HTTP request is compressed (gzipped)), likewise on unsupported browsers including the additional client side code (which increases the page load time) also doesn't help, because you still can't add the additional header telling the receiver that the payload is compressed....

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@MSNev MSNev force-pushed the MSNev/EventData branch from eb8676a to 3635380 Compare April 11, 2023 15:10
@MSNev
Copy link
Contributor Author

MSNev commented Apr 12, 2023

Not stale

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 21, 2023
@MSNev MSNev requested a review from a team April 21, 2023 16:45
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Apr 29, 2023
@MSNev
Copy link
Contributor Author

MSNev commented May 1, 2023

Waiting for outcome of #3406

scheler added a commit to scheler/semantic-conventions that referenced this pull request May 15, 2023
…entations. This document serves more for the purpose as a schema for the events than the semantic conventions as supported in Otel today.

Related PRs open-telemetry/opentelemetry-specification#2926
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants