Clarify logs vs events vocabulary and usage #2863

Closed
113 changes: 103 additions & 10 deletions specification/logs/README.md
@@ -17,6 +17,8 @@ aliases: [/docs/reference/specification/logs/overview]
- [OpenTelemetry Solution](#opentelemetry-solution)
- [Log Correlation](#log-correlation)
- [Events and Logs](#events-and-logs)
* [OpenTelemetry Event Definition](#opentelemetry-event-definition)
* [FAQ](#faq)
- [Legacy and Modern Log Sources](#legacy-and-modern-log-sources)
* [System Logs](#system-logs)
* [Infrastructure Logs](#infrastructure-logs)
@@ -124,10 +126,10 @@ languages have established standards for using particular logging libraries. For
example in Java world there are several highly popular and widely used logging
libraries, such as Log4j or Logback.

OpenTelemetry defines [events](#events-and-logs) as a type of LogRecord with
OpenTelemetry defines [Events](#events-and-logs) as a type of LogRecord with
Member

Just to elaborate a bit on my thoughts in the SIG meeting today...

What if we framed things this way:

"OpenTelemetry provides a generic way to define additional signal types. These signals are modeled as a type of LogRecord with specific characteristics...."

Then instead of event.domain and event.name we use signal.type and signal.name.

Examples include

| signal.type | signal.name | Notes |
|---|---|---|
| browser | click | well-known type (i.e. the semantics are defined by the OTel spec) |
| k8s | pod_start | well-known type (i.e. the semantics are defined by the OTel spec) |
| my_custom_type | some_name | not well-known (i.e. the end user defines the semantics) |
| metric | http.server.duration | this one is just to illustrate how in an alternate universe we might have modeled metrics using one data model to rule them all |

Additionally, thinking of things in this way alleviates any need to discuss span events in the scope of this conversation. To me span events are still just simply something that happened during the course of a span at a particular point in time. That is, a span event is not some notion of a nested signal type.

specific characteristics. This definition is not ubiquitous across existing
libraries and languages. In some logging libraries, producing events aligned
with the OpenTelemetry event definition is clunky or error-prone.
with the OpenTelemetry Event definition is clunky or error-prone.

There are also countless existing prebuilt applications or systems that emit
logs in certain formats. Operators of such applications have no or limited
@@ -208,15 +210,106 @@ Wikipedia’s [definition of log file](https://en.wikipedia.org/wiki/Log_file):
>In computing, a log file is a file that records either events that occur in an
>operating system or other software runs.

From OpenTelemetry's perspective LogRecords and Events are both represented
using the same [data model](./data-model.md).
From OpenTelemetry's perspective logs and events conceptually are not different. Both
Member

This PR seems to be focusing on specification/logs folder only. While we don't want to boil the ocean, I've noticed a continuous source of confusion regarding "span events" vs. "log events". I feel that if we don't clarify span events vs log events, we'll continue to run in circles. I want to ask some stepping back questions:

  1. Do we even want to use the term event in both places, or we want to eventually get rid of one?
  2. Why do we want to have the term "event" here? Would something like "named log record" be good enough / better?

Member

@svrnm I wish to get your opinion here from the documentation perspective.

Member

imho, s/span events/span logs/ (as it was in OpenTracing) would address some of the confusion. Framed as span logs, these entries can still be sub-categorized as logs vs. events just like regular log records. It does create a new confusion between span logs and regular logs, which is fundamentally unavoidable as we're forcing the user to make this distinction instead of handling it behind the scenes.

Member

The method to add a span event/log is called AddEvents (based on this) across languages, so if we start calling it "Span Log" we run into an issue here:

  • Removing addEvent is probably not possible, because it is a breaking change, so adding addLog as an alias would be an option, but that increases confusion
  • Calling it "Span Log" and then having a method called "addEvent" is really confusing.

My opinion: From a documentation perspective the key point is being conscious of that potential confusion and giving end-user guidance here by being transparent and sharing the thought process that went into things, even if it boils down to "today we would call it Span Log, but we had called it Span Event in the past and we don't want breaking changes, so we have to live with that".

Member

I think we made a mistake when we renamed InstrumentationLibrary to InstrumentationScope in stable API surface and data models and I'd want to avoid making that same mistake again, so I would be strongly opposed to renaming span.AddEvent() or the Span Event concept and OTLP message.

I have a concern that using the term Event as a term of art for log records that have specific properties is going to lead to (even more) significant confusion in the long run. I'm not sure I have a good proposal for an alternative at this time, but something that denotes that there is additional information available to help interpret the attributes on the log record would be good. SchematicLogRecord is a mouthful, but it should effectively convey that there is a schema available that describes how to validate and interpret the semantics of the record. Open to alternatives.

Member

I also think this vocabulary update should include something about span events and feel having Log Event, Log Record, Span Event will lead to confusion.

The best alternative I could come up with for Log Event is Report, but I also like SemanticLogRecord, it is clearer what it actually is with no definition required.

Member

👍

are represented using the same [LogRecord data model](./data-model.md).

However, OpenTelemetry does recognize a subtle semantic difference between
LogRecords and Events: Events are LogRecords which have a `name` and `domain`.
Within a particular `domain`, the `name` uniquely defines a particular class or
type of event. Events with the same `domain` / `name` follow the same schema
which assists in analysis in observability platforms. Events are described in
more detail in the [semantic conventions](./semantic_conventions/events.md).
### OpenTelemetry Event Definition

OpenTelemetry defines **OpenTelemetry Events** as LogRecords that are shaped
in a special way:

- They have a LogRecord attribute `event.name` (and possibly other LogRecord attributes).
- They have an InstrumentationScope with a non-empty `Name` and with an
InstrumentationScope attribute `event.domain` (and possibly other InstrumentationScope attributes).

Within a particular `event.domain`, the `event.name` uniquely defines a particular class
or type of OpenTelemetry Event. OpenTelemetry Events with the same `event.domain` /
`event.name` follow the same schema which assists in analysis in observability platforms.
See also OpenTelemetry Event [semantic conventions](./semantic_conventions/events.md).
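
The two conditions in the definition above can be checked mechanically. The following is a minimal sketch in Python, assuming simplified stand-in types (`InstrumentationScope`, `LogRecord`) and a hypothetical helper `is_opentelemetry_event`. None of these names come from an OpenTelemetry SDK; they only illustrate the shape described above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InstrumentationScope:
    # Simplified stand-in for illustration, not an SDK type.
    name: str = ""
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class LogRecord:
    # Simplified stand-in for illustration, not an SDK type.
    body: Any = None
    attributes: Dict[str, Any] = field(default_factory=dict)


def is_opentelemetry_event(scope: InstrumentationScope, record: LogRecord) -> bool:
    """Hypothetical check: True if the record is shaped like an OpenTelemetry Event."""
    return (
        "event.name" in record.attributes       # LogRecord attribute
        and scope.name != ""                    # non-empty Scope name
        and "event.domain" in scope.attributes  # Scope attribute
    )


# Example values are made up for illustration.
browser_scope = InstrumentationScope(
    name="my.browser.instrumentation",
    attributes={"event.domain": "browser"},
)
click = LogRecord(attributes={"event.name": "click"})
plain_log = LogRecord(body="user logged in")
```

Under these assumptions `click` qualifies as an Event when emitted through `browser_scope`, while `plain_log` does not because it has no `event.name` attribute.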
Contributor

Will discuss this more in the SIG call tomorrow, but I feel we should just say Events are LogRecords with these two attributes. We happened to keep the event.domain attribute in InstrumentationScope for the purpose of API ergonomics - to be able to create multiple Events with a given domain. I also assumed that the InstrumentationScope attributes automatically propagate to the LogRecords inside it, that's why it made sense to me to keep the attribute at the scope level. It turns out this assumption is wrong. However, the backends will move the event.domain attribute into the LogRecord during storage and hence it makes sense to say Events are LogRecords with these two attributes.

Can you also take a look at the changes to ./semantic_conventions/events.md in my PR #2848?

Member Author

However, the backends will move the event.domain attribute into the LogRecord during storage

I think you are making an assumption that is not necessarily true for all backends, maybe it is true for some.


Note: in this specification we use the capitalized word "Event" as a shorthand for
OpenTelemetry Event. When referring to the generic concept of events this specification
may use the word "event" (both in the logging section and in other sections, e.g. in the
metrics section). This is not to be confused with the capitalized Event that has the precise
definition described above. Where confusion is possible we will always use the
fully qualified concept name: **OpenTelemetry Event**.

To avoid confusion we highly recommend using the generic word "logs" when referring to
logs and events that are not OpenTelemetry Events.

OpenTelemetry also defines an [API](
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/api.md#emit-event)
that helps to emit LogRecords that are shaped as OpenTelemetry Events.

### FAQ

**What is an OpenTelemetry Event?**

It is a specially shaped LogRecord. See [OpenTelemetry Event Definition](#opentelemetry-event-definition).
tigrannajaryan marked this conversation as resolved.

**How are events and logs different?**

They are not. The words "events" and "logs" are synonyms. We prefer the word "logs"
when referring to generic log and event data.

**Who produces OpenTelemetry Events?**

OpenTelemetry Events are produced using the OpenTelemetry Events API.
Member

Do we want to limit the source of the events to OpenTelemetry Events API only? Can't we have OTel collector to fetch events from another event source?


**Why do OpenTelemetry Events exist as a concept?**

OpenTelemetry Events are a class of events designed within the OpenTelemetry community
or in compliance with OpenTelemetry recommendations. OpenTelemetry Events have a
particular shape of data that OpenTelemetry believes is beneficial for designers of
event structures to adopt.

**What are the reasons OpenTelemetry Events have an `event.domain` Scope attribute?**

There are 2 reasons:

1. The `event.domain` Scope attribute isolates groups (domains) of Events designed by different
people. Any decisions about the choice of attribute names and other decisions
about the shape of the LogRecord made by designers of Events in a particular domain have
no impact on the design of events in another domain.
In other words, the `event.domain` attribute allows different groups of people to
independently make choices about Event representation in their domain of expertise
without worrying that their choices will impact people who design Events
in some other domain of expertise.

2. The `event.domain` Scope attribute can be used for efficient routing and filtering
decisions for a batch of LogRecords that belong to a particular domain. This is enabled
by the design of OTLP, which [groups the LogRecords by a Scope](
https://github.com/open-telemetry/opentelemetry-proto/blob/724e427879e3d2bae2edc0218fff06e37b9eb46e/opentelemetry/proto/logs/v1/logs.proto#L64)
on the wire.
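
The second reason can be illustrated with a minimal sketch in Python. It assumes a simplified stand-in `ScopeLogs` type that mirrors the per-Scope grouping, and a hypothetical `route_by_domain` helper with a made-up `"(no domain)"` fallback key. It is not Collector or SDK code; it only shows that one lookup per Scope can decide the destination of every LogRecord in that group.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ScopeLogs:
    # Simplified stand-in for a group of LogRecords that share one Scope.
    scope_name: str
    scope_attributes: Dict[str, Any]
    log_records: List[Dict[str, Any]] = field(default_factory=list)


def route_by_domain(batch: List[ScopeLogs]) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical router: send whole groups to a destination keyed by event.domain."""
    routes: Dict[str, List[Dict[str, Any]]] = {}
    for scope_logs in batch:
        # One lookup per Scope; individual records are not inspected.
        domain = scope_logs.scope_attributes.get("event.domain", "(no domain)")
        routes.setdefault(domain, []).extend(scope_logs.log_records)
    return routes


# Example values are made up for illustration.
batch = [
    ScopeLogs("browser.instr", {"event.domain": "browser"},
              [{"event.name": "click"}, {"event.name": "page_load"}]),
    ScopeLogs("k8s.instr", {"event.domain": "k8s"},
              [{"event.name": "pod_start"}]),
    ScopeLogs("app.logger", {}, [{"body": "plain log line"}]),
]
routes = route_by_domain(batch)
```

In this sketch the routing cost grows with the number of Scopes in the batch rather than with the number of LogRecords.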

**I have a non-OpenTelemetry data source that produces events (e.g. Windows Events).
Should I make sure they are shaped like OpenTelemetry Events when used with OpenTelemetry
software (e.g. inside OpenTelemetry Collector)?**

Not necessarily. Just because the data in an external data source is called an "event" it
does not mean it automatically qualifies as an OpenTelemetry Event and must be shaped like
an OpenTelemetry Event.
Contributor

@scheler scheler Oct 11, 2022

It will be very helpful to include guidance on when to produce Events vs Logs, and that I think is based on the expectations of the backend/receiver. For example, at AppDynamics, Events and Logs have some differences in their processing pipelines in the backend. Events are relatively limited in number and are produced by sources creating events of a certain importance, whereas Logs are far larger in volume with a good amount of noise in them. As an example, Kubernetes Events are (or, can be) Events and Kubernetes Pod logs, by default, are Logs.

What should Go Structured Logs map to in OpenTelemetry? IMO, they should remain as Logs as they are produced by applications for the general purpose of troubleshooting and are likely to contain a good amount of noise.

Member

Events are relatively limited in number and are produced by sources creating events of a certain importance, whereas Logs are far larger in volume with a good amount of noise in them.

I don't think that this distinction makes a difference from the data model or API perspective. The same could be said about different log severity levels. The important point, AFAICT, is that it needs to be possible for a processor of log records to identify records that satisfy some criteria to direct them for separate processing. That is separate and distinct from how we refer to the concepts of "opaque bucket of bits" log records and "structured and schema-validatable" log records.


**I have a non-OpenTelemetry data source that produces events that have a `name` and
`category`. The semantics of the `name` and `category` in this data source are exactly the
same as `event.name` and `event.domain` in OpenTelemetry. What should I do when I bring
these events to OpenTelemetry?**

If there is an exact match in the semantics then it is reasonable to map them to
OpenTelemetry's concepts. So, when the events from the external data source are converted
to OpenTelemetry LogRecords (for example in OpenTelemetry Collector) it is reasonable
to shape them like OpenTelemetry Events. In the given example it is reasonable to map
the `name` field from the data source to `event.name` and the `category` field to
`event.domain`.
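
The mapping described in this answer could look like the following minimal sketch in Python. The helper `to_opentelemetry_event`, the dictionary layout, and the choice to carry the remaining source fields over as LogRecord attributes are assumptions made for illustration. A real converter (for example in OpenTelemetry Collector) would use its own types.

```python
from typing import Any, Dict, Tuple


def to_opentelemetry_event(
    external: Dict[str, Any], scope_name: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical converter: return (scope, log_record) shaped like an OpenTelemetry Event."""
    # Assumption: all other source fields become LogRecord attributes unchanged.
    remaining = {k: v for k, v in external.items() if k not in ("name", "category")}
    scope = {
        "name": scope_name,  # must be non-empty
        "attributes": {"event.domain": external["category"]},
    }
    record = {"attributes": {"event.name": external["name"], **remaining}}
    return scope, record


# Example values are made up for illustration.
scope, record = to_opentelemetry_event(
    {"name": "disk_full", "category": "storage", "volume": "C:"},
    scope_name="my.converter",
)
```

This only applies when the semantics match exactly, as stated above. If they do not, the source fields are better left as ordinary LogRecord attributes.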
Contributor

It is very unlikely that category is the same as domain. The data source is a better fit for domain, because this data source is likely emitting events with different values for category and that's why they have category as a field/attribute in their Events.

Member

Here are the two things we brought up on the SIG making their way into this document.

  • In .NET we have the EventSource class.

    EventSource emits strongly-typed events. Each event emitted is uniquely identified by the EventSource.Name and the individual event's Event.Id / Event.Name. Events also have a message template and may have attributes. All events emitted through an EventSource will have the same exact message template + attribute structure.

    So given the definition/description above, I feel like these EventSource events would be appropriate to shim into OpenTelemetry as Events? With EventSource.Name = event.domain & Event.Name = event.name.

  • In .NET we have the ILogger interface.

    ILogger is a framework for emitting structured logs. An ILogger is bound to a categoryName. Typically some type name (ex: "MyCompanyName.MyLoggingClass"). ILogger has a whole bunch of different overloads which emit log messages.

    You can log without any type of strong-name: myLogger.LogInformation("Something happened for {userId}", userId). That will emit a structured log through the defined category.

    You can also log with a strong-name: myLogger.LogInformation(new EventId(id: 100, name: "AddUserFailure"), "Something happened for {userId}", userId). That will emit a structured log through the defined category with the strongly-typed name information.

    What to do in this case is less clear. We could shim the second call with EventId supplied into OpenTelemetry Events with categoryName = event.domain and EventId.Name = event.name. It is assumed but not enforced by the API that all logs sharing a categoryName + EventId will share structure. I think if some backend relied on event.domain + event.name for structure detection, and I used ILogger with EventId to strongly type my logs, I would expect reasonably that the backend would be able to detect my structures?

/cc @alanwest


**I am designing a new library/application/system and want to produce structured logs/events
using OpenTelemetry. Should my events be shaped like OpenTelemetry Events?**

Yes. For new designs we recommend shaping your data like OpenTelemetry Events.
Make sure to choose a good descriptive value for `event.domain`. If the domain is common
enough, consider adding it as a well-known domain name in OpenTelemetry's [semantic conventions](
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/semantic_conventions/events.md)
for the `event.domain` attribute.

## Legacy and Modern Log Sources
