-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect data_stream.{dataset,namespace} for OTel logs, metrics, and traces #201
Conversation
@@ -182,6 +182,16 @@ func (c *Consumer) convertLogRecord( | |||
event.Network.Connection = modelpb.NetworkConnectionFromVTPool() | |||
} | |||
event.Network.Connection.Type = v.Str() | |||
case attributeDataStreamDataset: | |||
if event.DataStream == nil { | |||
event.DataStream = modelpb.DataStreamFromVTPool() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this respect OTLP Resource.attributes and LogRecord.attributes or only one? What about ScopeLogs.Scope.attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. It was only reading LogRecord.attributes, but I've updated the PR to only Resource.attributes (only one). Looking at the code, we never read from ScopeLogs.Scope.attributes. Let me know if it makes more sense to read from both LogRecord.attributes and Resource.attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently unspecified in https://github.com/elastic/opentelemetry-dev/blob/main/docs/design-decisions/ingest/routing.md and I think we need to clarify this.
From an OTel Collector perspective, I think if we're only to support one, Scope.attributes would actually match the best. Reason being is that it's typical for a related dataset to emit a distinct Scope, which may come from the same Resource as other instrumentation. For example you may have two scopes, otelcol/hostmetricsreceiver/cpu
and otelcol/hostmetricsreceiver/memory
, which should be indexed into separate data streams coming from the same resource (host). Similarly you could have logs from two different application SDKs (say nginx and custom java app) from the same host, but should end up in different data streams for easier anaylsis and more compact storage.
That said, we could support all 3 levels with some precedence rules (I would expect LogRecord.attributes > Scope.attributes > Resource.attributes)
@axw have you given this any thought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain than any of them will cover all scenarios, e.g.:
- resource attributes aren't appropriate for routing two different data sets produced by one data source, since resource describes the data source
- instrumentation scope attributes aren't appropriate for routing data for two different service instances, since they would have identical instrumentation, just different resource attributes (e.g.
service.instance.id
) - I'm not sure that signal-level attributes are ever appropriate?
That said, we could support all 3 levels with some precedence rules (I would expect LogRecord.attributes > Scope.attributes > Resource.attributes)
Sounds good, for consistency with the attribute precedence we've already documented (sans top-level fields).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
support all 3 levels with some precedence rules (I would expect LogRecord.attributes > Scope.attributes > Resource.attributes)
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, we could support all 3 levels with some precedence rules (I would expect LogRecord.attributes > Scope.attributes > Resource.attributes)
I'm revisiting the docs and I'm a little confused here. The attribute precedence described by @joshdover (which I implemented in the PR atm) appears to be in the opposite order to our documented vision.
How well is the existing document schema for these logs indexed into ES when they target data streams that use the default Note that depending on the discussion in https://github.com/elastic/opentelemetry-dev/pull/59, we may need to append/prepend a |
With a document as simple as
They look virtually the same in logs-generic-default vs logs-apm*, at least in Discover. There are subtle differences in the underlying mapping. logs-apm-mapping.json Dotted fields are de-dotted, e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just one request. I wonder if we should do it for traces and metrics already, given that this is new functionality and won't affect anything unless these attributes are set.
@axw I've updated the PR to do the same for logs, metrics and traces. |
model/modelprocessor/datastream.go
Outdated
// a different datastream. | ||
if isRUMAgentName(event.GetAgent().GetName()) { | ||
event.DataStream.Dataset = rumTracesDataset | ||
} | ||
} | ||
case modelpb.ErrorEventType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[For reviewer] Errors are not handled in a sense that the dataset
is not respected. apm.error
will always be used but the namespace is set by input data_stream.namespace
.
model/modelprocessor/datastream.go
Outdated
event.DataStream.Dataset = tracesDataset | ||
// In order to maintain different ILM policies, RUM traces are sent to | ||
// a different datastream. | ||
if isRUMAgentName(event.GetAgent().GetName()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[For reviewer] RUM check is bypassed if DataStream is specified. Is this desirable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminded me: RUM agents shouldn't be control routing until we have some way for operators to restrict them. Otherwise (unauthenticated, untrusted) RUM agents can create arbitrarily many data streams, and DoS Elasticsearch.
input/otlp/traces.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[For reviewer] QQ: line 222, 223 otelSpan.Events()
and event = event.CloneVT()
means that errors and logs from traces automatically inherit data_stream.* from resource, scope and span.
- For errors, in the PR we currently always override dataset to
apm.error
, but namespace will still be set using the input'sdata_stream.namespace
. Since errors are special, so it makes sense to override dataset toapm.error
in SetDataStream, but should it be consistent and override namespace as well? - For logs from traces, both dataset and namespace will be set by input's
data_stream.*
. It makes it indistinguishable from plain otel logs. Is this desirable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we are not reading data_stream.*
inside convertSpanEvent
. Should we do so as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also take it from the span event attributes, in case there's a need to override the span-level attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also take it from the span event attributes
Q: span event including errors? Does it mean that we are allowing users to override apm.error
dataset?
Q: what about events from jaeger? Should we parse data stream from attributes as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: span event including errors? Does it mean that we are allowing users to override apm.error dataset?
🥲 probably shouldn't allow overriding the dataset for now. Maybe later, if we can stop relying on the dataset for identifying errors. I expect we'll end up searching all logs for errors, but too early to say yet.
Q: what about events from jaeger? Should we parse data stream from attributes as well?
I don't think it matters a great deal - IMO whatever option avoids having a special case for Jaeger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably shouldn't allow overriding the dataset for now
Agreed. Dataset will be parsed at OTel conversion layer but later reset to apm.error
by SetDataStream. However, data_stream.namespace
will be parsed and respected. Not sure if you consider this a bug.
IMO whatever option avoids having a special case for Jaeger.
Although it means a few more lines, I think it is good to be consistent and do the same for Jaeger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Dataset will be parsed at OTel conversion layer but later reset to apm.error by SetDataStream. However, data_stream.namespace will be parsed and respected. Not sure if you consider this a bug.
I do, we should not allow untrusted agents to control data stream naming at all.
Although it means a few more lines, I think it is good to be consistent and do the same for Jaeger.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do, we should not allow untrusted agents to control data stream naming at all.
Which I see you have implemented, so LGTM
Yes, it is a separate task to switch default dataset and possibly handle otel and non-otel cases separately. We can delay that. It should not block this PR. |
@axw sorry for dragging you into all these low level talk. I've updated the PR description to list the implications / breaking changes and edge case handling. See if it looks good to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carsonip I've reviewed the description and skimmed the code, changes LGTM. I think the order that Josh described sounds right, so you're that right it's not the same order as for queries.
I haven't reviewed all the code, let me know if I need to.
Implications / Breaking changes
data_stream.dataset
anddata_stream.namespace
are parsed in the order of resource attributes, scope attributes, span attributes, and span events attributes. The latest match wins. e.g. a span event data_stream.dataset will override the span's attributes.data_stream.dataset
anddata_stream.namespace
are inherited from earlier levels if they are not specified at the current level. e.g. if a span does not specify data_stream.dataset, it will use the scope's data_stream.dataset.data_stream.dataset
anddata_stream.namespace
are parsed and no longer set as labels (data_stream_dataset
anddata_stream_namespace
)Edge cases
data_stream.dataset
anddata_stream.namespace
are parsed for error span events, onlydata_stream.namespace
will be respected, becausedata_stream.dataset
will be overridden by SetDataStream.data_stream.dataset
anddata_stream.namespace
are parsed, none of them will be respected as they will be overridden by SetDataStream.Closes elastic/apm-server#10191