-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[enrichment/trace] Handle exception for span events #99
Conversation
// TODO(lahsivjar): Is error/log context handling be done in es-exporter? | ||
// Ref: https://github.com/elastic/apm-data/blob/59d33b8113d629f43dc82f9b3931ceda63a19a7a/input/otlp/traces.go#L1243-L1260 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[For reviewers] APM-data sets context for span-events where it sets fields like span.id
, transaction.id
, parent.id
based on the span (ref), however, I am not sure if enrichments is the correct place to do this. Since the es-exporter could decide how to encode span-events into ES docs (separate or in the same doc as parent span) maybe it would be better to have this handled in the expoter? I am confused here, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that all properties that are already mapped as top level fields in the logs data stream should not be handled by enrichments but in the ES exporter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC: @carsonip This would require some changes in the spanevent handling in es-exporter. I think just copying some attributes from the parent span should be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span event span.id, transaction.id are already done in elasticsearchexporter https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/elasticsearchexporter/model.go#L645-L647
in that case, no change is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need transaction.type for span events?
I don't know
If so, should they be in attributes.* or at top level?
If needed, it should be under attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that all properties that are already mapped as top level fields in the logs data stream should not be handled by enrichments but in the ES exporter.
If needed, it should be under attributes.
As they are under attributes, it is better done in enrichment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As they are under attributes, it is better done in enrichment
I don't think it is a good idea to do this as enrichments since the fields we are talking about here are derived from the span. The ES exporter decides HOW to extract and encode the span event (in a separate doc or in the same doc). Whlie currently we are encoding them as a separate doc making this dependent on enrichment doesn't feel correct to me.
(#99 (comment) discussion thread is also related to the same topic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above commit adds transaction.{type, sampled}
enrichment for span event.
case semconv.AttributeExceptionEscaped: | ||
s.exceptionEscaped = v.Bool() | ||
case semconv.AttributeExceptionType: | ||
s.exceptionType = v.Str() | ||
case semconv.AttributeExceptionMessage: | ||
s.exceptionMessage = v.Str() | ||
case semconv.AttributeExceptionStacktrace: | ||
s.exceptionStacktrace = v.Str() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregkalapos These attributes are mostly 1<>1 mappings however, I didn't see them in the otel-data plugin so added it here. Do you think we should do this here or in the otel-data plugin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can create alias fields for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, aliases are how we have been handling such fields. Will wait for @gregkalapos to confirm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd be also for aliases as well.
One exception (no pun intended) may be the handled
flag: from what I remember from the UI testing, the UI access that field, so it always needs to be present (We could question why, but from what I remember, that's how it is today), but exception.escaped
is only recommended. So we always need to populate the handled
flag.
That's the only edge case which comes to my mind, I'd expect the other ones to work with aliases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Could you add these aliases, Greg? One challenge will be how to add the attribute aliases just for logs, without these getting overridden with the standard set of attribute aliases. Maybe we'll need to add them to the alias definition that applies to all signals.
Also, I'm not sure if we need an alias for the exception stacktrace, since we likely need to fetch that from _source anyway, because IINM, we don't have a stored field for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we don't even need the aliases, don't we? 🤔 This should all work out via passthrough
.
Following could be sent via attributes and would be visible top level:
exception.type
exception.message
( above 2 are in SemConv - I don't think we even need to do anything with those here in the enrichment lib)exception.handled
: that is not part of OTel, se we need to set it here.
I opened a draft: elastic/elasticsearch#113636 I commented out all the aliases, the test is still green which queries exception.type
top level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregkalapos I think the alias should be for error.exception.{type, message}
-> exception.{type, message}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think we might need to put alias for stacktrace too: error.stack_trace
-> exception.stacktrace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see.
Ok, we can do error.exception.{type, message}
-> exception.{type, message}
; I'll update the PR.
Same for stack trace.
@@ -47,4 +47,15 @@ const ( | |||
AttributeSpanDestinationServiceResource = "span.destination.service.resource" | |||
AttributeSpanDurationUs = "span.duration.us" | |||
AttributeSpanRepresentativeCount = "span.representative_count" | |||
|
|||
// span event attributes | |||
AttributeParentID = "parent.id" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should solve that by adding a parent.id
-> span_id
alias in the logs data stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid the "parent" in parent.id
is not the span, but the parent of span, i.e. parent.id != span_id.
I looked at the apm-data code, it appears that parent.id is really the span event's span's parent id. Not the span's id: https://github.com/elastic/apm-data/blob/cd0bb416903eebb5dc22e3eb6b6e4e9c75ad51fe/input/otlp/traces.go#L196
Parent id is passed onto the span event via cloning the span: https://github.com/elastic/apm-data/blob/cd0bb416903eebb5dc22e3eb6b6e4e9c75ad51fe/input/otlp/traces.go#L1072
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid the "parent" in parent.id is not the span, but the parent of span, i.e. parent.id != span_id.
I don't think this is true for errors. In case of errors, we specifically override the parent ID of the SpanEvent
to be the SpanID
(or `TransactionID for elastic transaction): https://github.com/elastic/apm-data/blob/bd5e34116889942c53b186deab0f76beaf182de0/input/otlp/traces.go#L1255-L1259
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wow, I missed that. Then is it a bug for logs from span events in apm-server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this fix in apm-data look right? elastic/apm-data#374
If so, yes, I think we can have parent.id
alias -> span.id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are discussing on slack the alternative to make the UI use span.id
directly: https://elastic.slack.com/archives/C95SB62AG/p1727693736192519?thread_ts=1727462154.365409&cid=C95SB62AG
// TODO(lahsivjar): Is error/log context handling be done in es-exporter? | ||
// Ref: https://github.com/elastic/apm-data/blob/59d33b8113d629f43dc82f9b3931ceda63a19a7a/input/otlp/traces.go#L1243-L1260 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that all properties that are already mapped as top level fields in the logs data stream should not be handled by enrichments but in the ES exporter.
case semconv.AttributeExceptionEscaped: | ||
s.exceptionEscaped = v.Bool() | ||
case semconv.AttributeExceptionType: | ||
s.exceptionType = v.Str() | ||
case semconv.AttributeExceptionMessage: | ||
s.exceptionMessage = v.Str() | ||
case semconv.AttributeExceptionStacktrace: | ||
s.exceptionStacktrace = v.Str() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can create alias fields for these?
Related to: #86