Semantic conventions for when to record exceptions #521

murphpdx · 2020-03-18T23:38:28Z

This PR is not meant to compete with the the following PR's:
#427
#432

This addition is only intended to spec out when to instrument errors. Java currently uses this behavior.

Oberon00 · 2020-03-19T10:25:29Z

specification/data-error-semantic-conversions.md


-An error event created by an API call SHOULD be associated with the current span from which the API call is made. 
+Errors and exceptions SHOULD be recorded any time an unhandled exception or error leaves the boundary of a span.


This can only happen one time per span though? Then we can use span attributes instead of events as currently suggested in most PRs. I would agree with that approach 👍 We can add semantic conventions for unhandled errors and handled errors separately though.

I am not arguing, but why storing attributes in a span is better than storing same attributes in an event? It looks the same for me except that

there can be multiple events (errors/logs)

event has a time

It is simpler/faster to process. If you want to know if your span has an error on the backend, you can check mySpan.hasAttribute('error.whatever') instead of any(e.name == 'error' for e in mySpan.events).
EDIT: On second thought, you could also have an hasEventNamed('error') method on your backend span, but that would still most likely have a more complex implementation than a hash map lookup.

While it's true we would not run into double errors when we auto instrument, a user can still call an API to record an error. Error attributes don't handle if the user calls the API twice on the same span or if they call it once when we auto instrumented an error on the same span.

I feel this tension very strongly; my personal preference has been one error recorded per span, but in go we went in the direction of having separate (potentially multiple) spanevents recorded, one per error. I still do feel that in general it is a useful optimization to flag the spans that have spanevent children indicating error, so that they're more easily searchable (rather than requiring parent HAVING child.error = true type search syntax)

I still do feel that in general it is a useful optimization to flag the spans that have spanevent children indicating error, so that they're more easily searchable (rather than requiring parent HAVING child.error = true type search syntax)

@lizthegrey Do you mean marking parents of a failed child as failed?
This won't be doable since a child can end (and thus fail) after the parent has ended in async scenarious and at that time the parent might have been exported already (or is at least no longer accessible to the instrumentation). This would be an optimization the backend would have to take care of when ingesting and analyzing the tree of spans.
Furthermore, a parent can also gracefully handle failures, e.g., by retrying the failed child operation or falling back (to alternative/cached data, for example), and thus succeed as a whole despite failed child spans.

Sure, it's possible that a parent span can succeed despite a child span failing; however, our current schema is using SpanEvents (not child Spans, and not storing the information on the span itself) to record error statuses. I guess that's what the overall status of the span in terms of gRPC code is for though.

vmihailenco · 2020-03-19T10:47:51Z

specification/data-error-semantic-conversions.md

+
+## Recording an Error
+
+Errors and exceptions SHOULD be recorded any time an unhandled exception or error leaves the boundary of a span.


What about nested spans and error propagating? E.g.

span1 span2 span3 <- error happens here and propagated up to span1

Should the same error be recorded 3 times in span3, span2, and span1? It seems wasteful, but it is also not clear how to prevent/deduplicate it...

Then the user will know if/where it was handled.

I think this is worth including in the PR if this is your intention.

But:

it is not true in my case - errors tend to be recorded several times, but propagation stops at instrumentation boundaries (it is fixable for sure), not when error is handled

it is expensive and slightly annoying - especially if/when user stores backtraces

E.g. in Go call depth of 10-20 frames is pretty common and errors are usually handled at the very top level - duplicating error information with backtraces 10-20 times is a no go...

In e.g. Java the following situation is quite common:

span3 throws an exception, let us call it "low level error". E.g. that database call has failed

span2 catches it and wraps it into another exception object. This can be some sort of "business process" exception, e.g. "failed to update inventory"

span1 catches that and wraps it again into yet another exception object. This time more user-oriented one, e.g. "checkout failed".

One can argue that all three exceptions should be recorded and propagated to the backend. And all three spans should be marked as failed.

dyladan · 2020-03-19T18:00:19Z

typo in title

iNikem · 2020-07-30T11:55:16Z

I think this PR is superseded by #697 and propose to close it. @open-telemetry/specs-approvers @open-telemetry/specs-trace-approvers @open-telemetry/technical-committee

Oberon00 · 2020-07-30T11:57:11Z

Actually, this is still something we have to spec: We now have the recordException API but no guidance on ~~how~~ when to record exceptions. It's more similar to my #747.

iNikem · 2020-07-30T12:15:09Z

@murphpdx Are you interested in refreshing this PR? There is now specification/trace/semantic_conventions/exceptions.md to hold this proposal. I think its content may remain the same.

Oberon00 · 2020-07-31T08:20:43Z

specification/data-error-semantic-conversions.md

+
+## Recording an Error
+
+Errors and exceptions SHOULD be recorded any time an unhandled exception or error leaves the boundary of a span.


While I agree mostly with this spec, I would prefer to also specify:

Errors and exceptions MUST NOT be recorded unless an unhandled exception or error leaves the boundary of a span.

If we want to support any such situations, we would need to add something to distinguish them from unhandled (in the above sense) execptions, probably a simplified variant of what is suggested in #747.

Oberon00 · 2020-07-31T08:23:36Z

I took the liberty to change the PR title. I hope it is more discoverable that way.

Oberon00

Approving this because I agree with the content (although editorial changes are required before merging this, namely moving the content to the existing recordException API and the exception semantic convention)

iNikem · 2020-07-31T08:45:13Z

specification/data-error-semantic-conversions.md

@@ -0,0 +1,10 @@
+# Semantic conventions for errors


For now we have recordException API, thus this should probably talk only about exceptions, and not errors.

If @murphpdx is no longer working on this PR, I could create an updated version.

murphpdx · 2020-08-05T18:50:03Z

Handing this off to John Watson.

jkwatson · 2020-08-05T19:14:25Z

Handing this off to John Watson.

who is handing it off to @Oberon00

linux-foundation-easycla · 2020-08-06T07:05:41Z

✅ Bogdan Drutu (c6ecad8)

❌ A Murphy The commit (cd3ce42 ,42721e6a8e7daf472b624c3e61fa93302b648e56) is not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please submit a support request ticket.

Oberon00 · 2020-08-06T07:12:07Z

Sorry, I accidentally pushed to this PR's branch which caused GH to re-request reviews in the name of @murphpdx it seems.

Oberon00 · 2020-08-06T07:13:50Z

What are we gonna do with the EasyCLA check anyway? If I base my PR on this one, the check will keep failing. I could remove the commit from @murphpdx and replace it with a Co-authored-by and hope that this is fine.

jkwatson · 2020-08-06T17:25:58Z

What are we gonna do with the EasyCLA check anyway? If I base my PR on this one, the check will keep failing. I could remove the commit from @murphpdx and replace it with a Co-authored-by and hope that this is fine.

I think Amanda would be fine with that.

tigrannajaryan · 2020-08-12T18:00:50Z

#697 is now merged. Do we still need this PR?

arminru · 2020-08-12T18:52:54Z

Superseded by #761. Thanks @murphpdx for the initial proposal!

When do you record an error?

cd3ce42

murphpdx requested review from arminru, bogdandrutu, c24t, carlosalberto, iredelmeier, jmacd, reyang, SergeyKanzhelev, tedsuo, tigrannajaryan and yurishkuro as code owners March 18, 2020 23:38

linter

42721e6

Oberon00 reviewed Mar 19, 2020

View reviewed changes

vmihailenco reviewed Mar 19, 2020

View reviewed changes

murphpdx changed the title ~~Semantic conversions for errors~~ Semantic conventions for errors Mar 19, 2020

Merge branch 'master' into error-recording

c6ecad8

pavolloffay mentioned this pull request Apr 16, 2020

Use Zipkin sanitizers jaegertracing/jaeger#2164

Closed

arminru mentioned this pull request May 12, 2020

Add proper error reporting before OpenTelemetry goes GA #599

Closed

arminru added the area:error-reporting Related to error reporting label May 12, 2020

carlosalberto added the area:semantic-conventions Related to semantic conventions label Jun 19, 2020

reyang added the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Jul 6, 2020

Oberon00 reviewed Jul 31, 2020

View reviewed changes

Oberon00 mentioned this pull request Jul 31, 2020

Add exception.event event attribute. #747

Closed

Oberon00 changed the title ~~Semantic conventions for errors~~ Semantic conventions for when to record exceptions Jul 31, 2020

Oberon00 approved these changes Jul 31, 2020

View reviewed changes

iNikem reviewed Jul 31, 2020

View reviewed changes

Oberon00 self-assigned this Aug 6, 2020

murphpdx requested a review from a team August 6, 2020 07:05

Oberon00 force-pushed the error-recording branch from 41efc60 to c6ecad8 Compare August 6, 2020 07:06

murphpdx requested a review from a team August 6, 2020 07:06

Oberon00 mentioned this pull request Aug 6, 2020

Specify which exceptions SHOULD be recorded #761

Closed

tigrannajaryan mentioned this pull request Aug 12, 2020

Add semantic conventions for logs and errors #432

Closed

arminru closed this Aug 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Semantic conventions for when to record exceptions #521

Semantic conventions for when to record exceptions #521

murphpdx commented Mar 18, 2020

Oberon00 Mar 19, 2020

vmihailenco Mar 19, 2020

Oberon00 Mar 19, 2020 •

edited

Loading

murphpdx Mar 19, 2020 •

edited

Loading

lizthegrey Apr 28, 2020

arminru May 5, 2020

lizthegrey May 5, 2020

vmihailenco Mar 19, 2020 •

edited

Loading

murphpdx Mar 19, 2020

vmihailenco Mar 25, 2020

iNikem May 18, 2020

dyladan commented Mar 19, 2020

iNikem commented Jul 30, 2020

Oberon00 commented Jul 30, 2020 •

edited

Loading

iNikem commented Jul 30, 2020

Oberon00 Jul 31, 2020

Oberon00 commented Jul 31, 2020

Oberon00 left a comment

iNikem Jul 31, 2020

Oberon00 Aug 5, 2020 •

edited

Loading

murphpdx commented Aug 5, 2020

jkwatson commented Aug 5, 2020

linux-foundation-easycla bot commented Aug 6, 2020 •

edited

Loading

Oberon00 commented Aug 6, 2020

Oberon00 commented Aug 6, 2020

jkwatson commented Aug 6, 2020

tigrannajaryan commented Aug 12, 2020

arminru commented Aug 12, 2020


		An error event created by an API call SHOULD be associated with the current span from which the API call is made.
		Errors and exceptions SHOULD be recorded any time an unhandled exception or error leaves the boundary of a span.


		## Recording an Error

		Errors and exceptions SHOULD be recorded any time an unhandled exception or error leaves the boundary of a span.

Semantic conventions for when to record exceptions #521

Semantic conventions for when to record exceptions #521

Conversation

murphpdx commented Mar 18, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Oberon00 Mar 19, 2020 • edited Loading

Choose a reason for hiding this comment

murphpdx Mar 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vmihailenco Mar 19, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dyladan commented Mar 19, 2020

iNikem commented Jul 30, 2020

Oberon00 commented Jul 30, 2020 • edited Loading

iNikem commented Jul 30, 2020

Choose a reason for hiding this comment

Oberon00 commented Jul 31, 2020

Oberon00 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Oberon00 Aug 5, 2020 • edited Loading

Choose a reason for hiding this comment

murphpdx commented Aug 5, 2020

jkwatson commented Aug 5, 2020

linux-foundation-easycla bot commented Aug 6, 2020 • edited Loading

Oberon00 commented Aug 6, 2020

Oberon00 commented Aug 6, 2020

jkwatson commented Aug 6, 2020

tigrannajaryan commented Aug 12, 2020

arminru commented Aug 12, 2020

Oberon00 Mar 19, 2020 •

edited

Loading

murphpdx Mar 19, 2020 •

edited

Loading

vmihailenco Mar 19, 2020 •

edited

Loading

Oberon00 commented Jul 30, 2020 •

edited

Loading

Oberon00 Aug 5, 2020 •

edited

Loading

linux-foundation-easycla bot commented Aug 6, 2020 •

edited

Loading