Rework gRPC status based on new rules #1308
As of open-telemetry#1214, the status codes changed and no longer line up with gRPC status codes, so now we'll just set `StatusCode.ERROR` and store the actual gRPC status code in the trace as `grpc.status_code`.
@codeboten, this should take care of the concern you raised on #1171.
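Here is a minimal sketch of the rule described above. It uses stand-in enums instead of the real `grpc` and `opentelemetry` packages so that it runs on its own. `GrpcStatusCode` mimics the `(number, description)` tuples that `grpc.StatusCode` members carry. `OtelStatusCode` assumes the post-#1214 numbering `UNSET = 0`, `OK = 1`, `ERROR = 2`. `span_status_for` is a hypothetical helper, not the instrumentation's API. The sketch uses the `rpc.grpc.status_code` key that the later diffs in this thread use, and it assumes an `OK` response leaves the span status unset.

```python
import enum


class GrpcStatusCode(enum.Enum):
    """Stand-in for grpc.StatusCode: each value is a (number, description) tuple."""

    OK = (0, "ok")
    CANCELLED = (1, "cancelled")
    UNKNOWN = (2, "unknown")
    NOT_FOUND = (5, "not found")
    UNAVAILABLE = (14, "unavailable")


class OtelStatusCode(enum.Enum):
    """Stand-in for the OpenTelemetry status codes after #1214 (assumed numbering)."""

    UNSET = 0
    OK = 1
    ERROR = 2


def span_status_for(code, details=None):
    """Map a gRPC code to (span status, description, span attributes).

    Hypothetical helper. Assumption: an OK response leaves the status UNSET.
    """
    # The actual gRPC code travels as a span attribute, not in the status.
    attributes = {"rpc.grpc.status_code": code.name}
    if code is GrpcStatusCode.OK:
        return OtelStatusCode.UNSET, None, attributes
    # Every other gRPC code collapses to ERROR; prefer explicit details and
    # fall back to gRPC's own description string.
    return OtelStatusCode.ERROR, details or code.value[1], attributes
```

For example, `span_status_for(GrpcStatusCode.NOT_FOUND)` returns `ERROR` with the description `"not found"` and records `NOT_FOUND` in the attribute. The specific gRPC code therefore survives in the trace even though the span status itself has only three possible values.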
Thanks! I can be overruled here, but I think at minimum a unit test is warranted.
self._active_span.set_status(
    Status(status_code=StatusCode(code.value[0]), description=details)
Should there be some unit tests here? I'm a little surprised that no test fails for a change as significant as making the status code static.
In fact, I'm surprised this wasn't caught already, since after that change a whole range of the codes coming from the proto are invalid. Either way, a unit test seems warranted.
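To show why a test would have caught this, here is a sketch of what `StatusCode(code.value[0])` does once the OpenTelemetry codes stop mirroring gRPC's. It uses stand-in enums and assumes the post-#1214 numbering `UNSET = 0`, `OK = 1`, `ERROR = 2`. Under that numbering, the old expression silently mislabels the first three gRPC codes and raises `ValueError` for every code from 3 upward. `old_mapping` and `describe_old_mapping` are hypothetical helpers.

```python
import enum


class GrpcStatusCode(enum.Enum):
    """Stand-in for grpc.StatusCode: (number, description) tuples."""

    OK = (0, "ok")
    CANCELLED = (1, "cancelled")
    UNKNOWN = (2, "unknown")
    NOT_FOUND = (5, "not found")


class OtelStatusCode(enum.Enum):
    """Stand-in for the OpenTelemetry status codes after #1214 (assumed numbering)."""

    UNSET = 0
    OK = 1
    ERROR = 2


def old_mapping(code):
    """The pre-PR expression: reinterpret the gRPC number as an OTel code."""
    return OtelStatusCode(code.value[0])


def describe_old_mapping(code):
    """Name of the OTel code the old expression yields, or 'ValueError'."""
    try:
        return old_mapping(code).name
    except ValueError:
        # gRPC numbers 3..16 have no counterpart among UNSET/OK/ERROR.
        return "ValueError"


for code in GrpcStatusCode:
    print(f"{code.name:<10} -> {describe_old_mapping(code)}")
```

With these stand-ins, `CANCELLED` comes out as `OK` and `OK` comes out as `UNSET`. `NOT_FOUND` raises `ValueError`. Only `UNKNOWN` lands on `ERROR`, and it does so by coincidence.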
It wasn't caught because there aren't any real tests for error conditions in this instrumentation.
I wouldn't be opposed to working on that part also, but being thorough is going to be a bunch of work, and it seemed prudent to add a quick fix since the previous PR now violates the recently-changed spec.
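Here is a sketch of what error-condition tests could look like without a real gRPC server. A `FakeSpan` records what the instrumentation writes, and `SketchServicerContext` is a hypothetical, trimmed-down copy of the `abort()`/`set_code()` logic from the diffs in this PR. Both classes are illustrations, not the instrumentation's code. The fake's `set_status` takes a code and description directly rather than an OpenTelemetry `Status` object. `set_code` assumes that an explicit `OK` should not mark the span as an error.

```python
import enum
import unittest


class GrpcStatusCode(enum.Enum):
    """Stand-in for grpc.StatusCode: (number, description) tuples."""

    OK = (0, "ok")
    NOT_FOUND = (5, "not found")
    INTERNAL = (13, "internal")


class OtelStatusCode(enum.Enum):
    """Stand-in for the OpenTelemetry status codes after #1214."""

    UNSET = 0
    OK = 1
    ERROR = 2


class FakeSpan:
    """Records attributes and status instead of exporting them."""

    def __init__(self):
        self.attributes = {}
        self.status = (OtelStatusCode.UNSET, None)

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def set_status(self, code, description=None):
        self.status = (code, description)


class SketchServicerContext:
    """Hypothetical, trimmed-down version of the servicer-context wrapper."""

    def __init__(self, span):
        self._active_span = span
        self.code = GrpcStatusCode.OK
        self.details = None

    def abort(self, code, details):
        self.code = code
        self.details = details
        self._active_span.set_attribute("rpc.grpc.status_code", code.name)
        self._active_span.set_status(OtelStatusCode.ERROR, details)

    def set_code(self, code):
        self.code = code
        # Use details if we already have them, otherwise gRPC's description.
        details = self.details or code.value[1]
        self._active_span.set_attribute("rpc.grpc.status_code", code.name)
        # Assumption: an explicit OK is not an error.
        if code is not GrpcStatusCode.OK:
            self._active_span.set_status(OtelStatusCode.ERROR, details)


class ErrorConditionTests(unittest.TestCase):
    def test_abort_marks_span_as_error(self):
        span = FakeSpan()
        SketchServicerContext(span).abort(GrpcStatusCode.NOT_FOUND, "no such user")
        self.assertEqual(span.status, (OtelStatusCode.ERROR, "no such user"))
        self.assertEqual(span.attributes["rpc.grpc.status_code"], "NOT_FOUND")

    def test_set_code_falls_back_to_grpc_description(self):
        span = FakeSpan()
        SketchServicerContext(span).set_code(GrpcStatusCode.INTERNAL)
        self.assertEqual(span.status, (OtelStatusCode.ERROR, "internal"))

    def test_set_code_ok_leaves_status_unset(self):
        span = FakeSpan()
        SketchServicerContext(span).set_code(GrpcStatusCode.OK)
        self.assertEqual(span.status, (OtelStatusCode.UNSET, None))
        self.assertEqual(span.attributes["rpc.grpc.status_code"], "OK")
```

Because the span is faked, these tests exercise the error paths without starting a server. A thorough suite against the real interceptor would still need an in-process gRPC server.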
@@ -125,18 +126,16 @@ def set_code(self, code):
    self.code = code
    # use details if we already have it, otherwise the status description
    details = self.details or code.value[1]
    self._active_span.set_attribute("rpc.status_code", code.name)
Is this a semantic convention from the spec? I don't see an entry there.
By the looks of it, the attribute name will probably be `rpc.grpc.status_code`.
I don't either, but then, the spec you linked is now out of date, specifically this part:
> Implementations MUST set status which MUST be the same as the gRPC client/server status. The mapping between gRPC canonical codes and OpenTelemetry status codes is 1:1 as OpenTelemetry canonical codes is just a snapshot of grpc codes which can be found here.
If there's a more appropriate way to provide this data in the trace, I'd be happy to do so.
Ah, see open-telemetry/opentelemetry-specification#1044 - I didn't realize this discussion was going on when I made this change.
I'd be happy to remove the extra attribute until that's resolved, but not having the status code in a trace makes the trace significantly less useful for real use.
OK, I've complied with your change to that spec :)
Cool, I was about to ask this 😄
Specifically this: open-telemetry/opentelemetry-specification#1156
Thanks for this quick change!
@@ -189,6 +188,7 @@ def _start_span(self, handler_call_details, context):
attributes = {
    "rpc.method": handler_call_details.method,
    "rpc.system": "grpc",
    "rpc.grpc.status_code": grpc.StatusCode.OK,
Just curious: why do we set `grpc.StatusCode.OK` before the call has completed? Is the expectation that we assume the call succeeded, and if it fails later, this status code will be replaced?
If you don't call `abort()` or otherwise set the status code, the call isn't an error, so `OK` is the default response.
At least, that's as far as this interceptor can tell; we can't hook the true end of the call stack because interceptors don't really work that way.
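The optimistic default described here can be sketched with plain dictionaries standing in for span attributes. The span starts with the status-code attribute set to `OK`, and a later `abort()` or `set_code()` overwrites it. The helper names and the example method path are hypothetical. For the default, this sketch stores the code's `.name`, which matches what `abort()` records in the diff.

```python
import enum


class GrpcStatusCode(enum.Enum):
    """Stand-in for grpc.StatusCode: (number, description) tuples."""

    OK = (0, "ok")
    PERMISSION_DENIED = (7, "permission denied")


def start_span_attributes(method):
    """Attributes seeded when the server span starts (hypothetical helper)."""
    return {
        "rpc.method": method,
        "rpc.system": "grpc",
        # Optimistic default: without abort()/set_code(), the interceptor
        # has no other signal, so the call is treated as OK.
        "rpc.grpc.status_code": GrpcStatusCode.OK.name,
    }


def record_status(attributes, code):
    """What a later abort()/set_code() amounts to: overwrite the default."""
    attributes["rpc.grpc.status_code"] = code.name
    return attributes


# Example method path, for illustration only.
METHOD = "/helloworld.Greeter/SayHello"
happy = start_span_attributes(METHOD)
denied = record_status(start_span_attributes(METHOD), GrpcStatusCode.PERMISSION_DENIED)
print(happy["rpc.grpc.status_code"])   # OK
print(denied["rpc.grpc.status_code"])  # PERMISSION_DENIED
```

The trade-off is the one described in the comment: a handler that fails without ever signalling a status would still be recorded as `OK`.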
    status_code=StatusCode(self.code.value[0]),
    description=details,
)
Status(status_code=StatusCode.ERROR, description=details)
I can see that everywhere we use `set_details`, it's to add an error message, so this change makes sense.
That said, I feel a good follow-up (maybe in this PR, maybe not) would be to rename the method to `set_error_details`?
This class is a subclass of `grpc.ServicerContext`, so we should probably stay compliant with that API.
It subclasses `ServicerContext`, so the method can't be renamed: https://grpc.github.io/grpc/python/grpc.html#grpc.ServicerContext.set_details
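The constraint can be shown with a stand-in abstract base class. This sketch assumes that `grpc.ServicerContext` declares `set_details` as an `abc.abstractmethod`. Under that assumption, a subclass that renames the method to `set_error_details` no longer implements `set_details` and cannot be instantiated. The rename would also break handlers that call `context.set_details(...)`. The class names are hypothetical.

```python
import abc


class ServicerContextLike(abc.ABC):
    """Stand-in for grpc.ServicerContext (assumes set_details is abstract)."""

    @abc.abstractmethod
    def set_details(self, details):
        """Set the details string sent back to the client."""


class WrappedContext(ServicerContextLike):
    # Keeps the name that handlers call: context.set_details(...)
    def set_details(self, details):
        self.details = details


class RenamedContext(ServicerContextLike):
    # Hypothetical rename: the abstract set_details is left unimplemented.
    def set_error_details(self, details):
        self.details = details


WrappedContext().set_details("boom")  # fine

try:
    RenamedContext()
except TypeError as exc:
    # abc refuses to instantiate a class with unimplemented abstract methods.
    print("cannot instantiate:", exc)
```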
@@ -113,8 +113,9 @@ def set_trailing_metadata(self, *args, **kwargs):
def abort(self, code, details):
    self.code = code
    self.details = details
    self._active_span.set_attribute("rpc.grpc.status_code", code.name)
I think this is a good change (i.e. `OK` instead of just `0`), but I'm wondering whether we should also record the number, `code.value[0]`, in another attribute. I'm okay with not recording it; I'm just noting that it was recorded before.
It looks like there's still a debate about whether this should be the numeric code or the text version; the lean is toward numeric because the names differ between languages. I implemented this before the last comment there, so I was going to hold off until that PR is accepted.
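Here is a small sketch of the two options being debated. It uses a stand-in enum whose members carry `(number, description)` tuples like `grpc.StatusCode`, and in the stand-in `value[0]` is a plain `int`. The numeric form is defined by the gRPC protocol, so it reads the same from every language. The name's spelling depends on each language's binding. `status_code_attributes` is a hypothetical helper.

```python
import enum


class GrpcStatusCode(enum.Enum):
    """Stand-in for grpc.StatusCode: (number, description) tuples."""

    OK = (0, "ok")
    DEADLINE_EXCEEDED = (4, "deadline exceeded")


def status_code_attributes(code, numeric=True):
    """Build the status-code attribute as either the number or the name."""
    # Numeric: 4 means DEADLINE_EXCEEDED in every gRPC implementation.
    # Name: more readable, but spelled differently by each language binding.
    value = code.value[0] if numeric else code.name
    return {"rpc.grpc.status_code": value}


print(status_code_attributes(GrpcStatusCode.DEADLINE_EXCEEDED))
print(status_code_attributes(GrpcStatusCode.DEADLINE_EXCEEDED, numeric=False))
```

Recording both, as suggested earlier in the thread, would mean adding a second, separately named attribute for the text form. The sketch does not invent a key for it.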
Is this attribute being added to the spec?
Whelp, I forgot to submit my pending review comments. They're not needed now, but I'll post them anyway.
Great! Thanks for addressing changes.
Thanks for addressing this!
How Has This Been Tested?
Mostly observation in Jaeger.