Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework gRPC status based on new rules #1308

Merged
merged 3 commits into from
Nov 2, 2020
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -113,8 +113,9 @@ def set_trailing_metadata(self, *args, **kwargs):
def abort(self, code, details):
self.code = code
self.details = details
self._active_span.set_attribute("rpc.grpc.status_code", code.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change (i.e. OK instead of just 0), but I'm wondering if we should record the number code.value[0] as well in another attribute. I'm okay with not recording it, just noting that it was recorded before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's still a debate as to whether this should be the numeric code or the text version, the lean is numeric because of language differences. I implemented this before the last comment there, and so was going to hold off until that PR is accepted first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this attribute being added to the spec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._active_span.set_status(
Status(status_code=StatusCode(code.value[0]), description=details)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some unit tests here? I'm a little surprised to see that there's no failing test for a significant change like make the code static.

In fact, it's a bit of a surprise that this wasn't caught when there's a range of invalid codes coming from the proto since the change. But either way, a unit test seems warranted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't caught because there aren't any real tests for error conditions in this instrumentation.

I wouldn't be opposed to working on that part also, but being thorough is going to be a bunch of work, and it seemed prudent to add a quick fix since the previous PR now violates the recently-changed spec.

Status(status_code=StatusCode.ERROR, description=details)
)
return self._servicer_context.abort(code, details)

Expand All @@ -125,18 +126,16 @@ def set_code(self, code):
self.code = code
# use details if we already have it, otherwise the status description
details = self.details or code.value[1]
self._active_span.set_attribute("rpc.grpc.status_code", code.name)
self._active_span.set_status(
Status(status_code=StatusCode(code.value[0]), description=details)
Status(status_code=StatusCode.ERROR, description=details)
)
return self._servicer_context.set_code(code)

def set_details(self, details):
self.details = details
self._active_span.set_status(
Status(
status_code=StatusCode(self.code.value[0]),
description=details,
)
Status(status_code=StatusCode.ERROR, description=details)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that everywhere we use set_details it's to add an error message so this change makes sense.

Although I feel like a good change (may or may not be for this PR) would be to have the method renamed to set_error_details?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is a subclass of grpc.ServicerContext and so we should probably stay compliant with that API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's subclassing servicer context so can't be changed https://grpc.github.io/grpc/python/grpc.html#grpc.ServicerContext.set_details

)
return self._servicer_context.set_details(details)

Expand Down Expand Up @@ -189,6 +188,7 @@ def _start_span(self, handler_call_details, context):
attributes = {
"rpc.method": handler_call_details.method,
"rpc.system": "grpc",
"rpc.grpc.status_code": grpc.StatusCode.OK,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious as to why we set grpc.StatusCode.OK before the call has completed? Is the expectation that we assume it is good and if it does fail later then this status code will be replaced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't call abort() or otherwise set the status code, then it's not an error, and so OK is the default response.
I mean, as far as this interceptor can tell - we can't actually hook this at the true end of the call stack because interceptors don't really work that way.

}

metadata = dict(context.invocation_metadata())
Expand Down