Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should StatusCanonicalCode be applied for rpc? #1044

Closed
eddynaka opened this issue Sep 30, 2020 · 15 comments · Fixed by #1156
Closed

should StatusCanonicalCode be applied for rpc? #1044

eddynaka opened this issue Sep 30, 2020 · 15 comments · Fixed by #1156
Assignees
Labels
area:error-reporting Related to error reporting area:semantic-conventions Related to semantic conventions priority:p2 Medium priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:trace Related to the specification/trace directory

Comments

@eddynaka
Copy link

What are you trying to achieve?

After the merge of this PR: #966, we can see that the StatusCanonicalCode got changed to Error, Unset, and Ok.

But, if we see the rpc status: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/rpc.md#status, they kept the same.

Additional context.

Should the status be the same? How should we map the rpc status to the new StatusCanonicalCode? For example: Unkown is an error or an unset status?

cc: @cijothomas @reyang

@eddynaka eddynaka added the spec:trace Related to the specification/trace directory label Sep 30, 2020
@cijothomas
Copy link
Member

@tedsuo

@arminru arminru added area:error-reporting Related to error reporting area:semantic-conventions Related to semantic conventions release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Oct 1, 2020
@arminru
Copy link
Member

arminru commented Oct 1, 2020

Good catch, this needs to be updated. Thanks @eddynaka!

@arminru
Copy link
Member

arminru commented Oct 9, 2020

In order to resolve this, we'll need to do two things:

  1. Add a mapping from GRPC Status Codes to our new OTel Status Code (Unset, Ok, Error), e.g.
  • OK --> Ok
  • CANCELLED and UNKNOWN --> undefined (up to the user) or leaving Unset
  • all the others --> Error
  1. Add a new attribute (e.g., rpc.grpc.status_code) to store the GRPC Status Code, which would otherwise be lost now that we have a different and considerably reduced set of Status Codes in OTel

@eddynaka
Copy link
Author

eddynaka commented Oct 9, 2020

Hi @arminru , thanks for mapping!

Do you have a timeline to be official in the spec?

@Oberon00
Copy link
Member

Oberon00 commented Oct 9, 2020

@arminru I think the mapping will have to match the mapping we have defined in the proto, see open-telemetry/opentelemetry-proto#224. And there AFAIK the current proposal is OK -> UNSET, everything else -> ERROR. EDIT: Oh, we have the reverse mapping there and it is not unambiguously reversible because both unset and ok map to OK. But we should be consistent, and should never set Ok from instrumentation.

@andrewhsu
Copy link
Member

@carlosalberto from the issue triage mtg today, assigning to you since you previously worked in this area

@eddynaka
Copy link
Author

eddynaka commented Oct 10, 2020

Just to get a clear response. Would that be right the mapping below?

For HttpStatusCode:

  • >= 100 && <= 399 -> UNSET
  • ERROR (otherwise)

For RPC:

  • OK -> UNSET
  • ERROR (otherwise)

@Oberon00
Copy link
Member

For HTTP, the mapping is already defined (indeed like you describe it) here: https://github.com/open-telemetry/opentelemetry-specification/blob/4b19e006ede63471103925edc6a15ab1d7579944/specification/trace/semantic_conventions/http.md#status

For RPC, I think that mapping makes sense, but we need to specify it in the semantic conventions. A PR would be welcome for that 😃

@arminru
Copy link
Member

arminru commented Oct 12, 2020

@Oberon00 oh yes of course. I forgot that we don't have the status source from the status OTEP in the API (yet?), which would make an instrumentation-determined Ok distinguishable. In this case, your and @eddynaka's proposal makes sense.

@eddynaka
Copy link
Author

In order to resolve this, we'll need to do two things:

  1. Add a mapping from GRPC Status Codes to our new OTel Status Code (Unset, Ok, Error), e.g.
  • OK --> Ok
  • CANCELLED and UNKNOWN --> undefined (up to the user) or leaving Unset
  • all the others --> Error
  1. Add a new attribute (e.g., rpc.grpc.status_code) to store the GRPC Status Code, which would otherwise be lost now that we have a different and considerably reduced set of Status Codes in OTel

Hi @arminru , for gRPC, we already have the grpc.status_code in our Activity (which is like Span). Can we maintain grpc.status_code instead of rpc.grpc.status_code? This would prevent us from removing one item in the list and re-adding the same item with different name.

@arminru
Copy link
Member

arminru commented Oct 13, 2020

@eddynaka

For the existing db.* and messaging.* attributes we use the respective db.system and messaging.system name for namespacing (e.g. db.mssql.instance_name, db.mongodb.collection and messaging.rabbitmq.routing_key) already.
In order to stay consistent here, the RPC semantic convention would need to use use rpc.grpc.* rather than introducing a new, top-level namespace for grpc.* next to rpc.*.
I can see, however, that the repetition of rpc looks a bit odd at first but it's simply the name of the RPC system/flavor being used and also the same with db.mongodb.*, for example.

See
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/database.md#call-level-attributes-for-specific-technologies
and
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/messaging.md#attributes-specific-to-certain-messaging-systems

@andrewhsu
Copy link
Member

from the spec issue triage mtg today, after discussing this issue sounds like this can be changed to allowed-for-ga since it is not a blocker for trace spec freeze

@andrewhsu andrewhsu added priority:p2 Medium priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs and removed priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Oct 16, 2020
@pcwiese
Copy link

pcwiese commented Oct 27, 2020

I don't understand how this cannot be a blocker for GA. The change in the status code spec stomped all over the status code emitted from RPC spans. Without defining a specific tag to carry that status code the receivers of the traces (collector + exporters) won't have any idea how to report the actual RPC status to backends. My team definitely relies on that breakdown of RPC failure codes.

I'm 100% in favor of passing this information along in an attribute with key "rpc.grpc.status_code". We need something here otherwise the collectors are going to start dropping a key piece of information when they start receiving Spans from SDKs that have implemented this new status spec.

@Oberon00
Copy link
Member

It is not a blocker for release since (1) OpenTelemetry as a whole is usable even without it and (2) it can be added after release as a non-breaking change, if it does not make it into the release.

@toumorokoshi
Copy link
Member

We just hit this too: open-telemetry/opentelemetry-python#1308. So filed a PR with the results of the discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:error-reporting Related to error reporting area:semantic-conventions Related to semantic conventions priority:p2 Medium priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants