Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dashes to underscores normalization from http header attribute keys #369

Merged
merged 7 commits into from
Oct 20, 2023

Conversation

trask
Copy link
Member

@trask trask commented Oct 3, 2023

Fixes #304

Changes

Removes the dashes to underscores normalization from http header attribute keys.

Merge requirement checklist

@trask trask force-pushed the http-header-attr-keys branch from 41ad5b2 to be84f79 Compare October 3, 2023 22:43
@trask trask marked this pull request as ready for review October 3, 2023 22:43
@trask trask requested review from a team October 3, 2023 22:43
arminru
arminru previously requested changes Oct 4, 2023
Copy link
Member

@arminru arminru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same change also needs to be applied to response headers to stay consistent:

http.response.header.<key>
HTTP response headers, <key> being the normalized HTTP Header name (lowercase, with - characters replaced by _), the value being the header values. [5]
http.response.header.content_type=["application/json"]; http.response.header.my_custom_header=["abc", "def"]

docs/http/http-spans.md Outdated Show resolved Hide resolved
@pyohannes
Copy link
Contributor

The same change also needs to be applied to response headers to stay consistent:

Definitely, and in case we decide to do this for HTTP, we probably should also do it for RPC. Likely not in this PR, but at least note it in issues:
https://github.com/open-telemetry/semantic-conventions/blob/main/docs/rpc/connect-rpc.md?plain=1#L23
https://github.com/open-telemetry/semantic-conventions/blob/main/docs/rpc/grpc.md?plain=1#L22

@Oberon00
Copy link
Member

Oberon00 commented Oct 4, 2023

Likely not in this PR

I would say, please do it also in this PR. This should stay consistent. I'd rather leave HTTP unchanged than change only HTTP and leave RPC in an inconsistent state if we then in the rpc follow-up figure out the reasons for the normalization.

@arminru arminru dismissed their stale review October 5, 2023 14:38

RPC handling adapted to match HTTP in 16bc32a

@jsuereth
Copy link
Contributor

Based on our discussion in Semconv, I'm against this change.

I think we should either:

  • Not normalize at all, so conflicts are always obvious, this doesn't actually solve the underlying issue.
  • Normalize for consistency, including _. The notion of multiple headers and conflicts is something we have to resolve no mater what if we do any normalization of the raw data received.

@AlexanderWert
Copy link
Member

AlexanderWert commented Oct 10, 2023

Based on our discussion in Semconv, I'm against this change.

I think we should either:

  • Not normalize at all, so conflicts are always obvious, this doesn't actually solve the underlying issue.
  • Normalize for consistency, including _. The notion of multiple headers and conflicts is something we have to resolve no mater what if we do any normalization of the raw data received.

+1 (not blocking it from my side though, if we end up with lowercasing only)

I'm leaning towards:

  • Not normalize at all, so conflicts are always obvious, this doesn't actually solve the underlying issue.

Lowercasing would be for the sake of cardinality, right? But, all the attributes being touched in this PR are not part of metric dimensions, so cardinality should not be an issue.

Copy link
Member

@arminru arminru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving following the reasoning @trask provided in #369 (comment)

@trask
Copy link
Member Author

trask commented Oct 10, 2023

Lowercasing would be for the sake of cardinality, right?

Lowercasing is only for the keys (not the values, so not related to cardinality). If we don't lowercase the keys, then we end up with different keys, e.g. http.request.header.content-type and http.request.header.Content-Type.

@trask
Copy link
Member Author

trask commented Oct 10, 2023

I think we should either:

  • Not normalize at all, so conflicts are always obvious

I don't believe Content-Type and content-type are truly conflicts, so I don't think we need to make that difference obvious.

@pyohannes
Copy link
Contributor

Lowercasing would be for the sake of cardinality, right?

Lowercasing is only for the keys (not the values, so not related to cardinality). If we don't lowercase the keys, then we end up with different keys, e.g. http.request.header.content-type and http.request.header.Content-Type.

👍 Agreeing with @trask, lowercasing makes sense here because we're "converting" from a case-insensitive to a case-sensitive format.

It doesn't cause conflicts (because the origin format is case-insensitive), and I think it drastically improves the user experience. If I don't have that normalization and I'd want to filter for a certain content type, I'd theoretically need to look for all http.request.header.[cC][oO][nN][tT][eE][nN][tT]-[tT][yY][pP][eE] keys.

@Flarna
Copy link
Member

Flarna commented Oct 10, 2023

Most HTTP frameworks do the one or the other sort of normalizing of incoming headers because this eases the usage. This effects casing and concatenation/arrayfying. Simply because the spec is clear that casing doesn't matter and also it doesn't matter if headers which are allowed multiple times (e.g. tracestate) are sent as a single concat line or as array.

Similar on sending side HTTP frameworks might decide on such normalization.

In my opinion the main use of tracing is not to debug low level, framework specific low level details like the actual casing on the wire. It's more to monitor the HTTP request from functional point of view.
Allowing easy search/indexing/.. in backend by normalizing the input data without loosing functional details seems to be the better choice here.

If there is a use case to track also the low level bits and details (assuming the framework in use allows to extract this) I would add a separate set of attributes like http.request.raw_headers and http.response.raw_headers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Normalizing HTTP header names is ambiguous