Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize the http.url, http.scheme, and http.target attributes to url.* (aligning with ECS) #3181

Closed
Tracked by #1012
trask opened this issue Feb 7, 2023 · 10 comments · Fixed by #3355
Closed
Tracked by #1012
Assignees
Labels
area:semantic-conventions Related to semantic conventions semconv:HTTP spec:trace Related to the specification/trace directory

Comments

@trask
Copy link
Member

trask commented Feb 7, 2023

What are you trying to achieve?

Generalize the http.url, http.scheme, and http.target attributes so they can be used by different semantic conventions.

See @reyang's #2489 (comment):

I will give an example which shows how we end up with if we call it http.url:

  1. A vendor is building an anti-virus software which records all the outgoing networking access, and for L7 actions, the software supports both FTP and HTTP.
  2. The vendor would want to have a table which looks like:
    | timestamp | user   | url.full              | blocked |
    | --------- | ------ | --------------------- | ------- |
    | T1        | reiley | http://wikipedia.org/ | no      |
    | T2        | reiley | http://microsoft.com/ | no      |
    | T3        | reiley | ftp://bad/malware     | yes     |
    
  3. If we ask folks to use http.url and ftp.url, they will either end up with a sparse table that is hard to use, or they have to put things to different tables, or they will push for a new major version of semantic convention.

Additional context.

ECS already has url.* namespace: https://www.elastic.co/guide/en/ecs/current/ecs-url.html

If we want to follow ECS naming, we could rename:

  • http.scheme -> url.scheme
  • http.url -> url.full
  • http.target -> split into url.path + url.query
@trask trask added area:semantic-conventions Related to semantic conventions spec:trace Related to the specification/trace directory semconv:HTTP labels Feb 7, 2023
@trask trask self-assigned this Feb 7, 2023
@trask trask moved this to Blocker for stability in Spec: HTTP Semantic Conventions Feb 7, 2023
@trask trask changed the title Generalize the http.url, http.scheme, and http.target attributes Generalize the http.url, http.scheme, and http.target attributes to url.* Feb 7, 2023
@tigrannajaryan
Copy link
Member

@open-telemetry/specs-approvers I know our conventions are not declared stable. However I think people built a lot of stuff that depends on these conventions already. Are we ready for such breaking changes? http conventions are pretty important for people, they are widely used, so I want to make sure we understand that this can result in a lot of backlash.

@arielvalentin
Copy link

@open-telemetry/specs-approvers I know our conventions are not declared stable. However I think people built a lot of stuff that depends on these conventions already. Are we ready for such breaking changes? http conventions are pretty important for people, they are widely used, so I want to make sure we understand that this can result in a lot of backlash.

+1 Changing the http.* will be quite disruptive for our teams at GitHub who have been building tooling around these attributes since adopting OTel.

@trask trask changed the title Generalize the http.url, http.scheme, and http.target attributes to url.* Generalize the http.url, http.scheme, and http.target attributes to url.* (and align with ECS) Feb 9, 2023
@trask trask changed the title Generalize the http.url, http.scheme, and http.target attributes to url.* (and align with ECS) Generalize the http.url, http.scheme, and http.target attributes to url.* (aligning with ECS) Feb 9, 2023
@trask trask moved this from Blocker for stability to Blocker for stability (ECS) in Spec: HTTP Semantic Conventions Feb 14, 2023
@MrAlias
Copy link
Contributor

MrAlias commented Feb 17, 2023

  • http.target -> split into url.path + url.query

This split seems needed. Currently, by including the query in an attribute value a malicious user can overload HTTP servers with attributes just by changing the query parameter.

@danielcompton
Copy link

While I understand it would be frustrating and might cause extra work, the HTTP semantic conventions are still marked as experimental. If people think that this change makes sense, I'd vote for making it while the conventions are still unstable. I don't think there would be much stopping people continuing with their existing http.* attributes if they want to keep them?

Equally, once the conventions are marked as stable, breaking changes should be taken very seriously.

@Oberon00
Copy link
Member

Oberon00 commented Feb 20, 2023

@MrAlias @reyang

Currently, by including the query in an attribute value a malicious user can overload HTTP servers with attributes just by changing the query parameter.

I don't understand what you mean, maybe you are talking about metric attributes? If so, this "attack" could probably also work by changing the URL path (since 404s are probably also included in metrics).

"http.target -> split into url.path + url.query"

Also note that the HTTP target https://httpwg.org/specs/rfc9110.html#target.resource is typically what is mentioned above, but can also theoretically be:

@Oberon00
Copy link
Member

Oberon00 commented Feb 20, 2023

If we ask folks to use http.url and ftp.url, they will either end up with a sparse table that is hard to use, or they have to put things to different tables

I think that example is a bit weak. They could still put everything into a single table and the software would check http.url or ftp.url, whichever is present.

It is also a bit questionable if we would even introduce ftp.url, or maybe use net.peer.name + ftp.filepath or something

That said, I think using url.full or maybe request.url or something else that can generically be used everywhere would also go into the same direction as the previous changes to the net conventions we did for protocols, and it would seem a better approach. But I also feel like it is not worth the disruption. At least we'd need to keep http.url et. al as deprecated attribute.

@jack-berg
Copy link
Member

I support this proposed change. Nesting all these under http was a mistake as url is more generally applicable than the http domain its currently nested under.

@Oberon00
Copy link
Member

So we really want to rename the probably most-used attribute of all semantic conventions?

@tigrannajaryan
Copy link
Member

Because of how widely http attributes are used I think we need to make an effort to reduce the impact if we decide to go ahead with this change.

For example we can set an "effective from version" for the change to give enough time for users and vendors to prepare for the upcoming changes. This probably requires changes to semconv generator tool in the spec repo and in all languages.

@trask
Copy link
Member Author

trask commented Apr 4, 2023

Because of how widely http attributes are used I think we need to make an effort to reduce the impact if we decide to go ahead with this change.

For example we can set an "effective from version" for the change to give enough time for users and vendors to prepare for the upcoming changes. This probably requires changes to semconv generator tool in the spec repo and in all languages.

@tigrannajaryan here’s an initial proposal for what “effective from version” could look like: #3362

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions semconv:HTTP spec:trace Related to the specification/trace directory
Projects
8 participants