Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add http to span context properties #777

Closed
hmdhk opened this issue Mar 27, 2018 · 31 comments
Closed

Add http to span context properties #777

hmdhk opened this issue Mar 27, 2018 · 31 comments
Assignees
Labels
Milestone

Comments

@hmdhk
Copy link
Contributor

hmdhk commented Mar 27, 2018

For certain spans the frontend agent sends the raw url under the span context with the following structure:

{
  "name": "...",
  "context": {
    "http": {
      "url": {
        "raw": "http://example.com"
      }
    }
  }
}

We need to add this to the span context, however the raw url can be longer than our default 1024 string length limit. There are already cases that exceed this limit so we should not limit this field.

Would be great to hear your thoughts on this, @elastic/apm-server , @elastic/apm-agent-devs

cc @roncohen , @alvarolobato

Update: Decided to use the following structure:

{
  "name": "...",
  "context": {
    "http": {
      "url": "http://example.com"
    }
  }
}
@simitt
Copy link
Contributor

simitt commented Mar 27, 2018

FYI: We are limiting fields when they are indexed as an Elasticsearch keyword, because the default value for ignore_above is set to 1024 in beats.

@simitt
Copy link
Contributor

simitt commented Mar 27, 2018

Afaik this is hardcoded in beats atm.

@hmdhk
Copy link
Contributor Author

hmdhk commented Mar 28, 2018

Thanks @simitt, I don't think http.url.raw should be a keyword, so as you said, we don't need to limit it's length. But we still need to define this field in the spec as string field.

@simitt
Copy link
Contributor

simitt commented Mar 28, 2018

Are you generally referring to what is currently nested as context.request.url.raw or do you want to introduce a new context.http.url.raw. If you want to introduce a new field here, could you please motivate why.

Are you suggesting to index http.url.raw as a text instead of a keyword datatype or are you suggesting to not index it at all (not make it query nor searchable)?

@hmdhk
Copy link
Contributor Author

hmdhk commented Mar 28, 2018

@simitt, This is the span context not the transaction context which has context.request.url.raw. I'm proposing to add context.http.url.raw to the span context. The reason for this is that in order to keep the span name short, we need to truncate the url that was requested, but it is still useful to have the raw url. Furthermore we will add more fields to context.http.url object, for example the parsed query string or the url fragment.

context.http.url.raw does not need to be searchable but it needs to show up in the discover and also be included in the Kibana query results so that the UI can show this field.

@simitt
Copy link
Contributor

simitt commented Mar 28, 2018

Thanks for the clarification. If we don't need to index the field, I don't see an issue with not specifying a string length limit.

cc @elastic/apm-ui

@hmdhk
Copy link
Contributor Author

hmdhk commented Mar 28, 2018

I should also clarify that even though I am taking the url structure from the Transaction.context.request.url these http requests are different. Transaction.context.request.url is for the incoming request to the server, but Span.context.http.url (my proposal) is the outgoing request that browser makes to the server.

@felixbarny
Copy link
Member

I guess this would also apply for outgoing http requests. @beniwohli where are you currently setting the url of external http calls?

@alvarolobato
Copy link

@elastic/apm-server are we ok with this proposal? Should we work more on this?

@simitt
Copy link
Contributor

simitt commented Apr 10, 2018

If the agents can align on this structure and define all attributes they'd like to have introduced I don't see an issue in implementing this.

@felixbarny
Copy link
Member

@hamid have you thought about just having the same structure as in the transaction i.e. Span.context.request? That would bring spans and transactions closer together instead of making them diverge more.

Especially in the context of distributed tracing, it makes sense to have a more unified view of transactions and spans.

I'm in line with you that http would actually be nicer that requests but this is a different issue which is already being discussed in #794.

@alvarolobato
Copy link

@elastic/apm-agent-devs @roncohen please lets try make a decision about this, it has been waiting for some time.

@hmdhk
Copy link
Contributor Author

hmdhk commented May 9, 2018

We decided on the agent sync meeting to align with transaction request context

{
  "name": "...",
  "context": {
    "request": {
      "url": {
        "raw": "http://example.com"
      }
    }
  }
}

@beniwohli
Copy link
Contributor

@jahtalab raw in the transaction request context has a somewhat special meaning ("The raw, unparsed URL of the HTTP request line"). full is probably the better field to go with if we really want to align with the transaction request context.

I watched the meeting recording, and one point that didn't come up (AFAIR) was the option to align with opentracing. Our span contexts are so far relatively undefined (except for db), so at this point we could still adopt the opentracing span context naming conventions without too much head aches

https://github.com/opentracing/specification/blob/master/semantic_conventions.md

@hmdhk
Copy link
Contributor Author

hmdhk commented May 14, 2018

@beniwohli , That's only a sample payload the point is that we will define it's spec like it's defined here (possibly even use a reference to it) therefore every field is available to use (including raw and full) depending on the use case.

Regarding the OT, again I think the main point of the meeting (or at least the consensus) was to align with the transaction context so it follows that we will adopt the OT naming conventions once we decide to adopt it for transaction context.

@alvarolobato alvarolobato added this to the 6.4 milestone May 23, 2018
@alvarolobato
Copy link

@beniwohli @jahtalab can you talk about this offline and propose what to do?

@hmdhk
Copy link
Contributor Author

hmdhk commented May 23, 2018

@beniwohli, Please let me know if my last comment is not convincing and you would like to discuss things further.

@beniwohli
Copy link
Contributor

I still think that re-using the full blown request object from the transaction context for span contexts is not the best idea, since they are two very different things, but if every body else agrees, I won't stop you. We need to move forward with this either way.

@hmdhk
Copy link
Contributor Author

hmdhk commented May 24, 2018

Thanks @beniwohli , As you saw in the meeting recording we mostly agreed on aligning with transaction request, I also think there are issues with this approach but we need to move forward and reiterate later.

@alvarolobato , We can move forward with this.

@watson
Copy link
Contributor

watson commented May 24, 2018

As Beni said, I think the right property to use under url is full - not raw.

If I could I would love to refactor the request and response objects on the transaction, but that is a separate discussion and probably can't be done until v7.0 for backwards compatible reasons. So the question we need to ask our selfs here is if there's a benefit of aligning with the current structure of request and response as they look now, or if we should take the opportunity to refactor the objects on the span as we would eventually like them to look on transaction also.

@watson
Copy link
Contributor

watson commented May 24, 2018

Update: I just had a chat with Hamid who told me that urls from the browser can be relative, e.g. ../foo, /foo/bar etc. I didn't think of that when deciding between full vs raw. In that case, I think it makes a lot of sense for the browser agent to use raw. There's still value in trying to expand the relative URL to a full URL and also populate the full field - but that should be in addition to populating the raw field.

In the case of outgoing requests made by backend service, where we presumably always would have the full URL available, I think it should be fine for the backend agents to just use the full property and not worry about raw.

Also, to clarify my last paragraph in my previous comment: I'm 100% fine with aligning with the current structure we have for requests in the transaction context. I just wanted to make a note about that this structure wasn't necessarily ideal - both for transactions and spans. But there's definitely value in aligning instead of starting an incremental change. We can always do a big sweeping change in v7.0 if we feel that it's needed.

@beniwohli
Copy link
Contributor

What's the benefit of storing both the relative and the full URL? I think that's an important question for every single field we store in spans, as we already gobble up storage space pretty heavily. Can't we always figure out the full URL when the request is made?

Also, raw has a very special meaning in the backend, it's the first line of the HTTP request. Reusing the same field name for something else will lead to confusion.

@hmdhk
Copy link
Contributor Author

hmdhk commented May 29, 2018

Talked to @beniwohli IRL and decided to bring up this issue tomorrow at Agent sync meeting. The remaining topics:

  • Whether we should align with transaction request context or not
  • Using url.full vs using url.raw (using full means we should change it's definition to accept urls not assembled by the agent and also accepting both relative and absolute urls)

@hmdhk
Copy link
Contributor Author

hmdhk commented May 30, 2018

Decided to go with @beniwohli's suggestion. I have updated the description.

@roncohen
Copy link
Contributor

roncohen commented Jun 7, 2018

@jahtalab the update description indicates that we will not align with the transaction context as previously decided (http vs. request). Is that on purpose?

@hmdhk
Copy link
Contributor Author

hmdhk commented Jun 7, 2018

@roncohen , Yes that's correct. See previous comments.
Also you can find the discussion around it in our agent weekly sync meeting.

@simitt
Copy link
Contributor

simitt commented Jun 27, 2018

@roncohen @jahtalab please ping us when this is unblocked and ready for implementation.

@roncohen
Copy link
Contributor

OK. Let's more forward with http.url 👍

simitt added a commit to simitt/apm-server that referenced this issue Jun 27, 2018
simitt added a commit to simitt/apm-server that referenced this issue Jun 27, 2018
simitt added a commit to simitt/apm-server that referenced this issue Jun 27, 2018
simitt added a commit to simitt/apm-server that referenced this issue Jun 28, 2018
simitt added a commit that referenced this issue Jun 28, 2018
* Add context.http.url to spans.

implements #777
@simitt simitt closed this as completed Jun 28, 2018
simitt added a commit to simitt/apm-server that referenced this issue Jul 4, 2018
* Add context.http.url to spans.

implements elastic#777
simitt added a commit to simitt/apm-server that referenced this issue Jul 4, 2018
* Add context.http.url to spans.

implements elastic#777
simitt added a commit that referenced this issue Jul 4, 2018
* Add context.http.url to spans.

implements #777
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants