Only count time spent in queue as part of the `http_server.queue` span #2591
Hey @agrobbin! Thanks for sharing this! Generally speaking, I really like the idea here. The trace it produces seems more realistic, and it addresses a practical issue. The only reservations I have (a few, but not insurmountable) are:
My overall disposition is that I'm inclined to adopt this new behavior; it's just a matter of addressing the logistics stated above. We'll do our part to start a discussion internally, but I'm interested in hearing your thoughts on those points.
Thanks for the thoughts @delner! I'm excited at the prospect of getting this included! I've responded to each of your questions below.
I definitely had a similar concern originally, but what led me here is that we need some root span, and conceptually, the overarching HTTP request is that "root thing". I didn't consider it totally surprising to be within the same service as the existing
I completely agree this is breaking, and I imagine this is something currently depended on by a bunch of users. I do lean toward a configuration option, but maybe
I'll defer to you on this for sure, but if there is anything I can do in terms of experimentation or testing here, let me know and I can set that up within my company's non-production systems.
I think the model proposed is reasonable. I just need to reconcile with our plans for consistency with other Datadog instrumentation in other Datadog tracers.
Maybe slightly different names, but I think this is a clever solution. It satisfies my concerns. Probably the way to go! Give me a little time to shake the tree for some internal feedback, but if I don't get back to you quickly, please feel free to check back in and give me a nudge. If no decision is reached, I think we adopt this under a feature flag (non-default behavior). In the interim, if you can update the PR with the above configuration behavior (and tests/documentation to match), that will help expedite things.
@delner I think I've got tests passing (excluding a seemingly unrelated failure on Ruby 2.7) with the new configuration option functionality. I went with I look forward to hearing what you find on your internal-to-Datadog side of things!
Looking really good thus far. Added some suggestions for documentation and tagging these spans.
```diff
@@ -11,7 +11,7 @@
 if Datadog::DemoEnv.feature?('tracing')
   c.tracing.analytics.enabled = true if Datadog::DemoEnv.feature?('analytics')

-  c.tracing.instrument :rails, request_queuing: true
+  c.tracing.instrument :rails, request_queuing: :exclude_request
```
I wonder if this should be the default for the Rails integration app? I'm somewhat indifferent.
I need to think more about what these settings mean for an integration app: it's not really meant to test all the different flags so much as the holistic stability of Rails 7 running typical ddtrace deployments.
I'll happily defer to you here, @delner. I did a grep for `request_queuing` to make sure I didn't miss anything, but will adjust as you see fit.
So far it doesn't look like we have a convention for HTTP queue spans. The existing convention defines
However, considering that we're trying to limit breaking changes (even under this feature flag), perhaps we can defer the rename of the Rack span until the next major version (2.0). This would give us:
I'll double-check internally to see if there are any objections. Let me know what you think as well.
@delner just to be clear, renaming the existing
Breaking change from renaming the span duly noted. If it doesn't complicate things too much, then ideally If possible, I would add comments (such as a
@delner I just updated this based on the above discussion! One last question: would you like there to be a user-facing deprecation warning (via
It's not something we've really put into practice, but it seems reasonable. The only case where I see this being problematic is when an application is reconfigured more than once: the warning might then be emitted multiple times. Still, I think it's worth trying.
We actually have a helper to solve exactly that issue! Example of it being used in such a manner: `dd-trace-rb/lib/datadog/core/metrics/client.rb`, lines 185 to 194 in eadd5e1
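The helper linked above is not reproduced here. As a rough stand-in, this is a minimal Ruby sketch of the same idea: a guard object that runs its block only on the first call, so a deprecation warning is not repeated when the application is reconfigured. `OnlyOnceGuard`, `DEPRECATION_WARNING`, `configure_rails`, and the warning text are all hypothetical names for illustration, not dd-trace-rb's API.

```ruby
# Hypothetical guard: the block passed to #run executes at most once per
# instance, even if #run is called repeatedly or from several threads.
class OnlyOnceGuard
  def initialize
    @mutex = Mutex.new
    @ran = false
  end

  def run
    # Check-and-set under the lock so only the first caller proceeds.
    @mutex.synchronize do
      return nil if @ran

      @ran = true
    end
    yield
  end
end

# Hypothetical usage: one guard per warning, shared across reconfigurations.
DEPRECATION_WARNING = OnlyOnceGuard.new

def configure_rails(request_queuing, log)
  return unless request_queuing == true

  DEPRECATION_WARNING.run { log << 'request_queuing: true is deprecated' }
end

log = []
3.times { configure_rails(true, log) } # simulate reconfiguring three times
puts log.size
```

Keeping the guard in a constant, rather than creating it inside `configure_rails`, is what lets it survive repeated configuration calls.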
@delner @ivoanjo I just added a deprecation warning for
@ivoanjo Right, forgot about Yeah,
@agrobbin I finished my internal discussion; sounds like we're good to go on this. The only caveat is that we want to mark this as "experimental" for the time being. Internally, we're interested in creating consistency across HTTP instrumentation as much as possible, and we may need to rename these spans in the future to reflect that consistent naming. It might happen, it might never happen. But the trace structure itself is something we intend to keep for the foreseeable future, so nothing to worry about there. Can you make a note in the documentation (under the "request queuing" section) that this is "experimental"? Once that's in place, I'd be very happy to submit my approval on this. :)
@delner if this is experimental, should the user-facing deprecation be removed? It seems a little confusing to end users to introduce an experimental feature that simultaneously deprecates the old behavior.
@agrobbin Yeah, I think that makes sense, given this change in direction. We can always re-add the deprecation later if we know it's what we will break in a future major version. Sorry for the confusion!
No need to apologize! I've just made the updates. Let me know if there is anything else you'd like to change.
@agrobbin, it all looks great, thank you so much! I only left one small comment; aside from that, this PR is ready to go.
@marcotc updated to mark the
@agrobbin, that looks good. Could I trouble you to add a simple test asserting that the span is marked as "measured"? I promise that's the last of it! :)
If you have a request that spends 10ms in queue and 5s being executed, you currently end up with these 2 spans in APM:

* `http_server.queue` (duration: 5.01s)
  * `rack.request` (duration: 5s)

However, that isn't really semantically correct. The queuing itself did not take 5.01s; the entire lifecycle of the request took 5.01s, with queuing making up 10ms of that time.

This is an attempt at resolving that inconsistency by introducing an outer `http.proxy.request` span, which contains `http.proxy.queue` and `rack.request` as sibling spans, rather than the current parent/child relationship that exists between the 2 of them. It has been introduced as an opt-in feature, enabled by setting `request_queuing: :exclude_request`.
@marcotc whoops, sorry about that! I could've sworn I included one. Should be all set now.
Thank you so much, @agrobbin!
👋 @agrobbin, we are on the verge of releasing 2.0! In this release, we have made this the default behavior when enabling
`gem 'datadog', '>= 2.0.0.rc1'`
That's great! We'll try to test it before you get to the official v2.0 release.
What does this PR do?

This is an attempt at resolving the inconsistency raised in #2064 between `http_server.queue`'s duration and real-world server queuing time by introducing an outer `http_server.request` span, which contains `http_server.queue` and `rack.request` as sibling spans, rather than the current parent/child relationship that exists between the 2 of them.

Motivation
If you have a request that spends 10ms in queue and 5s being executed, you currently end up with these 2 spans in APM:

* `http_server.queue` (duration: 5.01s)
  * `rack.request` (duration: 5s)

However, that isn't really semantically correct. The queuing itself did not take 5.01s; the entire lifecycle of the request took 5.01s, with queuing making up 10ms of that time.
After this change, you should end up with something like this:

* `http_server.request` (duration: 5.01s)
  * `http_server.queue` (duration: 10ms)
  * `rack.request` (duration: 5s)

I'd love to get some general feedback on whether this kind of change is reasonable, as it is a breaking change (the definition of `http_server.queue` would change for anyone who depends on it). I feel that it is correcting a misleading and confusing APM span, but I'm not the one who maintains this library or fields inbound customer support requests!
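To make the before/after durations in the description concrete, here is a minimal Ruby sketch that builds both span layouts from three timestamps: when the proxy received the request, when the app started handling it, and when it finished. `Span`, `build_trace`, and the timestamp names are hypothetical, not ddtrace APIs. Times are integer milliseconds to sidestep float rounding, and the sketch assumes the queue start time is already known; how ddtrace obtains it is not covered here. Span names follow this description (the final commit message uses `http.proxy.request` and `http.proxy.queue` instead).

```ruby
# Hypothetical span model: a name, start/finish timestamps in integer
# milliseconds, and child spans. Not a ddtrace class.
Span = Struct.new(:name, :start, :finish, :children) do
  def duration
    finish - start
  end
end

# Builds one request's span tree. `mode` mirrors the two `request_queuing`
# values discussed in this PR:
#   true             -> legacy: the queue span is the root and wraps rack.request
#   :exclude_request -> new: an outer request span with the queue span and
#                       rack.request as siblings
def build_trace(mode, proxy_start:, app_start:, app_end:)
  rack = Span.new('rack.request', app_start, app_end, [])

  case mode
  when true
    # The queue span runs until the app finishes, so it absorbs execution time.
    Span.new('http_server.queue', proxy_start, app_end, [rack])
  when :exclude_request
    # The queue span stops as soon as the app starts handling the request.
    queue = Span.new('http_server.queue', proxy_start, app_start, [])
    Span.new('http_server.request', proxy_start, app_end, [queue, rack])
  else
    raise ArgumentError, "unsupported request_queuing mode: #{mode.inspect}"
  end
end

# The example from the description: 10ms in queue, then 5s of execution.
times = { proxy_start: 0, app_start: 10, app_end: 5_010 }

legacy = build_trace(true, **times)
split  = build_trace(:exclude_request, **times)

[legacy, split].each do |root|
  puts "#{root.name}: #{root.duration}ms"
  root.children.each { |span| puts "  #{span.name}: #{span.duration}ms" }
end
```

In the legacy layout the root `http_server.queue` span reports the full 5,010ms; in the new layout that total moves to `http_server.request`, and `http_server.queue` reports only the 10ms actually spent waiting.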