[fix] rack middleware close trace via body proxy (Rack::CommonLogger compatibility) #1746
👋 @skcc321, thank you for the work you've done in this PR.
I understand the issue at hand, which is that log records created by `Rack::CommonLogger` won't receive trace->logs correlation, as `ddtrace`'s trace will have already been finished by the time `Rack::CommonLogger` gets a chance to output its log line.
One concern I have with the solution proposed is that it creates an unpredictable lifespan and "lifecycle end event" for the `rack.request` span (the span created by `Datadog::Contrib::Rack::TraceMiddleware`).
The way it works today, the `rack.request` span is created at the `Datadog Start` arrow in the diagram above, and finished at `Datadog End`. The `Rack::CommonLogger` log event will fire whenever a middleware chooses to close the response body, which is likely to happen below the `Datadog End` in the diagram. It could happen at `End of Middleware 2`, at `End of Last Middleware`, or at the web server handler.
It could also conditionally happen at different points at different times (sometimes at Middleware 2, sometimes at Middleware 1).
The most concerning issue that this unpredictable `rack.request` end event can cause is the mangling of Datadog spans in cases where "Middleware 1" or "Middleware 2" create Datadog spans. If, for example, "Middleware 2" creates a Datadog span, that span will be the parent of `rack.request`, created at `Datadog::Contrib::Rack::TraceMiddleware`. But if `Rack::BodyProxy#close` is called at `End of Last Middleware`, this will happen after the span created by "Middleware 2" was closed, meaning the parent span is closed before the child span `rack.request` is closed. This produces confusing traces that have dubious value.
A predictable span lifecycle is very important for two reasons:
- It ensures a span measures the same operation every time, which allows for time-based comparisons (has my endpoint become slower recently?).
- Spans that are never closed leak memory: their memory will never be released by the host application.
For these reasons we have strong lifecycle constraints on span creation and closure, and I don't think I'm comfortable with the solution presented as is.
This is not to say that the lack of trace->logs correlation in `Rack::CommonLogger` log events is not important: it very much is. But the compromise, as it is implemented, introduces too much risk to the Rack instrumentation.
I'm open to other suggestions around propagating trace correlation in environments where the root span (`rack.request` in this case) has already closed, but we'd still like to correlate closely related log lines to it.
@delner and @ivoanjo, could you take a look at the problem and solution here and see if you have any options or suggestions for solving the `Rack::CommonLogger` problem?
And just for completeness, there's also the case where `Rack::BodyProxy#close` is not called at all, but that contradicts Rack's spec, and such an application would not behave correctly in many different ways. I don't think this specific case is a concern.
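The ordering concern described in this comment can be reproduced with a small, self-contained sketch. It is only an illustration under stated assumptions: `BodyProxySketch` is a minimal stand-in for `Rack::BodyProxy`, the two middleware classes are hypothetical, and "spans" are reduced to strings appended to an `EVENTS` array. It is not the code from this PR.

```ruby
# Minimal stand-in for Rack::BodyProxy: runs a callback when the body is closed.
class BodyProxySketch
  def initialize(body, &on_close)
    @body = body
    @on_close = on_close
    @closed = false
  end

  def each(&block)
    @body.each(&block)
  end

  def close
    return if @closed

    @closed = true
    begin
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call
    end
  end
end

# Records span lifecycle events in the order they happen.
EVENTS = []

# Hypothetical outer middleware ("Middleware 2") whose span opens and closes inline.
class OuterSpanMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    EVENTS << 'middleware2.span start'
    @app.call(env)
  ensure
    EVENTS << 'middleware2.span finish'
  end
end

# Hypothetical tracing middleware that defers finishing its span until body close.
class DeferredCloseTraceMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    EVENTS << 'rack.request start'
    status, headers, body = @app.call(env)
    [status, headers, BodyProxySketch.new(body) { EVENTS << 'rack.request finish' }]
  end
end

app = ->(_env) { [200, { 'content-type' => 'text/plain' }, ['ok']] }
stack = OuterSpanMiddleware.new(DeferredCloseTraceMiddleware.new(app))

# The web server iterates the body and closes it only after the whole stack has returned.
_status, _headers, body = stack.call({})
body.each { |chunk| chunk }
body.close
```

In this sketch the outer span finishes before `rack.request` does, which is the "parent closed before child" situation described above.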
Is there any way to push the You mentioned this is a Hanami app... one possibility: if necessary, we could introduce a Hanami integration that is aware of its middleware stack and can manage this. (Similar to what we do for Rails already.) Let me know if any of that would work, or if I'm missing some detail. |
@delner Hanami is just a Rack-based framework, so the issue is related to Rack::CommonLogger, which has existed for many years. |
@marcotc so as I understand the solution for rack middleware is: If the Body responds to +close+, it will be called after iteration. If Doesn't that guarantee that body will be closed always? |
I haven't dug into this that often, so take my opinions/suggestions with a big grain of salt, but looking at But that gets me thinking; isn't what we're trying to do here similar to what the rack common logger is doing? E.g. could we gather the current span information inside |
@skcc321 Right, so why couldn't we push the |
@delner the reason is that |
Codecov Report
@@ Coverage Diff @@
## master #1746 +/- ##
=======================================
Coverage 98.17% 98.17%
=======================================
Files 934 934
Lines 45048 45056 +8
=======================================
+ Hits 44227 44235 +8
Misses 821 821
Thanks for the ping, I'll raise this with the team! |
This is an alternate implementation of the Rack instrumentation that leverages the [`Rack::Events` API](https://www.rubydoc.info/gems/rack/Rack/Events) instead of a custom `Rack::Middleware`. Why am I suggesting we change this instrumentation? At GitHub we leverage `Rack::BodyProxy` to write Rack request logs after the request is complete; however, the Rack span is already finished by then and its related `Context` has already been detached. This means we are not able to correlate request logs to our traces. The advantage of using `Rack::Events` is that handlers are triggered during different stages of a request, including for deferred operations like [`Rack::BodyProxy`](https://www.rubydoc.info/gems/rack/Rack/BodyProxy), as opposed to middlewares, which are _only_ invoked inline. The disadvantage of this API is that it makes managing the request more difficult, and we have to track the Context positions to detach in the `Rack::Env`. This implementation will be released alongside the existing instrumentation to give users the option to use the middleware instead of the `Rack::Events` handler until we are able to run this in some of our heavy production workloads. Fixes open-telemetry#341 Related DataDog/dd-trace-rb#1746
* feat: Use Rack::Events for instrumentation
* squash: additional feature parity
* squash: add allowed response headers
* squash: url quantization
* squash: Now with new Lemon Scented response headers
* squash: we are now at parity
* squash: use instrumetation config
* squash: Use declarative config options
* squash: fix bad refactoring
* convert proxy span to an event
* refactor: move configurations to instrumentation install
* squash: add test converage
* squash: make response headers a little more resilient
* squash: Ensures event middleware will not cause the application to crash when it encounters errors
* squash: fix linter error
* feat: Add middleware args helper for ActionPack and Sinatra
* fix: test case
* fix: Rack Events is autoloaded so if the parent module is present so are submodules
* fix: More precise error handling
* fix: Ensure config is cleared/setup during installation
* fix: Sinatra 1.4 compatability
* fix: bad merge
* fix: invalid responses in test case
* squash: Added a few more test cases
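To make the `Rack::Events` idea in the commit message above more concrete, here is a rough sketch of a handler that keeps its per-request state in the env and only finishes its span in `on_finish`. The callback names (`on_start`, `on_commit`, `on_send`, `on_error`, `on_finish`) are the ones `Rack::Events` invokes on its handlers; everything else is an assumption made for illustration: `RequestSketch`, `ResponseSketch`, `TracingHandlerSketch`, the `SPAN_KEY` env key, and the hash used as a "span" are hypothetical, and the callbacks are driven by hand here instead of by `Rack::Events` so that the example runs without the rack gem. It is not the OpenTelemetry implementation.

```ruby
# Hypothetical stand-ins; a real handler receives Rack::Request and Rack::Response objects.
RequestSketch = Struct.new(:env)
ResponseSketch = Struct.new(:status)

# Handler exposing the callbacks that Rack::Events calls on its handlers.
class TracingHandlerSketch
  SPAN_KEY = 'sketch.tracing.span' # hypothetical env key for per-request state

  attr_reader :finished_spans

  def initialize
    @finished_spans = []
  end

  # Called before the app runs; the response does not exist yet.
  def on_start(request, _response)
    request.env[SPAN_KEY] = { name: 'rack.request', status: nil, error: nil }
  end

  # Called once the app has returned a response.
  def on_commit(request, response)
    span = request.env[SPAN_KEY]
    span[:status] = response.status if span
  end

  # Called when the body starts being sent; nothing to record in this sketch.
  def on_send(_request, _response); end

  # Called when the app raises.
  def on_error(request, _response, error)
    span = request.env[SPAN_KEY]
    span[:error] = error.class.name if span
  end

  # Called when the request is done (on body close for a successful request).
  def on_finish(request, _response)
    span = request.env.delete(SPAN_KEY)
    @finished_spans << span if span
  end
end

handler = TracingHandlerSketch.new
request = RequestSketch.new({})
response = ResponseSketch.new(200)

# Hand-driven call order for a successful request.
handler.on_start(request, nil)
handler.on_commit(request, response)
handler.on_send(request, response)
handler.on_finish(request, response)
```

In a real application the handler would be registered with something like `use Rack::Events, [TracingHandlerSketch.new]`, and the span would stay open until the response body is closed.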
What is the issue?
The Hanami framework is a Rack-based framework. It uses the `Rack::CommonLogger` middleware for printing out request/response logs. The issue is that the current span/trace is not associated with these logs, because the `Datadog rack middleware` closes the trace too early. If we take a look at Rack::CommonLogger, we can observe that `Rack::CommonLogger` uses `Rack::BodyProxy` to guarantee that the log is printed only when everything else is done (including the Datadog trace). As a result, we don't have an `active correlation` in the logs formatter.
So I implemented the same approach for closing the current span/trace in the `Datadog rack middleware`.
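A minimal sketch of the approach described above, comparing an inline finish with a finish deferred to body close. All names here are assumptions made for illustration: `BodyProxySketch` stands in for `Rack::BodyProxy`, `DeferredLogger` stands in for `Rack::CommonLogger`, and `TracerSketch` / `TraceMiddlewareSketch` are hypothetical simplifications, not ddtrace's API or the code in this PR.

```ruby
# Stand-in for Rack::BodyProxy: closes the wrapped body, then runs the callback.
class BodyProxySketch
  def initialize(body, &on_close)
    @body = body
    @on_close = on_close
    @closed = false
  end

  def each(&block)
    @body.each(&block)
  end

  def close
    return if @closed

    @closed = true
    begin
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call
    end
  end
end

# Hypothetical tracer holding at most one active trace id.
module TracerSketch
  class << self
    attr_accessor :active_trace_id
  end
end

# Logs on body close, the way Rack::CommonLogger does, tagging the line with the active trace id.
class DeferredLogger
  def initialize(app, lines)
    @app = app
    @lines = lines
  end

  def call(env)
    status, headers, body = @app.call(env)
    proxy = BodyProxySketch.new(body) do
      @lines << "status=#{status} trace_id=#{TracerSketch.active_trace_id.inspect}"
    end
    [status, headers, proxy]
  end
end

# Tracing middleware; defer_finish selects between inline finish and finish-on-close.
class TraceMiddlewareSketch
  def initialize(app, defer_finish:)
    @app = app
    @defer_finish = defer_finish
  end

  def call(env)
    TracerSketch.active_trace_id = 42
    status, headers, body = @app.call(env)
    finish = -> { TracerSketch.active_trace_id = nil }
    return [status, headers, BodyProxySketch.new(body, &finish)] if @defer_finish

    finish.call
    [status, headers, body]
  end
end

# Runs one request through the stack and returns the log lines that were written.
def serve(defer_finish)
  lines = []
  app = ->(_env) { [200, {}, ['ok']] }
  stack = TraceMiddlewareSketch.new(DeferredLogger.new(app, lines), defer_finish: defer_finish)
  _status, _headers, body = stack.call({})
  body.each { |chunk| chunk }
  body.close
  lines
end

INLINE_LINES = serve(false)
DEFERRED_LINES = serve(true)
```

With the inline finish, the log line is written after the trace is gone; with the deferred finish, the inner body (and its logger callback) is closed first, so the trace is still active when the line is written. The sketch leaves out error handling: if the app raises, the deferred variant never finishes the trace, which a real implementation would need to handle.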