
[fix] rack middleware close trace via body proxy (Rack::CommonLogger compatibility) #1746

Open

skcc321 wants to merge 1 commit into master from rack-middleware-body-proxy
Conversation

skcc321
Contributor

@skcc321 skcc321 commented Nov 1, 2021

What is the issue?

Hanami is a Rack-based framework. It uses the Rack::CommonLogger middleware to print request/response logs. The issue is that the current span/trace is not associated with these logs, because the Datadog Rack middleware closes the trace too early. If we take a look at Rack::CommonLogger, we can observe that it uses Rack::BodyProxy to guarantee that the log is printed only when everything else is done (the Datadog trace included); as a result, we don't have an active correlation in the log formatter.
So I implemented the same approach for closing the current span/trace in the Datadog Rack middleware.
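The approach can be sketched with a minimal stand-in for Rack::BodyProxy (`MiniBodyProxy` and the lambda middleware below are hypothetical names for illustration; this is not the actual patch):

```ruby
# A minimal stand-in for Rack::BodyProxy: it delegates #each to the
# wrapped body and runs a callback once #close is called.
class MiniBodyProxy
  def initialize(body, &on_close)
    @body = body
    @on_close = on_close
  end

  def each(&blk)
    @body.each(&blk)
  end

  def close
    @body.close if @body.respond_to?(:close)
    @on_close.call
  end
end

EVENTS = [] # records ordering to show the deferral

app = ->(env) { [200, {}, ['ok']] }

# Instead of finishing the trace before returning from #call, the
# middleware wraps the response body so the trace finishes only when a
# downstream consumer (e.g. Rack::CommonLogger) closes the body.
middleware = lambda do |env|
  EVENTS << :span_started
  status, headers, body = app.call(env)
  [status, headers, MiniBodyProxy.new(body) { EVENTS << :span_finished }]
end

status, _headers, body = middleware.call({})
body.each { |_chunk| EVENTS << :body_served }
body.close # a CommonLogger-style consumer would log here, then close

# EVENTS is now [:span_started, :body_served, :span_finished]
```

Because the finish callback runs on `close`, any logging that a downstream BodyProxy performs before closing the wrapped body still sees an unfinished trace.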

@skcc321 skcc321 requested a review from a team November 1, 2021 11:38
@skcc321 skcc321 changed the title [fix] rack middleware close trace via body proxy [fix] rack middleware close trace via body proxy (Rack::CommonLogger compatibility) Nov 1, 2021
@skcc321 skcc321 force-pushed the rack-middleware-body-proxy branch 4 times, most recently from 5332b8d to 14c7616 Compare November 1, 2021 22:18
Member

@marcotc marcotc left a comment

👋 @skcc321, thank you for the work you've done in this PR.

I understand the issue at hand, which is that log records created by Rack::CommonLogger won't receive trace->logs correlation, as ddtrace's trace will have already been finished by the time Rack::CommonLogger gets a chance to output its log line.

One concern I have with the proposed solution is that it creates an unpredictable lifespan and "lifecycle end event" for the rack.request span (the span created by Datadog::Contrib::Rack::TraceMiddleware).

[Diagram: ddtrace #1746, middleware stack with Datadog Start/End markers]

The way it works today, the rack.request span is created at the Datadog Start arrow in the diagram above, and finished at Datadog End. The Rack::CommonLogger log event will fire whenever a middleware chooses to close the response body, which is likely to happen below the Datadog End in the diagram. It could happen at End of Middleware 2, at End of Last Middleware, or at the web server handler.
It could also happen at different points at different times (sometimes at Middleware 2, sometimes at Middleware 1).

The most concerning issue that this unpredictable rack.request end event can cause is the mangling of Datadog spans in cases where "Middleware 1" or "Middleware 2" create Datadog spans. If, for example, "Middleware 2" creates a Datadog span, that span will be the parent of rack.request, created at Datadog::Contrib::Rack::TraceMiddleware. But if Rack::BodyProxy#close is called at End of Last Middleware, this will happen after the span created by "Middleware 2" was closed, meaning the parent span is closed before the child span rack.request is closed. This produces confusing traces that have dubious value.

A predictable span lifecycle is very important because:

  • A span should measure the same operation every time, to allow for time-based comparisons (has my endpoint become slower recently?).
  • Spans that are not closed leak memory: their memory will never be released by the host application.

For these reasons we have strong lifecycle constraints on span creation and closure.

So I don't think I'm comfortable with the solution as presented.

This is not to say that the issue of lack of trace->logs correlation in Rack::CommonLogger log events is not important: it's very much so. But the compromise, as it is implemented, introduces too much risk to the Rack instrumentation.

I'm open to other suggestions around propagating trace correlation in environments where the root span (rack.request in this case) has already closed, but we'd like to still correlate closely related log lines to it.

@delner and @ivoanjo, could you guys take a look at the problem and solution at hand here and see if you have any options or suggestions for solving the Rack::CommonLogger problem at hand?

And just for completeness, there's also the case where Rack::BodyProxy#close is not called at all, but that contradicts Rack's spec, and such an application would not behave correctly in many other ways. I don't think this specific case is a concern.

@delner
Contributor

delner commented Nov 2, 2021

Is there any way to push the Datadog middleware to the top of the stack? Or does BodyProxy truly have to run first/last? (Before/after ddtrace middleware?) If there's a way to do this, I think that would be the simplest solution.

You mentioned this is a Hanami app... one possibility: if necessary, we could introduce a Hanami integration that is aware of its middleware stack and can manage this. (Similar to what we do for Rails already.)

Let me know if any of that would work, or if I'm missing some detail.

@skcc321
Contributor Author

skcc321 commented Nov 2, 2021

@delner Hanami is just a Rack-based framework, so the issue really concerns Rack::CommonLogger, which has existed for many years.
Rack::CommonLogger is part of the rack library itself, so technically Datadog's "rack" instrumentation is not complete without handling CommonLogger properly.

@skcc321
Contributor Author

skcc321 commented Nov 2, 2021

@marcotc so, as I understand it, the solution for the Rack middleware is:
insert the Datadog Rack middleware on top of all other middlewares, so the trace/span is guaranteed to start at the very beginning of request processing and close at the very end. Given the behaviour of Rack::BodyProxy, I think the implementation is OK (if we are sure the Datadog Rack middleware was inserted at the very top).

If the Body responds to +close+, it will be called after iteration. If
the original Body is replaced by a new Body, the new Body
must close the original Body after iteration, if it responds to +close+.
If the Body responds to both +to_ary+ and +close+, its
implementation of +to_ary+ must call +close+ after iteration.

Doesn't that guarantee that body will be closed always?
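The SPEC contract quoted above can be sketched with toy bodies (illustrative objects only, not rack's implementation): a body that replaces another must close the original after iteration.

```ruby
# Toy bodies illustrating the SPEC contract quoted above (not rack code).
closed = []

original = Object.new
original.define_singleton_method(:each) { |&b| b.call('payload') }
original.define_singleton_method(:close) { closed << :original }

# A middleware that replaces the body must close the original after
# iteration, per the SPEC:
replacement = Object.new
replacement.define_singleton_method(:each) { |&b| original.each(&b) }
replacement.define_singleton_method(:close) do
  original.close # the new body closes the body it replaced
  closed << :replacement
end

replacement.each { |_| }
replacement.close

# closed is now [:original, :replacement]
```

As long as every layer honors this contract, a close callback attached at any level of wrapping will eventually run.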

@ivoanjo
Member

ivoanjo commented Nov 2, 2021

could you guys take a look at the problem and solution at hand here and see if you have any options or suggestions for solving the Rack::CommonLogger problem at hand?

I haven't dug into this that much, so take my opinions/suggestions with a big grain of salt, but looking at Rack::CommonLogger it seems to gather a number of pieces of information from the app when it executes as part of the Rack stack, and then uses the BodyProxy as a way of printing that information some time later:

https://github.com/rack/rack/blob/e0993d47096d36535dbe725091963a7e0f5aab52/lib/rack/common_logger.rb#L36-L42

But that gets me thinking: isn't what we're trying to do here similar to what Rack::CommonLogger is doing? E.g. could we gather the current span information inside CommonLogger#call, and store it for later, so that the logger could access it after the fact?
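A rough sketch of that idea, with hypothetical names throughout (`ActiveSpan` and `log_line` are not ddtrace or Rack APIs): capture the identifiers while the span is still active, and hand them to the deferred logger.

```ruby
# Hypothetical names throughout (ActiveSpan, log_line); not ddtrace APIs.
ActiveSpan = Struct.new(:trace_id, :span_id)

def log_line(correlation, message)
  "[dd.trace_id=#{correlation[:trace_id]} dd.span_id=#{correlation[:span_id]}] #{message}"
end

span = ActiveSpan.new(123, 456)

# Inside a CommonLogger-like #call, while the span is still active,
# capture the identifiers needed for correlation:
captured = { trace_id: span.trace_id, span_id: span.span_id }

# Much later, after the trace has finished, the deferred logger can
# still emit a correlated line from the captured values:
line = log_line(captured, 'GET /users 200')
# line == "[dd.trace_id=123 dd.span_id=456] GET /users 200"
```

This keeps the span lifecycle untouched: only plain identifier values outlive the span, not the span itself.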

@delner
Contributor

delner commented Nov 2, 2021

so technically datadog "rack" instrumentation is not complete without handling that CommonLogger properly.

@skcc321 Right, so why couldn't we push the Datadog middleware to be the last middleware?

@skcc321
Contributor Author

skcc321 commented Nov 3, 2021

so technically datadog "rack" instrumentation is not complete without handling that CommonLogger properly.

@skcc321 Right, so why couldn't we push the Datadog middleware to be the last middleware?

@delner the reason is that Rack::BodyProxy writes logs as a "callback", so it always runs last regardless of whether the Datadog middleware sits on top of the other middlewares. As a result, you have no active span/trace in Rack::CommonLogger#call.

@skcc321 skcc321 force-pushed the rack-middleware-body-proxy branch from 14c7616 to 513c157 Compare November 19, 2021 11:56
@codecov-commenter

Codecov Report

Merging #1746 (513c157) into master (8b97105) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #1746   +/-   ##
=======================================
  Coverage   98.17%   98.17%           
=======================================
  Files         934      934           
  Lines       45048    45056    +8     
=======================================
+ Hits        44227    44235    +8     
  Misses        821      821           
Impacted Files Coverage Δ
lib/ddtrace/contrib/delayed_job/plugin.rb 100.00% <100.00%> (ø)
lib/ddtrace/contrib/rack/middlewares.rb 85.71% <100.00%> (+0.89%) ⬆️
lib/ddtrace/contrib/rack/patcher.rb 93.75% <100.00%> (+0.13%) ⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b97105...513c157.

@skcc321
Contributor Author

skcc321 commented Nov 19, 2021

@delner @ivoanjo any updates on this one?

@ivoanjo
Member

ivoanjo commented Nov 19, 2021

Thanks for the ping, I'll raise this with the team!

arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this pull request Feb 18, 2023
This is an alternate implementation of the Rack instrumentation that leverages the [`Rack::Events` API](https://www.rubydoc.info/gems/rack/Rack/Events) instead of a custom `Rack::Middleware`.

Why am I suggesting we change this instrumentation? At GitHub we leverage `Rack::BodyProxy` to write Rack request logs after the request is complete; however, the Rack span is already finished and its related `Context` has already been detached by then. This means we are not able to correlate request logs to our traces.

The advantage of using `Rack::Events` is that handlers are triggered during different stages of a request, including for deferred operations like [`Rack::BodyProxy`](https://www.rubydoc.info/gems/rack/Rack/BodyProxy), as opposed to middlewares, which are _only_ invoked inline.

The disadvantage of this API is that it makes managing the request more difficult, and we have to track the Context positions to detach in the `Rack::Env`.

This implementation will be released alongside the existing instrumentation to give users the option to use the middleware instead of the `Rack::Events` handler until we are able to run this in some of our heavy production workloads.

Fixes open-telemetry#341
Related DataDog/dd-trace-rb#1746
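The event-handler shape that `Rack::Events` exposes (`on_start`/`on_finish` handlers) can be sketched with a toy dispatcher; this is an illustration of the idea only, not the real `Rack::Events` implementation or its exact handler signatures:

```ruby
# Toy sketch: an events-style handler can receive on_finish from a
# deferred body close, where a plain middleware has already returned.
# (LoggingHandler and its signatures are illustrative, not Rack::Events.)
class LoggingHandler
  attr_reader :events

  def initialize
    @events = []
  end

  def on_start(request)
    @events << :start
  end

  def on_finish(request)
    @events << :finish # a safe place to write logs with trace correlation
  end
end

handler = LoggingHandler.new
request = {}

# The request phase ends long before the body is consumed...
handler.on_start(request)
deferred_close = -> { handler.on_finish(request) } # BodyProxy-style deferral

# ...and only when the body is finally closed does on_finish run:
deferred_close.call

# handler.events is now [:start, :finish]
```

Because the finish notification is delivered as an event rather than as a middleware return value, instrumentation can keep its context alive until the deferred close actually happens.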
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this pull request Mar 2, 2023
arielvalentin added a commit to open-telemetry/opentelemetry-ruby-contrib that referenced this pull request Apr 5, 2023
* feat: Use Rack::Events for instrumentation


* squash: additional feature parity

* squash: add allowed response headers

* squash: url quantization

* squash: Now with new Lemon Scented response headers

* squash: we are now at parity

* squash: use instrumentation config

* squash: Use declarative config options

* squash: fix bad refactoring

* convert proxy span to an event

* refactor: move configurations to instrumentation install

* squash: add test coverage

* squash: make response headers a little more resilient

* squash: Ensures event middleware will not cause the application to crash when it encounters errors

* squash: fix linter error

* feat: Add middleware args helper for ActionPack and Sinatra

* fix: test case

* fix: Rack Events is autoloaded so if the parent module is present so are submodules

* fix: More precise error handling

* fix: Ensure config is cleared/setup during installation

* fix: Sinatra 1.4 compatibility

* fix: bad merge

* fix: invalid responses in test case

* squash: Added a few more test cases