
Add Rack instrumentation #166

Merged: 75 commits, Mar 5, 2020

Conversation

@duonoid (Contributor) commented Jan 14, 2020

Overview

Adds a Rack instrumentation adapter implementation, ported from existing dd-trace-rb functionality.

Rack instrumentation is required by Rails instrumentation, which does not exist yet.

Details

  • adds a working example, runnable via docker-compose, e.g.,
docker-compose run --rm ex-adapter-rack
bash-5.0$ ruby trace_demonstration.rb
  • adds functionality to trace 'front-end' request (queue time) and retain middleware names
  • the QueueTime implementation is ported directly from dd-trace-rb
  • config options include :retain_middleware_names, :allowed_response_headers, and :allowed_request_headers
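The queue-time feature mentioned above reads a request-start timestamp injected by the proxy/LB. A minimal sketch in the style of dd-trace-rb's QueueTime follows; the exact header names and the "t=<seconds>" value format are assumptions here, not confirmed details of this PR.

```ruby
# Hypothetical sketch of queue-time extraction: read an X-Request-Start
# style header set by the proxy/LB (e.g. "t=1512379167.574", seconds
# since epoch) and turn it into a Time, or nil when absent/unparsable.
def get_request_start(env)
  value = env['HTTP_X_REQUEST_START'] || env['HTTP_X_QUEUE_START']
  return nil if value.nil?

  seconds = value.sub(/\At=/, '').to_f
  return nil unless seconds.positive?

  Time.at(seconds)
end
```

Returning nil for missing or garbage values matters because, as noted later in this thread, the frontend span is only created when a valid request-start time exists.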

Related

* rack-test: "simple testing API for Rack apps"
* fix: "Could not find rake-12.3.3 in any of the sources"
e.g.,
$ bundle exec rake test
e.g.,
$ docker-compose run --rm ex-adapter-rack
bash-5.0$ ruby trace_demonstration.rb
* from dd-trace-rb
* translate spec to minitest
* use dup._call env pattern
* RE: thread safety: "... easiest and most efficient way to do this is to
  dup the middleware object that is created in runtime."
* do we need 'quantization' now?
* retain middleware names (feature exists in dd-trace-rb)
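The "dup._call env" thread-safety note above can be sketched as a minimal Rack middleware; the class and instance variable below are stand-ins, not the adapter's real code.

```ruby
# Each request runs on a duplicate of the middleware instance, so
# instance variables written during _call never race across threads.
class TracerMiddleware
  def initialize(app)
    @app = app
  end

  # Rack entry point: run the request on a fresh duplicate of self.
  def call(env)
    dup._call(env)
  end

  def _call(env)
    @request_start = Time.now # written on the per-request dup only
    @app.call(env)
  end
end
```

The shared instance that Rack builds at boot time never sees per-request state; only the short-lived dup does.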
adapters/rack/Gemfile (review thread resolved; outdated)

@mwear mwear left a comment


I did a first pass and have left a handful of comments.

end

def allowed_response_header_names
Array(config[:allowed_response_headers])
Member:

Could this be memoized? Another option might be to have a more sophisticated config (rather than just a hash). We could consider trying to abstract and borrow prior art from dd-trace here.

duonoid (author):

I've added caching of the value in the class's singleton for the time being.
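The caching described here can be sketched as memoization on the class singleton, so Array() runs once rather than on every request; the config hash below is a stand-in for the adapter's real configuration.

```ruby
class TracerMiddleware
  # Stand-in for the adapter's hash-based config.
  def self.config
    { allowed_response_headers: %w[Content-Type Content-Length] }
  end

  # Memoize the normalized list in a singleton-level instance variable.
  def self.allowed_response_header_names
    @allowed_response_header_names ||= Array(config[:allowed_response_headers])
  end
end
```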

end

def full_http_request_url
# NOTE: dd-trace-rb has implemented 'quantization'? (which lowers url cardinality)

duonoid and others added 14 commits January 22, 2020 15:38
* verified by running via, e.g.,
  $ docker-compose run --rm ex-adapter-rack
  bash-5.0$ bundle
  bash-5.0$ ruby trace_demonstration.rb

Expected: console output of span data
* to allow for lower cardinality span names
* defeats purpose of opentelemetry, which *is* distributed tracing
* add 'http.target'
* it seems that dd-trace-rb's 'span.resource' is the same as 'span.name',
  at least in this case
* in sinatra, data is available via env['sinatra.route'].split.last,
  but otherwise, doesn't seem to be readily available
* resolve 'http_server.queue' TODO
* span kind of 'proxy' is not defined, yet
* reduce string allocations
* TIL: a nested 'before' block doesn't run before *each* test,
  but the top/first 'before' block does. (weird)
* prevent unneeded string and object allocation
* once a header key is found, add it to the list of response headers
  to search for
* avoid using instance variables
…ion--67

# By Matthew Wear (2) and Grant Bourque (1)
# Via GitHub
* upstream/master:
  Fix rest client reference in top level Rakefile (#189)
  Preliminary error recording (#184)
  Add `excon` instrumentation adapter (#175)

# Conflicts:
#	Rakefile

@mwear mwear left a comment


I have a few more comments. We have options now that Tracer#in_span records and reraises exceptions; I left a comment showing how we can take advantage of it. There are also quite a few comments in the source that were useful for review but should probably be removed for the final version. If you don't mind, take a look through them and make a judgement call about what should stay or go.


private

def create_request_span(env, context)
Member:

If you use the alternative call above, this method won't exactly be necessary.


# return Context with the frontend span as the current span
def create_frontend_span(env, extracted_context)
# NOTE: get_request_start may return nil
Member:

This is probably a useful comment, but we might be able to lose the following notes.

# Note: if a middleware catches an Exception without re-raising,
# the Exception cannot be recorded here.
def record_and_reraise_error(error, request_span:)
request_span.record_error(error)
Member:

Span#record_error sets the status for us now. If you use the alternate call (that uses in_span) we won't need this method. If we keep it, it doesn't need to set the status.
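The behavior this comment relies on can be illustrated with a stand-in tracer (not the real OpenTelemetry API): in_span records the exception on the span, the recording sets error status, and the exception is re-raised, which makes a separate record_and_reraise_error helper unnecessary.

```ruby
# Minimal stand-in span: record_error both stores the error and sets status.
class Span
  attr_reader :errors, :status

  def initialize
    @errors = []
    @status = :unset
  end

  def record_error(error)
    @errors << error
    @status = :error # status is set as part of recording, per the comment
  end
end

# Minimal stand-in tracer: in_span records and re-raises exceptions.
class Tracer
  def in_span(_name)
    span = Span.new
    yield span
  rescue StandardError => e
    span.record_error(e)
    raise e
  end
end
```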


@mwear mwear left a comment


This is getting there. I found a couple of issues with nil attributes that you'll find in the comments. One other issue I found is that when this is used along with Sinatra, we get a NameError because the reference to Rack::Utils needs to be written as ::Rack::Utils, since we now have a Rack constant in OpenTelemetry::Adapters.

The line that needs updating is here: https://github.com/open-telemetry/opentelemetry-ruby/blob/master/adapters/sinatra/lib/opentelemetry/adapters/sinatra/middlewares/tracer_middleware.rb#L46

Example stacktrace:

NameError: uninitialized constant OpenTelemetry::Adapters::Rack::Utils
	/Users/matthewwear/Repos/opentelemetry-ruby/adapters/sinatra/lib/opentelemetry/adapters/sinatra/middlewares/tracer_middleware.rb:46:in `trace_response'
	/Users/matthewwear/Repos/opentelemetry-ruby/adapters/sinatra/lib/opentelemetry/adapters/sinatra/middlewares/tracer_middleware.rb:26:in `block (2 levels) in call'
	/Users/matthewwear/Repos/opentelemetry-ruby/adapters/sinatra/lib/opentelemetry/adapters/sinatra/middlewares/tracer_middleware.rb:26:in `tap'
	/Users/matthewwear/Repos/opentelemetry-ruby/adapters/sinatra/lib/opentelemetry/adapters/sinatra/middlewares/tracer_middleware.rb:26:in `block in call'
	/Users/matthewwear/Repos/opentelemetry-ruby/api/lib/opentelemetry/trace/tracer.rb:51:in `block in in_span'
	/Users/matthewwear/Repos/opentelemetry-ruby/api/lib/opentelemetry/trace/tracer.rb:66:in `block in with_span'
	/Users/matthewwear/Repos/opentelemetry-ruby/api/lib/opentelemetry/context.rb:59:in `with_value'
	/Users/matthewwear/Repos/opentelemetry-ruby/api/lib/opentelemetry/trace/tracer.rb:66:in `with_span'
	/Users/matthewwear/Repos/opentelemetry-ruby/api/lib/opentelemetry/trace/tracer.rb:51:in `in_span'
	/Users/matthewwear/Repos/opentelemetry-ruby/adapters/sinatra/lib/opentelemetry/adapters/sinatra/middlewares/tracer_middleware.rb:18:in `call'
...
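The NameError above is plain Ruby constant resolution, and can be reproduced in miniature; module names below mirror the report, but the bodies are empty stand-ins.

```ruby
# Stands in for the rack gem's top-level Rack::Utils.
module Rack
  module Utils; end
end

module OpenTelemetry
  module Adapters
    module Rack; end # the adapter's own Rack namespace

    # A bare Rack::Utils finds Adapters::Rack lexically first, then
    # fails to find Utils inside it -> NameError.
    def self.bare_lookup
      Rack::Utils
    rescue NameError
      :name_error
    end

    # ::Rack::Utils forces the lookup to start at the top level.
    def self.rooted_lookup
      ::Rack::Utils
    end
  end
end
```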

duonoid and others added 4 commits February 25, 2020 09:34
* value is potentially nil, which is not a valid attribute value
* also 'service' is not an official semantic convention and will
  probably come from the application resource in the future

@mwear mwear left a comment


@duonoid: thank you so much for all your hard work on this!


mwear commented Feb 27, 2020

Last call for comments. It'd be nice to merge this tomorrow after the SIG meeting.

end
ensure
finish_span(frontend_context)
end
Contributor:

This seems pretty convoluted and incorrect. If I understand correctly, with the "frontend" span enabled, we'll end up with 2 spans nested like:

name: http_server.queue
start_timestamp: T
end_timestamp: T+D
kind: server
attributes:
  - start_time: request_start_time
children:
  - name: $REQUEST_URI
    start_timestamp: T-ish
    end_timestamp: T+D-ish
    kind: server
    ...

There are several problems here:

  1. The duration of the 2 spans is about the same, so the queue time is only represented in an attribute of the outer span. It is more useful to create the http_server.queue span with start_timestamp: request_start_time, which will more accurately represent the LB request duration as the span's duration.
  2. Both spans are marked as kind: server, which is logically incorrect and would be reported as such by some vendors' trace quality reports. This span is supposed to represent the time between a proxy/LB receiving a request and the service handling it; there has been chatter about representing that with something like kind: proxy (see opentelemetry-specification#51, "Span.Kind.LOAD_BALANCER (PROXY/SIDECAR)", and opentelemetry-specification#337, "Clarify the interpretation of SpanKind").
  3. For analysis backends deriving SLIs from trace data, it is possible they may count requests twice if both spans are kind: server.
  4. Calling the frontend span http_server.queue is inaccurate - it actually represents (almost) the entire request processing time from the perspective of the LB/proxy.

Alternatives:

  1. Rename the frontend span to http_server.proxy or similar. Change the kind of the request span to internal if the frontend span is present.
  2. Use a single span with start_timestamp: request_start_time, kind: :server and add a timed event for the service start time (i.e. "now").
  3. Create both spans as children of the extracted_context, with the frontend span having start_timestamp: request_start_time, end_timestamp: now and the request span having start_timestamp: now. This allows the frontend span to represent queuing time only.
  4. Create 2 spans for the frontend: a kind: :server and a kind: :client with the same start time and duration but a different resource. Specifying a different resource is not supported by the API presently.

@mwear commented Feb 28, 2020:

My vote would be for option 1 or 2. The original Datadog instrumentation did have span.kind = proxy; internal would work for now, and we can change it to proxy when and if it gets added to the spec.

As a side issue, I think we are going to have problems with nested server spans. Currently Sinatra sets its span kind as server, so using Rack with Sinatra results in nested server spans. We will need to come up with a way to resolve this, as a user could use Sinatra instrumentation without Rack instrumentation, and in that case a server span from Sinatra is exactly what you want.

The same is true for client spans with Faraday + net http.

duonoid (author):

I've updated, using Option 1.
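The Option 1 shape adopted here can be sketched as follows; Span is a stand-in Struct rather than the real OpenTelemetry API, and the request span name is hypothetical.

```ruby
# Stand-in span record for illustration only.
Span = Struct.new(:name, :kind, :start_timestamp, keyword_init: true)

# Frontend span starts at the proxy/LB request-start time; the request
# span's kind drops to :internal whenever a frontend span is present.
def build_spans(request_start_time, now)
  frontend =
    if request_start_time
      Span.new(name: 'http_server.proxy', kind: :server,
               start_timestamp: request_start_time)
    end
  request = Span.new(name: 'GET /demo', # hypothetical request span name
                     kind: frontend ? :internal : :server,
                     start_timestamp: now)
  [frontend, request]
end
```

With no request-start header, only a single server-kind request span is produced, so plain Rack deployments behave as before.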

Member:

I'm happy with this. @fbogsany WDYT?

duonoid and others added 3 commits February 28, 2020 10:25
* request_span.kind is :internal if frontend_span is present
* future: change request_span :kind to ':proxy' if/when it gets added to spec
…ion--67

# Conflicts:
#	.circleci/config.yml
#	Rakefile
#	docker-compose.yml

duonoid commented Feb 28, 2020

I've resolved the merge conflicts.


mwear commented Feb 28, 2020

The conflicts are back.

…ion--67

# Conflicts:
#	.circleci/config.yml
#	Rakefile
#	docker-compose.yml

duonoid commented Feb 28, 2020

conflicts: resolved. 😄


fbogsany commented Mar 5, 2020

I approved. I'll still extract my comment to an issue, since I think the spec will have to address this at some point, and we may need to revisit. This is 👍 for now tho.

@fbogsany fbogsany merged commit c53e44d into open-telemetry:master Mar 5, 2020
arielvalentin added a commit to arielvalentin/opentelemetry-ruby-contrib that referenced this pull request Feb 28, 2023
This configuration option sets a rack environment variable that isn't used in the instrumentation or its dependencies.

dd-trace-rb used this option to set the span name to the Rack application Name:

https://github.com/DataDog/dd-trace-rb/blob/d76de947d66fe50f5ac6c7a6016ebbfa5c0de986/lib/datadog/tracing/contrib/rack/middlewares.rb#L158

It looks like some of the functionality was ported over but applying it to the span name was never completed:

open-telemetry/opentelemetry-ruby#166
arielvalentin added a commit to open-telemetry/opentelemetry-ruby-contrib that referenced this pull request Mar 2, 2023