[POC][WIP] ActiveSupport::Notifications approach to determining slow requests #17958

NickLaMuro · 2018-09-07T21:37:42Z

This is a different approach to #17842

Description

This peers into the private APIs of ActiveSupport and ActionController::LogSubscriber to check the status of the queued-up events for 'process_action.action_controller' and see whether any have been running for longer than 10 seconds (I will probably update this to 1 minute). If any have, it reports them to the Rails.logger.
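To make the idea concrete without leaning on the Rails internals, here is a rough, framework-free sketch of the same pattern: each request thread records when it started in a thread variable, and a separate watcher scans all threads for entries older than a threshold. This is not the code from lib/workers/rails_request_monitor.rb; `RequestMonitorSketch`, `KEY`, `start_request`, and `finish_request` are hypothetical names used only for illustration, and in the PR the per-thread data comes from what ActiveSupport already stores rather than from explicit calls like these.

```ruby
require 'logger'

# Hypothetical stand-in for the monitor described above. It does NOT read the
# ActiveSupport event queue; it keeps its own per-thread start time instead.
module RequestMonitorSketch
  KEY       = :request_monitor_sketch_started_at # hypothetical thread-variable name
  THRESHOLD = 10                                 # seconds, kept as a plain Integer

  # Called by a request thread when it starts handling a request.
  def self.start_request(path, started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC))
    Thread.current.thread_variable_set(KEY, [path, started_at])
  end

  # Called by the same thread once the request is done.
  def self.finish_request
    Thread.current.thread_variable_set(KEY, nil)
  end

  # Called from the watcher thread. Logs every in-flight request older than
  # THRESHOLD at WARN level and returns the paths that were reported.
  def self.log_long_running_requests(logger, now = Process.clock_gettime(Process::CLOCK_MONOTONIC))
    slow = []
    Thread.list.each do |thread|
      path, started_at = thread.thread_variable_get(KEY)
      next unless started_at

      duration = now - started_at
      next if duration < THRESHOLD

      logger.warn("Long running request: #{path} (#{duration.round}s, thread #{thread.object_id})")
      slow << path
    end
    slow
  end
end
```

The watcher only reads data the request threads have already written, so the request path itself pays for one thread-variable write and nothing else, which is the property the "Pros" below are after.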

Pros

Zero object allocations and stacklevel weight added to requests
Almost zero object allocations in the watching thread

Cons

Very "Rails internal API" focused (aka "confusing as F...")
Touching internal thread variables that are used elsewhere (more unknowns)

Usage

This can be instrumented in many ways, but the easiest is via a config/initializer for local development use:

# config/initializers/request_watcher.local.rb
if ENV['WATCH_FOR_LONG_REQUESTS']
  require 'workers/rails_request_monitor'

  Thread.new do
    loop do
      sleep 10
      RailsRequestMonitor.log_long_running_requests
    end
  end.abort_on_exception = true
end

Then start a development server with:

$ WATCH_FOR_LONG_REQUESTS=1 bin/rails s

This will check every 10 seconds, and when long requests are found, they will be logged via Rails.logger at the WARN level.

This can also be instrumented in MiqWebServerRunnerMixin with relative ease:

 # app/models/mixins/miq_web_server_runner_mixin.rb

+
+ def do_heartbeat_work
+   RailsRequestMonitor.log_long_running_requests
+ end

Or anywhere that it makes sense (signal handlers, etc.)
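For the signal handler case, a small sketch of how that could be wired up, under the assumption that the check ends up writing to a Logger. Ruby does not allow locking a Mutex inside a trap handler, and Logger locks internally, so the handler here only writes a byte to a pipe and a regular thread runs the check. `SignalCheckSketch` and `install` are hypothetical names, and the block passed in stands in for `RailsRequestMonitor.log_long_running_requests`.

```ruby
# Sketch: run a check whenever the process receives a given signal.
module SignalCheckSketch
  # Installs a trap for `signal` and returns the worker thread that runs
  # `check` once per wake-up.
  def self.install(signal, &check)
    reader, writer = IO.pipe

    worker = Thread.new do
      loop do
        reader.readpartial(16) # blocks until the trap handler writes
        check.call
      end
    end

    Signal.trap(signal) do
      begin
        # Only a non-blocking write happens in trap context.
        writer.write_nonblock(".")
      rescue IO::WaitWritable
        # Pipe is full, so a wake-up is already pending; nothing to do.
      end
    end

    worker
  end
end
```

With something like `SignalCheckSketch.install("USR2") { RailsRequestMonitor.log_long_running_requests }` in place, sending `kill -USR2 <pid>` to the server process would trigger an on-demand check. The choice of USR2 is arbitrary here and would need to avoid signals the web server already uses.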

Links

Detect and log long running http(s) requests #17842

Steps for Testing/QA

The steps in the BZ from #17842 are a good start, but this is what I did:

Get the code from this branch: git apply <(curl -L https://github.com/ManageIQ/manageiq/pull/17958.patch)
Update the API (bundle open manageiq-api) to include this code in app/controllers/api/ping_controller.rb:

module Api
  class PingController < ActionController::API
    def index
      sleep 60 if rand(5) == 0
      render :plain => 'pong'
    end
  end
end

Add the initializer from above to config/initializers/.
Run a rails server: $ WATCH_FOR_LONG_REQUESTS=1 bin/rails s
Hit localhost:3000/api/ping repeatedly until a few of the requests hang, and watch the logs

Continuing to refresh regardless of whether you get a response is probably the best technique here

You should see [WARN] messages eventually for the requests that don't complete immediately.

This peers into the private APIs of `ActiveSupport` and `ActionController::LogSubscriber` to check the status of the queued up events for 'process_action.action_controller', and see if any are larger than 10 seconds. If they are, it will report them to the Rails.logger.

Usage
-----

This can be instrumented in many ways, but the easiest is via config/initializer:

    # config/initializers/request_watcher.local.rb
    if ENV['WATCH_FOR_LONG_REQUESTS']
      require 'workers/rails_request_monitor'

      Thread.new do
        loop do
          sleep 10
          RailsRequestMonitor.log_long_running_requests
        end
      end.abort_on_exception = true
    end

And running a development server by running:

    $ WATCH_FOR_LONG_REQUESTS=1 bin/rails s

This will check every 10 seconds, and when long requests are found, they will be output to the Rails.log on level WARN.

This can also be instrumented in MiqWebServerRunnerMixin with relative ease:

    # app/models/mixins/miq_web_server_runner_mixin.rb
    +
    + def do_heartbeat_work
    +   RailsRequestMonitor.log_long_running_requests
    + end

Or anywhere that it makes sense (signal handlers, etc.)

TODO
----

- Update to 1.minute instead of 10.seconds (for quicker testing)
- Tests
- Instrument in `MiqWebServerRunnerMixin` (new commit?)

- Defers `now` and `too_slow` instantiations until after the first loop happens
- Uses integers for seconds instead of `ActiveSupport`'s `.seconds` (which actually ends up creating a ton of objects)
- Memoizes the seconds taken in a constant

This leaves a few object allocations for each time `.log_long_running_requests` is called without any long requests currently active.
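One way to check a claim like this is to count allocations around a block with `GC.stat(:total_allocated_objects)`. The sketch below compares a threshold memoized as an Integer constant against building a wrapper object on every comparison. `DurationLike` is a hypothetical stand-in for a duration wrapper such as what `.seconds` returns, and `allocations_during` and `TOO_SLOW` are illustrative names, so the numbers only show the shape of the difference and not what ActiveSupport itself allocates.

```ruby
# Hypothetical stand-in for a duration wrapper object.
DurationLike = Struct.new(:value)

TOO_SLOW = 10 # seconds, memoized once as a plain Integer

# Returns how many objects were allocated while the block ran.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

elapsed = 42

# Comparing against the Integer constant creates no wrapper objects.
plain = allocations_during { 1_000.times { elapsed > TOO_SLOW } }

# Building a wrapper for every comparison allocates one object per iteration.
wrapped = allocations_during { 1_000.times { elapsed > DurationLike.new(10).value } }

puts "Integer constant: #{plain} allocations"
puts "Wrapper per call: #{wrapped} allocations"
```

The same helper can be wrapped around a call to `.log_long_running_requests` in a console to see what an idle check costs.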

NickLaMuro · 2018-09-07T21:38:45Z

To @jrafanie 's comment in my commit (the monster)... he can tell you that the answer is a bit of a 📖...

miq-bot · 2018-09-07T21:51:52Z

Checked commits NickLaMuro/manageiq@fb7f4bd~...c6c55d1 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 9 offenses detected

lib/workers/rails_request_monitor.rb

⚠️ - Line 10, Col 20 - Lint/BlockAlignment - } at 10, 19 is not aligned with QUEUE_KEY = ActiveSupport::Subscriber.subscribers.detect { |sub| at 8, 2.
⚠️ - Line 15, Col 44 - Lint/AssignmentInCondition - Use == if you meant to do a comparison or wrap the expression in parentheses to indicate you meant to assign in a condition.
⚠️ - Line 4, Col 19 - Lint/Void - Variable LogSubscriber used in void context.
❗ - Line 15, Col 9 - Style/Next - Use next to skip iteration.
❗ - Line 19, Col 31 - Rails/TimeZone - Do not use Time.now without zone. Use one of Time.zone.now, Time.current, Time.now.in_time_zone, Time.now.utc, Time.now.getlocal, Time.now.iso8601, Time.now.jisx0301, Time.now.rfc3339, Time.now.to_i, Time.now.to_f instead.
❗ - Line 23, Col 49 - Layout/ExtraSpacing - Unnecessary spacing detected.
❗ - Line 8, Col 65 - Style/BlockDelimiters - Avoid using {...} for multi-line blocks.
❗ - Line 8, Col 72 - Layout/TrailingWhitespace - Trailing whitespace detected.
❗ - Line 9, Col 22 - Style/MethodCallWithArgsParentheses - Use parentheses for method calls with arguments.

NickLaMuro · 2018-09-27T23:31:27Z

@jrafanie going to close this for now since it seems like you went ahead with #17842

jrafanie · 2018-09-28T13:03:15Z

I prefer the simplicity of the initializer in this PR. I just wish some of these APIs were public. I'm not confident that they won't change underneath us in minor or major releases of Rails. Perhaps this can be resurrected if we find APIs that are stable and easy to test against for each Rails version. For now though, it feels like the thread watcher thing, #17842, is relying on more stable things.

Honestly, though... this PR, if we can write tests against various rails versions, would be easier to hotfix, so the jury isn't out on this. What's your opinion on this @NickLaMuro ?

NickLaMuro · 2018-09-28T21:42:45Z

@jrafanie sorry, missed this message in the morning flood of notifications that I sifted through:

Honestly, though... this PR, if we can write tests against various rails versions, would be easier to hotfix, so the jury isn't out on this. What's your opinion on this @NickLaMuro ?

I would almost say maybe it makes sense to make this a Rails feature if we go that route, and either:

a) update some of these methods to have public interfaces
b) include this interface in Rails core so that this can be accessed and tested with Rails releases.

Again, I think what we both like about this approach is that we aren't adding anything on top of what is already there, and are just including a light monitor to watch that already existing data. What really sucks about it, though, is that we have to dive deep into some internal APIs to get this data, and that isn't good long term.

I will maybe look into seeing what implementing something in Rails core would look like when I have some spare cycles.

jrafanie · 2018-09-28T21:50:26Z

Yeah, that summarizes my feelings too. My middleware approach duplicates data that's already there, but it controls the CRUD of this data, so it's safer to use. If Rails provided a mechanism to introspect what's already there, we'd get the safety of the other PR without the extra overhead/duplication.

NickLaMuro added 2 commits September 6, 2018 21:53

miq-bot changed the title ~~[POC][WIP] ActiveSupport::Notifications approach to determining slow requests~~ [WIP] [POC][WIP] ActiveSupport::Notifications approach to determining slow requests Sep 7, 2018

miq-bot added the wip label Sep 7, 2018

NickLaMuro changed the title ~~[WIP] [POC][WIP] ActiveSupport::Notifications approach to determining slow requests~~ [POC][WIP] ActiveSupport::Notifications approach to determining slow requests Sep 10, 2018

NickLaMuro closed this Sep 27, 2018
