Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

realtime_block: log timings on exception in outermost block #255

Merged
merged 2 commits into from
Oct 25, 2017

Conversation

cben
Copy link
Contributor

@cben cben commented Aug 7, 2017

When realtime_block is aborted with exception, the timings are still there in the hash, so a higher level may log them.
Unless that is this was the outermost block, in which case they're lost.
This PR logs them in such case.

Example:

[----] E, [2017-08-07T15:04:08.145415 #14296:2aaca3b530cc] ERROR -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::Refresher#refresh) EMS: [scale-ocp-36-c03], id: [3] Refresh failed
[----] E, [2017-08-07T15:04:08.145736 #14296:2aaca3b530cc] ERROR -- : [KubeException]: Timed out connecting to server  Method:[block in method_missing]
[----] E, [2017-08-07T15:04:08.145855 #14296:2aaca3b530cc] ERROR -- : /home/bpaskinc/myenv/rbenv/versions/2.3.4/lib/ruby/gems/2.3.0/gems/kubeclient-2.4.0/lib/kubeclient/common.rb:117:in `rescue in handle_exception'
...
bin/rails:4:in `<main>'
[----] E, [2017-08-07T15:04:08.145951 #14296:2aaca3b530cc] ERROR -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::Refresher#refresh) EMS: [scale-ocp-36-c03], id: [3] Unable to perform refresh for the following targets:
[----] E, [2017-08-07T15:04:08.146105 #14296:2aaca3b530cc] ERROR -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::Refresher#refresh)  --- ManageIQ::Providers::Openshift::ContainerManager [scale-ocp-36-c03] id [3]
[----] I, [2017-08-07T15:04:08.236129 #14296:2aaca3b530cc]  INFO -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::Refresher#refresh) Refreshing all targets...Complete
[----] I, [2017-08-07T15:04:08.236201 #14296:2aaca3b530cc]  INFO -- : Exception in realtime_block :ems_total_refresh - Timings: {:collect_inventory_for_targets=>60.113999128341675, :ems_refresh=>60.11402702331543, :ems_total_refresh=>60.208136320114136}

@cben
Copy link
Contributor Author

cben commented Aug 8, 2017

@moolitayer asked if this needs an issue / BZ.
Don't think so, "good logging" is not a well defined feature we promise users :-)
It's when other issues happen and we debug from logs that this might help...

@cben
Copy link
Contributor Author

cben commented Aug 8, 2017

@jrafanie @Fryguy @agrare please review.

@moolitayer
Copy link

@cben sorry my comment was not clear, I don't think we need a general logging issue. (see ManageIQ/manageiq-providers-kubernetes#82 (comment))

@@ -29,6 +29,10 @@ def self.realtime_block(key, &block)
begin
ret = realtime_store(hash, key, &block)
return ret, hash
rescue
# Don't let timings be lost on exception when there is nobody else to pick them up.
$log.info("Exception in realtime_block #{key.inspect} - Timings: #{hash.inspect}") if outermost
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no other log lines in this file. It probably shouldn't expect $log to be defined and instead be initialized with a logger... But, for a surgical change, maybe add if $log && outermost

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Benchmark extension is meant to be non-miq-specific. In fact, it's on the list to be moved to more_core_extensions. Imagine it's in more_core, how would this change work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm clueless about how all the $log, _log thingies come to be (storks?), please advise...

The OOP answer is not use Benchmark class methods but an object intialized with a logger. meh — much hassle, and pushes similar problem onto the users that would want a global instance...

  • realtime_block could take a logger option, but again too much hassle to pass it everywhere, especially for an edge case.
  • could have some accessor Benchmark.realtime_block_logger = ... to be initialized once in apps that have logging. (need advice where to set in MIQ?)
  • just default to logging to stderr, might be good enough for a rare edge case?

I also wanted to propose a convenience realtime_log_once function that logs timing IFF it's the outermost, and only returns the result, so callers don't need to discard result, _ = ....
Same considerations, except it's standard flow, not an edge case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben I agree.

  • Creating an instance that you pass a logger seems like a hassle.

  • Using a class accessor on Benchmark to set a logger might work but then the caller knows about benchmark... I guess. It might work? See here where we create the loggers and setup the external loggers.

  • stderr should work too. Try that first and see if the workers log it. I think that's the simplest approach. We should be keeping stderr.

app/models/miq_worker.rb:    pid = Kernel.spawn(self.class.build_command_line(guid), [:out, :err] => [Rails.root.join("log", "evm.log"), "a"])
lib/tasks/evm_application.rb:    pid = Kernel.spawn(env_options, command_line, :pgroup => true, [:out, :err] => [Rails.root.join("log/evm.log"), "a"])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other thing I can think of is a top-level accessor that has the "last" value's run... Something like a last_realtime_block_result. Downside is you would have this thing hanging around forever, but it isn't very big.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try that first and see if the workers log it.

Again, you have to put on your non-manageiq hat, because that's where the class is going. This makes no sense if some random project uses it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes perfect sense for a library to write to stderr on errors if they don't provide a facility for specifying a logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, hardcoded Logger.new($stderr), PTAL. If anyone wants more configuration, do a followup PR :)

Added tests, which made me fix rescue to catch any Exception.
Sneaked in a timecop version bump, to be able to do fractional travel for a simpler-matching regexp. Being a dev gemspec dependency, I believe it's safe to bump without affecting other repos (eg. my manageiq dir with override_gem to this gems-pending is still getting timecop 0.7.3, no conflict).

@jrafanie
Copy link
Member

jrafanie commented Aug 8, 2017

Tangentially related: #139

@miq-bot
Copy link
Member

miq-bot commented Sep 11, 2017

This pull request is not mergeable. Please rebase and repush.

Specifically want to pick up Timecop.travel(float) fix
travisjeffery/timecop#178
Not necessary in inner blocks as timing still gets stored in hash
and if caught will be returned from some outer block.
When outermost block crashes, the whole hash is about to be discarded.
@miq-bot
Copy link
Member

miq-bot commented Sep 27, 2017

Checked commits cben/manageiq-gems-pending@a299923~...b207c1f with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
2 files checked, 2 offenses detected

lib/gems/pending/util/extensions/miq-benchmark.rb

spec/util/extensions/miq-benchmark_spec.rb

@chessbyte
Copy link
Member

@jrafanie @Fryguy @agrare please review.

Copy link
Member

@jrafanie jrafanie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I don't think the timecop upgrade should cause problems with manageiq, but we'll find out soon enough.

@jrafanie jrafanie merged commit 0df3823 into ManageIQ:master Oct 25, 2017
@jrafanie jrafanie added this to the Sprint 72 Ending Oct 30, 2017 milestone Oct 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants