Always use file based heartbeat #19666
#19665 is merged.
My only concern with the heartbeat file is that, since we've only used it in containers so far with one worker per container, it has had no chance of file collisions. So I'm not sure whether lines like this matter now: manageiq/lib/workers/miq_defaults.rb, line 22 in 2fb2389
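To make the collision concern concrete, here is a minimal sketch of a file heartbeat where each worker gets its own file. It assumes a shared directory and a per-worker GUID in the file name; the helper names (heartbeat_file_for, write_heartbeat, heartbeat_expired?) and the ".hb" suffix are hypothetical and are not taken from miq_defaults.rb.

```ruby
require "tmpdir"
require "fileutils"
require "securerandom"

# Hypothetical helper: one heartbeat file per worker GUID, so several
# workers sharing a directory never write to the same path.
def heartbeat_file_for(dir, guid)
  File.join(dir, "#{guid}.hb")
end

# Worker side: store the epoch second after which the worker counts as dead.
def write_heartbeat(path, timeout_seconds, now = Time.now)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, (now + timeout_seconds).to_i.to_s)
end

# Monitor side: a missing file is treated the same as an expired heartbeat.
def heartbeat_expired?(path, now = Time.now)
  return true unless File.exist?(path)

  now.to_i > File.read(path).to_i
end

Dir.mktmpdir do |dir|
  first  = heartbeat_file_for(dir, SecureRandom.uuid)
  second = heartbeat_file_for(dir, SecureRandom.uuid)

  write_heartbeat(first, 120)
  puts "first expired?  #{heartbeat_expired?(first)}"
  puts "second expired? #{heartbeat_expired?(second)}"
end
```

With a single shared file name instead, two workers on the same host would overwrite each other's deadline, which is the situation the comment above is asking about.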
ea28379 to 928af24
928af24 to efb4593
This separates getting messages from the server over drb from the actual heartbeat, which is written to the file.
The config change timestamp is stored in memcached and is checked after each worker heartbeat. Sending this message over drb isn't necessary anymore.
…ectly The indirection here was put in place to match the existing drb message processing as closely as possible. Now that that has been removed, this can be simplified.
Additionally remove spec for #process_message which was really a spec for #sync_config
Waiting on ManageIQ/vmware_web_service#70 to be released so that the broker doesn't fail with:
This was a problem for the EventCatcher when sync_config was called as a part of worker initialization. The EventCatcher uses its ems hostname as a part of its log prefix, but the ems isn't set up until after the first call to sync_config. This was leading to errors when the worker was starting:

[----] I, [2019-12-19T17:51:48.407958 ManageIQ#11054:2ab1a6d9a5f8] INFO -- : Starting ManageIQ::Providers::Vmware::InfraManager::EventCatcher with runner options {:ems_id=>"2", :guid=>"43a315ba-39ff-4326-99a6-89b3af48b1ee"}
[----] I, [2019-12-19T17:51:48.451836 ManageIQ#11054:2ab1a6d9a5f8] INFO -- : Deleting worker record for ManageIQ::Providers::Vmware::InfraManager::EventCatcher, id 310
/home/ncarboni/Source/manageiq/app/models/manageiq/providers/base_manager/event_catcher/runner.rb:63:in `log_prefix': undefined method `hostname' for nil:NilClass (NoMethodError)
    from /home/ncarboni/Source/manageiq/app/models/miq_worker/runner.rb:242:in `sync_config'
    from /home/ncarboni/Source/manageiq/app/models/miq_worker/runner.rb:52:in `worker_initialization'
    from /home/ncarboni/Source/manageiq/app/models/miq_worker/runner.rb:42:in `initialize'
    from /home/ncarboni/Source/manageiq/lib/workers/bin/run_single_worker.rb:113:in `new'
    from /home/ncarboni/Source/manageiq/lib/workers/bin/run_single_worker.rb:113:in `<main>'
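The failure mode in the trace above can be reproduced in isolation. The sketch below is only an illustration under assumed names (SketchEventCatcher, SketchEms, setup_ems, and the prefix text are all hypothetical); the nil guard it shows is one possible way to avoid the error and is not necessarily the change made in this commit.

```ruby
# Hypothetical stand-in for an EMS record with a hostname.
SketchEms = Struct.new(:hostname)

class SketchEventCatcher
  def initialize
    @ems = nil # not set up until later in worker startup
  end

  # Mirrors the failing pattern: assumes @ems is already present.
  def unsafe_log_prefix
    "EMS [#{@ems.hostname}]"
  end

  # One possible guard: fall back to a placeholder while @ems is nil.
  def safe_log_prefix
    @ems ? "EMS [#{@ems.hostname}]" : "EMS [unknown]"
  end

  def setup_ems(hostname)
    @ems = SketchEms.new(hostname)
  end
end

catcher = SketchEventCatcher.new

begin
  catcher.unsafe_log_prefix
rescue NoMethodError => err
  puts "before setup: #{err.class}"
end

puts catcher.safe_log_prefix
catcher.setup_ems("vc.example.com")
puts catcher.safe_log_prefix
```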
17076f9 to 010055a
@Fryguy this should be ready for real review.
Also, this is related to the pods and zones epic ManageIQ/manageiq-pods#353
@@ -32,8 +32,6 @@ def monitor_workers
persist_last_heartbeat(worker)
# Check the worker record for heartbeat timeouts
next unless validate_worker(worker)
# Tell the valid workers to sync config if needed
worker_set_message(worker, "sync_config") if resync_needed
😮 Awesome
Though, how does the worker know when the config has changed now?
if config_out_of_date?
_log.info("#{log_prefix} Synchronizing configuration...")
sync_config
lol, there it is. Ignore my above question.
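As a rough sketch of how a timestamp-based config_out_of_date? check could work: the server records when the config last changed in a shared cache, and the worker compares that against its own last sync after each heartbeat. SketchCache is a plain in-memory stand-in for memcached, and the cache key, class names, and after_heartbeat hook are assumptions made for illustration, not the actual ManageIQ implementation.

```ruby
# In-memory stand-in for memcached (assumption: simple get/set semantics).
class SketchCache
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value
  end
end

class SketchWorker
  CONFIG_KEY = "last_config_change".freeze # hypothetical cache key

  attr_reader :sync_count

  def initialize(cache)
    @cache = cache
    @last_sync = nil
    @sync_count = 0
  end

  # Out of date if the worker never synced, or if the server recorded
  # a config change after the worker's last sync.
  def config_out_of_date?
    return true if @last_sync.nil?

    changed_at = @cache.get(CONFIG_KEY)
    !changed_at.nil? && changed_at > @last_sync
  end

  def sync_config(now = Time.now)
    @last_sync = now
    @sync_count += 1
  end

  # Run once per loop iteration, right after the file heartbeat.
  def after_heartbeat(now = Time.now)
    sync_config(now) if config_out_of_date?
  end
end

cache  = SketchCache.new
worker = SketchWorker.new(cache)

worker.after_heartbeat                       # first pass always syncs
cache.set(SketchWorker::CONFIG_KEY, Time.now + 1)
worker.after_heartbeat(Time.now + 2)         # server-side change triggers a resync
puts worker.sync_count
```

Because the worker polls the timestamp itself, the server no longer has to push a "sync_config" message to each worker.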
worker_monitor_drb.worker_get_messages(@worker.pid).each do |msg, *args|
process_message(msg, *args)
end
rescue DRb::DRbError => err
do_exit("Error heartbeating to MiqServer because #{err.class.name}: #{err.message}", 1)
do_exit("Error processing messages from MiqServer because #{err.class.name}: #{err.message}", 1)
If this PR is about removing DRb-based heartbeating, why do we have rescue DRb::DRbError here?
This method in particular is not for heartbeat. This fetches messages set for the worker in the @workers hash over drb.
The vim broker is the only worker that still uses this functionality (previously it was also used for config sync and stopping workers), so this can all be removed once the vim broker is gone.
To clarify, processing messages and heartbeating are two different operations that we happen to execute at the same time.
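A small sketch of that separation, assuming a runner with two independent steps: a local file heartbeat and a remote message fetch. Only the second step can raise a DRb error, which is why the rescue stays on it. SketchRunner, the lambda standing in for the DRb proxy, and the "reconnect_ems" message name are hypothetical; where the real code calls do_exit, this sketch just records the error.

```ruby
require "drb"

class SketchRunner
  attr_reader :heartbeats, :handled

  def initialize(message_source)
    @message_source = message_source # stand-in for the DRb server proxy
    @heartbeats = 0
    @handled = []
  end

  # Local step: no DRb involved, so there is nothing DRb-related to rescue.
  def heartbeat_to_file
    @heartbeats += 1
  end

  # Remote step: the only place that talks to the server over DRb.
  def process_messages
    @message_source.call.each { |msg, *args| @handled << [msg, args] }
  rescue DRb::DRbError => err
    @handled << ["error", [err.message]]
  end

  # The two steps happen to run back to back in the same loop iteration.
  def heartbeat
    heartbeat_to_file
    process_messages
  end
end

runner = SketchRunner.new(-> { [["reconnect_ems", 2]] })
runner.heartbeat
p runner.handled
```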
thanks for the clarification @carbonin
…ymore The side-effects of this method are still needed
Checked commits carbonin/manageiq@849eec9~...fe52aad with ruby 2.5.5, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
Based on #19665