
Only register queue workers using drb for dequeue #19829

Merged
merged 2 commits into ManageIQ:master from the fix_run_single_worker branch on Feb 17, 2020

Conversation

carbonin
Member

Workers should only attempt to add themselves to the @workers hash if they are queue workers, if they are using drb to fetch queue messages, and if the drb server is available. In the past, this registration was tied into heartbeating over drb and was thus skipped when using files for heartbeating. After the switch was made to heartbeat to files exclusively, this registration started running even when using the run_single_worker script from the command line.

This led to issue #19793. This PR fixes that by ensuring that workers are only registered when they are going to dequeue messages over drb, and by only allowing workers to dequeue over drb when the server is actually available.

Additionally, because heartbeating is now completely separate from worker registration, I moved the #register_worker method out of the heartbeat concern and into the dequeue one.
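
In outline, the guard described above amounts to something like this (a minimal sketch, not the exact code in this PR; queue_worker?, dequeue_via_drb?, and drb_dequeue_available? are illustrative names):

# Register with the server's worker monitor only for queue workers that will
# actually fetch queue messages over drb and can reach the drb server.
def sync_worker_monitor_registration
  return unless self.class.queue_worker?  # non-queue workers never register
  return unless dequeue_via_drb?          # only when drb is the dequeue method
  return unless drb_dequeue_available?    # only when the drb server answered
  register_worker_with_worker_monitor
end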

Now that heartbeating is done through a file unconditionally,
this method to add a worker to the drb workers list is only
applicable to dequeue.
@drb_dequeue_available ||=
  begin
    # respond_to? is proxied to the remote drb object, so this raises DRb::DRbError
    # when the drb server is unreachable
    server.drb_uri.present? && worker_monitor_drb.respond_to?(:register_worker)
  rescue DRb::DRbError
    false
  end
Member

Was there something else in this block before? How would DRb::DRbError be raised? We're not doing anything with drb, right?

Member

ah, I see, we used to call worker_monitor_drb.register_worker which could raise that exception...

Member Author

Right, so in this case I just need some call to the drb object to raise to tell if it's available or not. I used a respond_to? here because that proxies over to the server side without actually doing anything.

Member

Exactly. I don't think the begin/rescue is needed here.

Member

(it's needed in the method above that calls register_worker on the drb object)

Member Author

This is very much needed I think. respond_to? is proxied to the remote drb object so it will raise rather than return false if the drb server is inaccessible.

I would love a way to tell if the drb server is up that didn't rely on exception handling, do you know of something like that?
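
For context, the behavior being described can be reproduced with plain DRb (a standalone illustration, not ManageIQ code; the URI below is a placeholder with nothing listening on it):

require 'drb/drb'

remote = DRb::DRbObject.new_with_uri("druby://localhost:9999") # placeholder URI, no server running

begin
  # respond_to? is forwarded to the remote object, so it raises instead of
  # returning false when the drb server is down
  remote.respond_to?(:register_worker)
rescue DRb::DRbError => err
  puts "drb server unavailable: #{err.class}"
end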

Member Author

@agrare do you know if there's anything like that?

Member

My bad. I forgot the drb object was proxied and could raise here. Oh DRb.

Member

Not that I'm aware of. In miq_fault_tolerant_vim, which wraps MiqVim to deal with DRb "issues", we just catch DRb::DRbConnError and retry if the broker is on a new port.
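
That retry pattern looks roughly like the following (a sketch only; the real miq_fault_tolerant_vim code differs, and some_vim_call and reconnect_to_broker are made-up names):

retries = 0
begin
  vim.some_vim_call             # hypothetical call proxied over DRb
rescue DRb::DRbConnError
  retries += 1
  raise if retries > 1          # give up after one reconnect attempt
  vim = reconnect_to_broker     # hypothetical: re-resolve the broker's (possibly new) port
  retry
end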

@@ -288,8 +288,6 @@ def heartbeat
       _log.info("#{log_prefix} Synchronizing configuration complete...")
     end

-    register_worker_with_worker_monitor unless MiqEnvironment::Command.is_podified?
Member

yay, that is_podified? removal must feel good...

@jrafanie (Member) left a comment

Other than the comment about the unneeded exception handling, it looks really good.

      end
    end
  end

Member

👾 🔫

…able

The worker monitor only needs drb for queue message prefetch now
so only queue workers should be registered.

Additionally, make run_single_worker work from the command line
(without a server process) by ensuring that we can connect to the
DRb server before trying to communicate with it.

Fixes ManageIQ#19793
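
The dequeue side of the change described in this commit behaves roughly like this (a hedged sketch; dequeue_via_drb?, get_message_via_drb, and get_message_via_sql are assumed names for the two fetch paths):

# A queue worker only prefetches messages through the server's drb object when
# the availability check succeeded; otherwise it falls back to reading the
# queue directly from SQL, which is what run_single_worker needs when no
# server process is running.
def get_message
  if dequeue_via_drb? && drb_dequeue_available?
    get_message_via_drb
  else
    get_message_via_sql
  end
end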
@carbonin force-pushed the fix_run_single_worker branch from 32bbc51 to 3928c43 on February 17, 2020 15:30
@miq-bot (Member) commented Feb 17, 2020

Checked commits carbonin/manageiq@ef2d304~...3928c43 with ruby 2.5.7, rubocop 0.69.0, haml-lint 0.20.0, and yamllint
5 files checked, 2 offenses detected

app/models/miq_server/worker_management/dequeue.rb

@jrafanie merged commit d8d9555 into ManageIQ:master on Feb 17, 2020
@jrafanie added this to the Sprint 130 Ending Feb 17, 2020 milestone on Feb 17, 2020
@carbonin deleted the fix_run_single_worker branch on April 23, 2020 15:36