Wait for ems workers to finish before destroying the ems #14848
@miq-bot add_label wip
Can a worker be killed mid-refresh? If so, then this might not solve the BZ.
How can an orchestrate_destroy call be cancelled?
I also see some queue items rescheduled via MiqException::MiqQueueRetryLater.
How do the UI and REST API currently destroy an ems?
app/models/ext_management_system.rb
Outdated
# Something like a 15 seconds before trying again?
MiqQueue.put(
  :deliver_on => 15.seconds.from_now,
  :class_name => name,
Shouldn't this be self.class.name?
app/models/ext_management_system.rb
Outdated
@@ -426,6 +426,24 @@ def enable!
    update!(:enabled => true)
  end

  # Wait until all associated workers are dead to destroy this ems
  def orchestrate_destroy
    disable_ems if enabled?
this was renamed to disable!
app/models/ext_management_system.rb
Outdated
    disable_ems if enabled?

    workers = MiqWorker.find_current_or_starting.where(:queue_name => "ems_#{id}") #better way to get 'queue_name'?
    if workers.count == 0
Maybe these checks can be put into a Rails before_destroy callback, cancelling the destroy via normal Rails methods?
Do we want to enable that? I don't think we currently allow cancelling an ems delete.
2575061 to aa7c523
Not sure, but in this loop case it can result in an endless loop. So maybe the queue retry logic is better suited for this...
app/models/ext_management_system.rb
Outdated
  def orchestrate_destroy
    disable! if enabled?

    workers = MiqWorker.find_current_or_starting.where(:queue_name => "ems_#{id}") #better way to get 'queue_name'?
As a followup PR, can you do this?
manageiq/app/models/mixins/per_ems_worker_mixin.rb
Lines 94 to 101 in d199f44
Change:
def queue_name_for_ems(ems)
  # Host objects do not have dedicated refresh workers so request a generic worker which will
  # be used to make a web-service call to a SmartProxy to initiate inventory collection.
  return "generic" if ems.kind_of?(Host) && ems.acts_as_ems?
  return ems unless ems.kind_of?(ExtManagementSystem)
  "ems_#{ems.id}"
end
To:
def queue_name_for_ems(ems)
  # Host objects do not have dedicated refresh workers so request a generic worker which will
  # be used to make a web-service call to a SmartProxy to initiate inventory collection.
  return "generic" if ems.kind_of?(Host) && ems.acts_as_ems?
  return ems unless ems.kind_of?(ExtManagementSystem)
  ems.queue_name
end
And in ExtManagementSystem:
def queue_name
  # Inside the model the id is the record's own, so no ems receiver is needed
  "ems_#{id}"
end
Finally... then this line becomes:
workers = MiqWorker.find_current_or_starting.where(:queue_name => ems.queue_name)
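Put together, the suggestion above might look like the following minimal sketch. The Ems and Host structs are hypothetical stand-ins for ExtManagementSystem and Host, used only so the snippet runs without Rails; queue_name_for_ems and queue_name are the names from the comment. Inside the model, queue_name is built from the record's own id.

```ruby
# Hypothetical stand-ins for the real models, so the sketch runs without Rails.
Host = Struct.new(:id, :acts_as_ems) do
  def acts_as_ems?
    acts_as_ems
  end
end

Ems = Struct.new(:id) do
  # The model owns its queue name; callers no longer rebuild "ems_<id>" by hand.
  def queue_name
    "ems_#{id}"
  end
end

def queue_name_for_ems(ems)
  # Hosts have no dedicated refresh worker, so they use the generic queue.
  return "generic" if ems.kind_of?(Host) && ems.acts_as_ems?
  # Anything that is not an ems (for example an explicit queue name) passes through.
  return ems unless ems.kind_of?(Ems)
  ems.queue_name
end
```

With this shape, the worker lookup in the destroy path and the mixin both ask the ems for its queue name instead of duplicating the string format.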
definitely
app/models/ext_management_system.rb
Outdated
      :deliver_on => 15.seconds.from_now,
      :class_name => self.class.name,
      :instance_id => id,
      :method_name => "orchestrate_destroy"
Hmm, per-ems workers can be running a single queue item for a long time... we could go through lots of these queue messages before the worker exits and we can destroy the ems.
How often does this happen in practice? Can we just wait 5 minutes between checks?
How often does this happen in practice? Can we just wait 5 minutes between checks?
@jrafanie I think users expect the deletion process to be fairly quick. Can we cancel running jobs instead? Can we do that safely?
I have a silly question: since we recently added the ability to disable a provider, can we prevent deletion of enabled providers? The disable could be responsible for making sure any ems workers are also stopped before it's marked as disabled. In other words, force them to first disable the provider, then delete it?
Or at least do exponential backoff...
@cben possibly, I don't know if we can 'remember' the previous wait time between iterations.
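One way to 'remember' the previous wait could be to pass an attempt counter along with each requeued destroy and derive the delay from it, assuming the queue hands arguments through to the queued method. A rough sketch follows; destroy_retry_delay and its base and cap values are hypothetical, not existing ManageIQ code.

```ruby
# Hypothetical helper: seconds to wait before the next destroy attempt.
# attempt is 0 for the first retry and would travel with each requeued message.
def destroy_retry_delay(attempt, base: 15, cap: 300)
  # Double the wait on every attempt, but never wait longer than the cap.
  [base * (2**attempt), cap].min
end

# Delays for the first seven attempts: 15, 30, 60, 120, 240, 300, 300
delays = (0..6).map { |attempt| destroy_retry_delay(attempt) }
```

The cap keeps the worst-case wait bounded, so a long-running refresh does not push the next check out indefinitely.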
seems correct. I'll make the change after we discuss this a while.
The disable could be responsible for making sure any ems workers are also stopped before it's marked as disabled
Wouldn't that be a race condition, with sync_workers trying to bring them up?
I don't remember how the enable flag works, but I would assume that sync_workers would not start workers for a disabled ems.
I would
I also assume the
cc @simon3z
so what should be in the
so destroy_queue should run
at least the UI does it this way
@zeari yes to all :)
it calls
aa7c523 to a81f3ba
app/models/ext_management_system.rb
Outdated
    disable! if enabled?

    if self.destroy == nil
      destroy_queue(15.seconds.from_now)
Doesn't raising MiqQueueRetryLater work?
I tried with raise MiqException::MiqQueueRetryLater.new(:deliver_on => 15.seconds.from_now), but it would just queue the job immediately. It did write a fancy message from https://github.com/ManageIQ/manageiq/blob/master/app/models/miq_queue.rb#L328 though.
I can't find the correct way to utilize that.
I think it's because I'm testing with simulate_queue_worker, which doesn't take deliver_on into account.
0a95e41 to 589fa35
app/models/ext_management_system.rb
Outdated
@@ -426,6 +426,32 @@ def enable!
    update!(:enabled => true)
  end

  # override destroy_queue from AsyncDeleteMixin
  def destroy_queue(deliver_on = Time.now)
Can you leave :deliver_on nil?
That looks great 👍
And indeed it looks like a minimally invasive solution.
One thing I see coming in our way is how to cancel a destroy. In this case we have no way to enable the ems again, because it's disabled every time orchestrate_destroy is called.
But I think that's OK for now....
app/models/ext_management_system.rb
Outdated
  before_destroy :before_destroy

  def before_destroy
I would opt for a descriptive name, like assert_no_queues_present, or wrap the assertion in a block:
before_destroy do |ems|
throw(:abort) if !MiqWorker.find_current_or_starting.where(:queue_name => ems.queue_name).count.zero?
end
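For context on why throw(:abort) cancels the destroy: Rails runs the callback chain inside a catch(:abort), so a throw from any before_destroy callback halts the chain and destroy returns false. The snippet below is a simplified plain-Ruby imitation of that mechanism, not Rails code; run_destroy and the lambdas are hypothetical names.

```ruby
# Simplified imitation of how a callback chain honours throw(:abort).
# Returns true if every callback ran, false if one of them aborted.
def run_destroy(before_callbacks)
  completed = false
  catch(:abort) do
    before_callbacks.each(&:call)
    # Only reached when no callback threw :abort.
    completed = true
  end
  completed
end

# A callback that vetoes the destroy while workers are still around.
running_workers = 2
assert_no_workers = -> { throw(:abort) unless running_workers.zero? }

blocked = run_destroy([assert_no_workers]) # false while workers are running
running_workers = 0
allowed = run_destroy([assert_no_workers]) # true once the workers are gone
```

This is why the caller can simply check the return value of destroy to decide whether another attempt is needed.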
app/models/ext_management_system.rb
Outdated
    disable! if enabled?

    if self.destroy == false
      raise MiqException::MiqQueueRetryLater.new(:deliver_on => 15.seconds.from_now)
If you skip deliver_on, what's the default?
And is there any throttling in place?
Will the retry run until it succeeds, or does it give up eventually?
If you skip deliver_on, what's the default?
immediately
And is there any throttling in place?
I don't think so...
Will the retry run until it succeeds, or does it give up eventually?
I found that we can add an expires_on setting on it. Maybe cancel after 5 mins?
That won't work, since we don't 'carry' that attribute over to the next time we queue the destroy.
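If giving up after a while is wanted, one option might be to compute a deadline once, when the destroy is first requested, and hand it on unchanged with every requeue, so it does not depend on per-message attributes. The sketch below only shows the decision logic; next_destroy_step and give_up_at are hypothetical names, and how the value travels with the queued call is left open.

```ruby
# Hypothetical decision helper for one destroy attempt.
# give_up_at is fixed when the destroy is first requested and handed on,
# unchanged, with every requeue, so the deadline survives between attempts.
def next_destroy_step(workers_running, now, give_up_at)
  return :destroy unless workers_running
  return :give_up if now >= give_up_at
  :requeue
end

started  = Time.at(0)
deadline = started + 5 * 60 # "cancel after 5 mins"

next_destroy_step(true, started + 15, deadline)  # => :requeue
next_destroy_step(true, started + 301, deadline) # => :give_up
next_destroy_step(false, started + 30, deadline) # => :destroy
```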
@miq-bot remove_label wip
2833708 to 36bd868
app/models/ext_management_system.rb
Outdated
  end

  # override destroy_queue from AsyncDeleteMixin
  def destroy_queue(deliver_on = nil)
This method signature is confusing to me. The instance method takes a time, and the class method takes an array of ids. In addition, they're very similar, with deliver_on being the main difference.
Can this method be called destroy_queue_in_future, schedule_destroy_queue, or something else, and have it call the class method directly, and make the class method take an optional deliver_on along with the ids?
Looking nice.
Just the one to remove to_miq_a and a few questions.
LGTM
app/models/ext_management_system.rb
Outdated
@@ -422,6 +422,47 @@ def enable!
    update!(:enabled => true)
  end

  def self.destroy_queue(ids)
    ids = ids.to_miq_a
Please don't use to_miq_a; use Array.wrap(ids) if you need it.
app/models/ext_management_system.rb
Outdated
  def self.destroy_queue(ids)
    ids = ids.to_miq_a
    _log.info("Queuing destroy of #{name} with the following ids: #{ids.inspect}")
    ids.each do |id|
Do we really have to send a different message for every destroy?
Better yet, could we just send in the where clause and not be id-centric?
Well, we would have to manage that where clause, since in each iteration some ids (providers) would be deleted and some would have to try again later, so I'm not sure it's worth it.
app/models/ext_management_system.rb
Outdated
    if self.destroy == false
      _log.info("Cant #{self.class.name} with id: #{id}, workers still in progress. Requeuing destroy...")
      raise MiqException::MiqQueueRetryLater.new(:deliver_on => 15.seconds.from_now)
We rarely use MiqQueueRetryLater and are moving away from it, since it is queue-system specific. Can we just queue another destroy?
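To illustrate what "just queue another destroy" could look like, here is a small in-memory simulation. FakeQueue and FakeEms are hypothetical stand-ins, not ManageIQ classes; the put options mirror the keys used in the earlier snippets of this thread, and the disable! step is left out for brevity.

```ruby
# In-memory stand-ins; none of these classes are ManageIQ code.
class FakeQueue
  attr_reader :messages

  def initialize
    @messages = []
  end

  def put(options)
    @messages << options
  end
end

class FakeEms
  attr_reader :id, :destroyed
  attr_accessor :running_workers

  def initialize(id, queue, running_workers)
    @id = id
    @queue = queue
    @running_workers = running_workers
    @destroyed = false
  end

  # Mirrors a destroy that is vetoed by a before_destroy callback.
  def destroy
    return false if running_workers > 0
    @destroyed = true
    self
  end

  # Try to destroy; if workers are still alive, queue another attempt
  # instead of raising a queue-specific retry exception.
  def orchestrate_destroy(now = Time.now)
    return if destroy
    @queue.put(
      :deliver_on  => now + 15,
      :instance_id => id,
      :method_name => "orchestrate_destroy"
    )
  end
end
```

Each failed attempt leaves exactly one follow-up message on the queue, and a successful attempt leaves none, so the retry chain ends on its own once the workers have exited.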
@@ -97,7 +97,7 @@ def queue_name_for_ems(ems)
     return "generic" if ems.kind_of?(Host) && ems.acts_as_ems?

     return ems unless ems.kind_of?(ExtManagementSystem)
-    "ems_#{ems.id}"
+    ems.queue_name
thanks. good change
It's also in #14864.
1d767a3 to b3339f9
b3339f9 to f6d04e8
Checked commits zeari/manageiq@423b115~...f6d04e8 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
app/models/ext_management_system.rb
  def orchestrate_destroy
    disable! if enabled?

    if self.destroy == false
Remove self to make rubocop happy.
awesome addition @zeari 👏
🎉 👏
So this yields a couple of errors from the previous restructuring. I'd rather we unmerge; I'll fix those and merge again....
nvm, fix is here: #14848
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1437549
continue work from #14675
@durandom @Fryguy @jrafanie @kbrock
I have no idea what's the correct way to do this, so let's discuss that here. The code added here outlines the basic functionality. In https://github.com/manageiq/manageiq-ui-classic/blob/master/app/controllers/ems_common.rb#L898, orchestrate_destroy should replace destroy_queue.
cc @moolitayer @cben
Update (@blomquisg)
BZs: