Kill workers that don't stop after a configurable time #13805
@@ -99,6 +99,12 @@ def check_not_responding(class_name = nil)
    processed_workers.collect(&:id)
  end

  NOT_RESPONDING = :not_responding
  MEMORY_EXCEEDED = :memory_exceeded
I feel like this information belongs somewhere else. Maybe `MiqServer::WorkerManagement::Monitor::Reason`? Then ideally callers of `worker_set_monitor_reason` will also use the same constants. Not sure if making that kind of change is in this PR's scope though.
👍 Looks good. Will merge after @jrafanie makes a couple of small changes.
Previously, we'd gracefully ask workers to exit, but if the queue work they were doing took 1 hour, they'd exceed memory thresholds, keep running until the work was done, and only then respond to the exit request. Now, we mark them as 'stopping' when they exceed a threshold, and they have up to 10 minutes to finish before we kill them. This value is configurable in the 'stopping_timeout' field in each worker's advanced settings. https://bugzilla.redhat.com/show_bug.cgi?id=1395736
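For readers skimming the thread, the behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Worker` struct, `monitor_worker`, and `DEFAULT_STOPPING_TIMEOUT` are hypothetical names, and the only details taken from the description are the 'stopping' state, the 10-minute default, and the per-worker `stopping_timeout` setting.

```ruby
# Hypothetical sketch only; these names are not the ManageIQ implementation.
Worker = Struct.new(:id, :status, :memory_bytes, :stopping_since, :stopping_timeout)

# 10 minutes, the default described above (assumed here to be in seconds).
DEFAULT_STOPPING_TIMEOUT = 10 * 60

# Called on each monitor pass. Returns a symbol describing what was done.
def monitor_worker(worker, memory_threshold, now)
  case worker.status
  when :started
    if worker.memory_bytes > memory_threshold
      # Ask the worker to exit gracefully and remember when we asked.
      worker.status = :stopping
      worker.stopping_since = now
      return :asked_to_stop
    end
    :ok
  when :stopping
    timeout = worker.stopping_timeout || DEFAULT_STOPPING_TIMEOUT
    if now - worker.stopping_since > timeout
      # The worker did not finish in time, so stop waiting and kill it.
      worker.status = :killed
      return :killed
    end
    :waiting
  else
    :ignored
  end
end
```

A worker that exceeds the threshold is first marked as stopping, and only a later monitor pass that finds it still stopping past the timeout kills it. A worker that finishes its queue work in time exits on its own and never reaches the kill branch.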
6d22a3c to b60a5f0
Checked commits jrafanie/manageiq@e5f4bd3~...b60a5f0 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0 spec/models/miq_server/worker_management/monitor_spec.rb
Ok, I think I got your good suggestion in. Take another look @carbonin
Looks good!
cc @jcarter12 (this is the stopping workers PR)
Due to module inclusion spaghetti, it's easier and less confusing to reference the Reason constants consistently in the MiqServer class, which is the ultimate destination for all of these modules. Fixes ManageIQ#13901 introduced in ManageIQ#13805 https://bugzilla.redhat.com/show_bug.cgi?id=1395736
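To illustrate why referencing the constants through the including class works, here is a small self-contained sketch. The `ServerSketch` and `WorkerManagementSketch` names are hypothetical stand-ins, not the real `MiqServer` modules. The nesting mirrors the `Monitor::Reason` suggestion from the review, and the constant values are the ones shown in the diff.

```ruby
# Hypothetical stand-ins for the real module chain; illustration only.
module WorkerManagementSketch
  module Monitor
    module Reason
      NOT_RESPONDING  = :not_responding
      MEMORY_EXCEEDED = :memory_exceeded
    end
    include Reason
  end
  include Monitor
end

class ServerSketch
  include WorkerManagementSketch
end

# Every module in the chain ends up in the class's ancestors, so the
# constants resolve through the class regardless of where they are defined.
ServerSketch::MEMORY_EXCEEDED # => :memory_exceeded
ServerSketch::NOT_RESPONDING  # => :not_responding
```

Callers that always go through the final class do not need to know which nested module defines a given constant.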
Kill workers that don't stop after a configurable time (cherry picked from commit 9764870) https://bugzilla.redhat.com/show_bug.cgi?id=1395736
Backported to Euwe via #13949
Backported to Darga via #13950
Previously, we'd gracefully ask workers to exit, but if the queue work
they were doing took 1 hour, they'd exceed
memory thresholds, keep running until the work was done, and only then
respond to the exit request.
Now, we mark them as 'stopping' when they
exceed a threshold, and they have up to 10 minutes to finish before we
kill them. This value is configurable in the 'stopping_timeout' field in
each worker's advanced settings.
https://bugzilla.redhat.com/show_bug.cgi?id=1395736
@gtanzillo @carbonin Please review