Check for timed out active tasks #15231
e6e182e to 3970a3b
def schedule_check_for_task_timeout
  every = worker_settings[:task_timeout_check_frequency]
  scheduler = scheduler_for(:scheduler)
  scheduler.schedule_every(every, :first_at => Time.current + 1.minute) do
@yrudman have you tested that rufus supports ActiveSupport::TimeWithZone?
irb(main):003:0> Time.current.class
=> ActiveSupport::TimeWithZone
I've done manual testing: leaving active records in the DB and restarting the server works.
>> require 'rufus-scheduler'
>> s = Rufus::Scheduler.new
>> n = Time.current
=> Fri, 26 May 2017 15:50:17 UTC +00:00
>> p [ :scheduled_at, n, n.to_f ]
[:scheduled_at, Fri, 26 May 2017 15:50:17 UTC +00:00, 1495813817.563702]
=> [:scheduled_at, Fri, 26 May 2017 15:50:17 UTC +00:00, 1495813817.563702]
app/models/miq_task.rb
Outdated
  scope :timed_out, -> { where("updated_on < ?", Time.now.utc - ::Settings.task.active_task_timeout.to_i_with_method) }

  def self.update_status_for_timed_out_active_tasks
    MiqTask.active.timed_out.no_associated_job.find_each do |task|
What about tasks that have a job but are "timed out"? What cleans them up?
I think Job should have its own implementation for detecting jobs that stalled because their worker was killed. There is a 'timing out' implementation for jobs, but I don't think it accommodates a killed worker.
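For readers following the thread, the selection this PR performs (active, stale, and not owned by a job) can be sketched in plain Ruby without ActiveRecord. This is only an illustration: `Task`, `Job`, `timed_out_active_tasks`, and `mark_timed_out!` are hypothetical stand-ins, and the constant values are assumed rather than taken from `MiqTask`.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models (hypothetical, for illustration only).
Task = Struct.new(:id, :state, :status, :updated_on)
Job  = Struct.new(:miq_task_id)

# Assumed values; the real constants live on MiqTask.
STATE_ACTIVE   = "Active".freeze
STATE_FINISHED = "Finished".freeze
STATUS_ERROR   = "Error".freeze

# Mirrors active.timed_out.no_associated_job: keep tasks that are still
# active, were last updated before the cutoff, and have no job pointing at them.
def timed_out_active_tasks(tasks, jobs, timeout_seconds, now = Time.now.utc)
  job_task_ids = jobs.map(&:miq_task_id)
  cutoff = now - timeout_seconds
  tasks.select do |task|
    task.state == STATE_ACTIVE &&
      task.updated_on < cutoff &&
      !job_task_ids.include?(task.id)
  end
end

# Marks every selected task as finished with an error status and
# returns the tasks that were changed.
def mark_timed_out!(tasks, jobs, timeout_seconds, now = Time.now.utc)
  timed_out_active_tasks(tasks, jobs, timeout_seconds, now).each do |task|
    task.state  = STATE_FINISHED
    task.status = STATUS_ERROR
  end
end
```

Tasks that do have a job are deliberately skipped here, which is exactly the gap the question above points at: something else has to clean those up.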
app/models/miq_task.rb
Outdated
@@ -33,6 +33,17 @@ class MiqTask < ApplicationRecord
      t.grouping(Arel::Nodes::Case.new(t[:state]).when(STATE_FINISHED).then(t[:status]).else(t[:state]))
    end)

  scope :active, -> { where(:state => STATE_ACTIVE) }
  scope :no_associated_job, -> { where.not("id IN (SELECT miq_task_id from jobs)") }
  scope :timed_out, -> { where("updated_on < ?", Time.now.utc - ::Settings.task.active_task_timeout.to_i_with_method) }
Nitpick: can you align the -> ?
ok
app/models/miq_task.rb
Outdated

  def self.update_status_for_timed_out_active_tasks
    MiqTask.active.timed_out.no_associated_job.find_each do |task|
      task.update_status(STATE_FINISHED, STATUS_ERROR,
Was there a reason not to use STATUS_TIMEOUT?
Timeout is not present as a possible state, and a task with status Timeout will not be shown on the Task management screens.
@gtanzillo what do you think: should we first allow timed-out tasks to be visible on the task management screens and change this PR to set the status to STATUS_TIMEOUT?
Or proceed with this PR setting timed-out tasks to Error status?
Or set the status to Warn?
@@ -1299,6 +1301,7 @@
  :session_timeout_interval: 30.seconds
  :storage_file_collection_interval: 1.days
  :storage_file_collection_time_utc: 21600
  :task_timeout_check_frequency: 1.hour
I don't know what the correct numbers are here...
Me neither ❓
1.hour here seems reasonable. We don't want to check too often, but we also don't want tasks sitting in an active state for an extended period when they are already dead.
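To put a number on that trade-off: a task becomes eligible once its last update is older than the timeout, and the next periodic check then runs at most one check interval later. A rough sketch of the resulting upper bound follows; `worst_case_detection_delay` is a hypothetical helper, and the 5-hour timeout is just the value proposed in this PR's settings.

```ruby
HOUR = 3600 # seconds

# Upper bound on how long a dead task can remain "Active":
# it must first exceed the timeout, then wait for the next periodic check.
def worst_case_detection_delay(timeout_seconds, check_every_seconds)
  timeout_seconds + check_every_seconds
end

delay = worst_case_detection_delay(5 * HOUR, 1 * HOUR)
puts "worst case: #{delay / HOUR} hours" # prints "worst case: 6 hours"
```

Checking more often only shrinks the second term, so with a timeout measured in hours an hourly check adds relatively little to the total delay.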
@@ -49,4 +49,15 @@
      )
    end
  end

  describe "#check_for_timed_out_active_tasks" do
    it "queue request to updtae status for timed out tasks" do
Typo: update. Maybe: enqueues update_status_for_timed_out_active_tasks
Thank you for noticing it!
spec/models/miq_task_spec.rb
Outdated
    end

    context "task is not active" do
      it "does not updates status to 'Error' if task state is 'Finished'" do
Typo here - "updates" should be "update"
Thank you
spec/models/miq_task_spec.rb
Outdated
        job.miq_task
      end

      it "does not updates status to 'Error'" do
Same typo here...
ok
config/settings.yml
Outdated
@@ -1129,6 +1129,8 @@
    :max_parallel_scans_per_host: 1
    :max_qitems_per_scan_request: 0
    :watchdog_interval: 1.minute
:task:
  :active_task_timeout: 5.hours
6.hours somehow feels better here. I can't explain why, though :)
ok
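Since these settings store durations as strings such as 5.hours, and the timed_out scope converts them with to_i_with_method, it may help to see roughly what such a conversion involves. The sketch below is not ManageIQ's implementation; `duration_to_seconds` and `UNIT_SECONDS` are hypothetical names, and only a few units are handled.

```ruby
# Seconds per supported unit (hypothetical table for this sketch).
UNIT_SECONDS = {
  "second" => 1,
  "minute" => 60,
  "hour"   => 3600,
  "day"    => 86_400
}.freeze

# Converts strings such as "6.hours" or "30.seconds" to integer seconds.
# Bare integers such as "21600" are returned as-is.
def duration_to_seconds(value)
  text = value.to_s.strip
  return Integer(text, 10) if text.match?(/\A\d+\z/)

  match = text.match(/\A(\d+)\.(second|minute|hour|day)s?\z/)
  raise ArgumentError, "unsupported duration: #{value.inspect}" unless match

  Integer(match[1], 10) * UNIT_SECONDS.fetch(match[2])
end

puts duration_to_seconds("5.hours") # 18000
puts duration_to_seconds("6.hours") # 21600
```

Parsing against a fixed unit table, rather than evaluating the string, keeps a malformed setting from doing anything other than raising an error.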
@miq-bot assign @gtanzillo

  def self.update_status_for_timed_out_active_tasks
    MiqTask.active.timed_out.no_associated_job.find_each do |task|
      task.update_status(STATE_FINISHED, STATUS_ERROR,
I'm still confused by the STATUS_ERROR. We should fix this in a followup so the UI displays timed out tasks.
OK, I will look at it.
…task_timeout $ runner_spec
ae224d7 to 8051e5e
Checked commits yrudman/manageiq@032d422~...8051e5e with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
@gtanzillo what versions should this go back to?
@miq-bot add-label fine/yes, euwe/yes
@yrudman Please remove
Backported to Euwe via #15277
…eout Check for timed out active tasks (cherry picked from commit 7fce59f) https://bugzilla.redhat.com/show_bug.cgi?id=1460349
Fine backport details:
Issue:
If the reporting worker is killed (kill -9 <pid>) while a report is still running, then the task status stays Running forever.
Original BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1397600
This PR:
adds active_task_timeout and task_timeout_check_frequency to settings.yml
MiqScheduleWorker
@miq-bot add-label bug, core
\cc @jrafanie @gtanzillo