If we can't update_attributes on a queue row, set state to error #14365
@@ -35,7 +35,7 @@ def thresholds_exceeded?
   def get_message_via_drb
     loop do
       begin
-        msg_id, lock_version = @worker_monitor_drb.get_queue_message(@worker.pid)
+        msg_id, lock_version = worker_monitor_drb.get_queue_message(@worker.pid)
We already have an `attr_reader` for `@worker_monitor_drb`, so I'm using it here to make this easier to test.
https://bugzilla.redhat.com/show_bug.cgi?id=1429747

In the reported bug, we had a Rails 4.2 era class [1] serialized in the args column of a miq_queue row. This class was removed in rails 5.0.0 [2], so we'd be unable to deserialize the column with:

ArgumentError: undefined class/module ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Integer

If we're unable to update_attributes because a column can't be deserialized, the message can't be handled by a worker, the worker dies, and the message remains in the miq_queue for another worker to try and also fail on.

Instead, if update_attributes fails, we can try to set just the state column to 'error'. In this way, the server will not try to dispatch the same queue multiple times. We clear errored messages at server boot, so we can clean them up then.

[1] ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Integer
[2] rails/rails@aafee23
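The fallback described in this commit message can be sketched in plain Ruby. This is only an illustration under stated assumptions: `FakeMessage` and `claim` are hypothetical stand-ins, not the real `MiqQueue` model or runner code, and the "poisoned" flag merely simulates a column that cannot be deserialized.

```ruby
# Hypothetical stand-in for a queue row; NOT the real MiqQueue model.
class FakeMessage
  STATUS_ERROR = "error".freeze
  attr_reader :state

  def initialize(poisoned)
    @poisoned = poisoned # simulates an args column that cannot be deserialized
    @state = "ready"
  end

  # Mimics a full update that has to touch every column, including args.
  def update_attributes(attrs)
    raise ArgumentError, "undefined class/module Removed::Klass" if @poisoned
    @state = attrs.fetch(:state)
    true
  end

  # Mimics a single-column write that never looks at the other columns.
  def update_column(name, value)
    raise ArgumentError, "unknown column #{name}" unless name == :state
    @state = value
    true
  end
end

# Hypothetical helper: try the normal update, and if it raises,
# mark only the state column as errored so the row is not dispatched again.
def claim(msg)
  msg.update_attributes(:state => "dequeue")
  msg
rescue => err
  warn("could not claim message: #{err.message}")
  msg.update_column(:state, FakeMessage::STATUS_ERROR)
  nil
end
```

With this sketch, `claim(FakeMessage.new(false))` returns the message in the "dequeue" state, while `claim(FakeMessage.new(true))` returns `nil` and leaves the message in the "error" state.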
f9b439b to e21d1b9
@Fryguy Please review

Some comments on commit jrafanie@e21d1b9
spec/models/miq_queue_worker_base/runner_spec.rb

Checked commit jrafanie@e21d1b9 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
app/models/miq_queue_worker_base/runner.rb
👍
@@ -64,6 +64,7 @@ def get_message_via_drb
           _log.debug("#{log_prefix} #{MiqQueue.format_short_log_msg(msg)} stale, retrying...")
           next
         rescue => err
+          msg.update_column(:state, MiqQueue::STATUS_ERROR)
What if this fails, say, because the database is down? Should we do some kind of `rescue nil` here?
I don't think it is needed.
If we got here, the database was running a second ago. If it is no longer running, we cannot do anything sensible at that point, so we can fail horribly.
So @Fryguy, I was debating whether I should just delete the row if this fails, but I really can't understand how this line could throw an error if we retrieved the `msg` on line 45. We're bypassing all other columns and validations. If we fail here, the only recourse is to raise. Who knows if a delete of the row would work either.
Ok, this is ready to go and is also for backport. Thanks for the review!
https://bugzilla.redhat.com/show_bug.cgi?id=1434454

The PostgreSQL::OID::Integer class was removed in: rails/rails@aafee23

It's possible that old Rails 4.2 versions of objects could have been serialized in the MiqQueue in the args column and we won't be able to deserialize them with Rails 5+, so we need to remove these rows.

Related to ManageIQ#14365
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1429747
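As a rough illustration of how rows like these might be identified, here is a minimal plain-Ruby sketch. It assumes the args payload is YAML text; `deserializable?`, the `rows` hash, and the `Removed::In::Rails5` class name are all hypothetical and are not taken from the referenced commit.

```ruby
require "yaml"

# Hypothetical helper: true when a serialized args payload can still be loaded.
# Newer Psych versions only revive arbitrary classes through unsafe_load,
# so prefer it when it is available.
def deserializable?(yaml_text)
  loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
  YAML.public_send(loader, yaml_text)
  true
rescue ArgumentError, NameError
  # Raised when the payload references a class that no longer exists.
  false
end

# Hypothetical data: id => serialized args column.
rows = {
  1 => [1, "two"].to_yaml,
  2 => "--- !ruby/object:Removed::In::Rails5 {}\n",
}

# Ids of rows whose args can no longer be deserialized.
stale_ids = rows.reject { |_id, args| deserializable?(args) }.keys
```

In this sketch, `stale_ids` collects the ids whose payload references a class that is no longer defined; what to do with those rows (delete them, mark them errored) is a separate decision.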
👍
If we can't update_attributes on a queue row, set state to error
(cherry picked from commit 90cd1b0)
https://bugzilla.redhat.com/show_bug.cgi?id=1436854
Euwe backport details:
https://bugzilla.redhat.com/show_bug.cgi?id=1434454

The PostgreSQL::OID::Integer class was removed in: rails/rails@aafee23

It's possible that old Rails 4.2 versions of objects could have been serialized in the MiqQueue in the args column and we won't be able to deserialize them with Rails 5+, so we need to remove these rows.

Related to ManageIQ/manageiq#14365
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1429747
(transferred from ManageIQ/manageiq@9795934)