Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If we can't update_attributes on a queue row, set state to error #14365

Merged
merged 1 commit into from
Mar 28, 2017

Conversation

jrafanie
Copy link
Member

https://bugzilla.redhat.com/show_bug.cgi?id=1429747

In the reported bug, we had a Rails 4.2 era class [1] serialized in the args
column of a miq_queue row. This class was removed in rails 5.0.0 [2],
so we'd be unable to deserialize the column with:

ArgumentError: undefined class/module ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Integer

If we're unable to update_attributes because a column can't be
deserialized, the message can't be handled by a worker, the worker dies,
and the message remains in the miq_queue for another worker to try and
also fail on.

Instead, if update_attributes fails, we can try to set just the state
column to 'error'. In this way, the server will not try to dispatch the
same queue multiple times. We clear errored messages at server boot, so
we can clean them up then.

[1] ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Integer
[2] rails/rails@aafee23

@@ -35,7 +35,7 @@ def thresholds_exceeded?
def get_message_via_drb
loop do
begin
msg_id, lock_version = @worker_monitor_drb.get_queue_message(@worker.pid)
msg_id, lock_version = worker_monitor_drb.get_queue_message(@worker.pid)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a attr_reader on the @worker_monitor_drb so i'm using it here so it's easier to test.

https://bugzilla.redhat.com/show_bug.cgi?id=1429747

In the reported bug, we had a Rails 4.2 era class [1] serialized in the args
column of a miq_queue row.  This class was removed in rails 5.0.0 [2],
so we'd be unable to deserialize the column with:

ArgumentError: undefined class/module ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Integer

If we're unable to update_attributes because a column can't be
deserialized, the message can't be handled by a worker, the worker dies,
and the message remains in the miq_queue for another worker to try and
also fail on.

Instead, if update_attributes fails, we can try to set just the state
column to 'error'.  In this way, the server will not try to dispatch the
same queue multiple times.  We clear errored messages at server boot, so
we can clean them up then.

[1] ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Integer
[2] rails/rails@aafee23
@jrafanie jrafanie force-pushed the dont_retry_failed_message branch from f9b439b to e21d1b9 Compare March 17, 2017 15:27
@jrafanie
Copy link
Member Author

@Fryguy Please review

@miq-bot
Copy link
Member

miq-bot commented Mar 17, 2017

Some comments on commit jrafanie@e21d1b9

spec/models/miq_queue_worker_base/runner_spec.rb

  • ⚠️ - 14 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.
  • ⚠️ - 6 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.
  • ⚠️ - 7 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.
  • ⚠️ - 8 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.

@miq-bot
Copy link
Member

miq-bot commented Mar 17, 2017

Checked commit jrafanie@e21d1b9 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 1 offense detected

app/models/miq_queue_worker_base/runner.rb

Copy link
Member

@isimluk isimluk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -64,6 +64,7 @@ def get_message_via_drb
_log.debug("#{log_prefix} #{MiqQueue.format_short_log_msg(msg)} stale, retrying...")
next
rescue => err
msg.update_column(:state, MiqQueue::STATUS_ERROR)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this fails? Say, if the database is down? should we do some kind of rescue nil here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is needed.

If we have got here, the database was running a second ago. If it is no longer running, we cannot do anything sensible at that point, thus we can fail horribly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @Fryguy I was debating if I should just delete the row if this fails but I really can't understand how this line could throw an error if we retrieved the msg on line 45. We're bypassing all other columns and validations. If we fail here, the only recourse is to raise. Who knows if a delete of the row would work either.

@jrafanie
Copy link
Member Author

Ok, this is ready to go and is also for backport. Thanks for the review!

jrafanie added a commit to jrafanie/manageiq that referenced this pull request Mar 21, 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1434454

The PostgreSQL::OID::Integer class was removed in:
rails/rails@aafee23

It's possible that old Rails 4.2 versions of objects could have been
serialized in the MiqQueue in the args column and we won't be able to
deserialize them with Rails 5+, so we need to remove these rows.

Related to ManageIQ#14365
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1429747
Copy link
Member

@gtanzillo gtanzillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@gtanzillo gtanzillo added this to the Sprint 57 Ending Mar 27, 2017 milestone Mar 28, 2017
@gtanzillo gtanzillo merged commit 90cd1b0 into ManageIQ:master Mar 28, 2017
simaishi pushed a commit that referenced this pull request Mar 28, 2017
If we can't update_attributes on a queue row, set state to error
(cherry picked from commit 90cd1b0)

https://bugzilla.redhat.com/show_bug.cgi?id=1436854
@simaishi
Copy link
Contributor

Euwe backport details:

$ git log -1
commit 81ccccc3b774b109008650456355b6668a36d9fe
Author: Gregg Tanzillo <[email protected]>
Date:   Mon Mar 27 22:02:48 2017 -0400

    Merge pull request #14365 from jrafanie/dont_retry_failed_message
    
    If we can't update_attributes on a queue row, set state to error
    (cherry picked from commit 90cd1b0f7648f1246631aeb7f0eac18d2b0b6818)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1436854

@jrafanie jrafanie deleted the dont_retry_failed_message branch April 4, 2017 19:05
Fryguy pushed a commit to ManageIQ/manageiq-schema that referenced this pull request Jun 20, 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1434454

The PostgreSQL::OID::Integer class was removed in:
rails/rails@aafee23

It's possible that old Rails 4.2 versions of objects could have been
serialized in the MiqQueue in the args column and we won't be able to
deserialize them with Rails 5+, so we need to remove these rows.

Related to ManageIQ/manageiq#14365
Related to https://bugzilla.redhat.com/show_bug.cgi?id=1429747

(transferred from ManageIQ/manageiq@9795934)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants