Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove queue serialization #13722

Merged
merged 1 commit into from
Mar 16, 2017
Merged

Conversation

moolitayer
Copy link

@moolitayer moolitayer commented Feb 1, 2017

Fix #13701

@moolitayer
Copy link
Author

@miq-bot add_label wip, smart state, bug, providers/containers

@@ -112,7 +112,7 @@ def fetch_oscap_arf
@job.signal(:data, '<summary><syncmetadata></syncmetadata></summary>')
end

@job = @ems.raw_scan_job_create(@image)
@job = @ems.raw_scan_job_create(@image.class.name, @image,id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you typo-ed a comma for a point ( @image.id ) and this might also be the reason the tests are failing

Copy link

@enoodle enoodle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just fix the tests..

@moolitayer moolitayer force-pushed the remove_serialization branch from 39f35c2 to 36da936 Compare February 1, 2017 14:37
@moolitayer
Copy link
Author

moolitayer commented Feb 1, 2017

😕 still failing localy... I will look into it when I have some more time

@chessbyte chessbyte requested a review from simon3z February 3, 2017 20:24
@moolitayer moolitayer force-pushed the remove_serialization branch 2 times, most recently from c2dcf09 to f763033 Compare February 13, 2017 18:30
@moolitayer moolitayer changed the title [WIP] Remove queue serialization Remove queue serialization Feb 13, 2017
@moolitayer
Copy link
Author

@miq-bot remove_label wip

@miq-bot miq-bot removed the wip label Feb 13, 2017
@moolitayer
Copy link
Author

moolitayer commented Feb 13, 2017

@cben @kbrock please review this fix if you have time

verified it with comparing the output below for master Vs this branch

2.3.0 :002 > ContainerImage.first.scan # console
$ grep "ruby/object:ContainerImage" log/development.log

Copy link
Member

@kbrock kbrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an extra lookup, and I'm not exactly sure what the change in method signature buys us

:zone => my_zone,
:miq_server_host => MiqServer.my_server.hostname,
:miq_server_guid => MiqServer.my_server.guid,
:ems_id => id,
:ems_id => class_name.constantize.find_by(:id => id).try(:ext_management_system),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh. can this lookup be avoided?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is a separate problem, i'll take it out of here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it in. You are saying it is a separate bug. I get it.
Can you use a where and pluck? Will this object always belong to an ems/have the ems_id column?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it save a lookup if you passed this value in?

end

def raw_scan_job_create(entity)
def raw_scan_job_create(class_name, id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did the separation here buy you?

I understand that objects don't want to go into the MiqQueue or Job.create_job. I don't see any invocations to :raw_scan_job_create going onto the stack

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my I didn't see the serialization in the log only because I caused the job not not run anymore 😮 I will dig dipper

@@ -124,31 +124,31 @@ def default_authentication_type
end

def scan_job_create(entity)
check_policy_prevent(:request_containerimage_scan, entity, :raw_scan_job_create, entity)
check_policy_prevent(:request_containerimage_scan, entity, :raw_scan_job_create)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_scan_job_create now takes entity.class and entity.id. no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, this is changing parameters to check_policy_prevent

Is this related?

Copy link
Contributor

@simon3z simon3z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moolitayer @cben @kbrock is there any backward-compatibility consideration here?

What about jobs that were created before this code change (previous version)?
Do we need a migration?

@miq-bot
Copy link
Member

miq-bot commented Mar 6, 2017

This pull request is not mergeable. Please rebase and repush.

@moolitayer
Copy link
Author

@moolitayer @cben @kbrock is there any backward-compatibility consideration here?

What about jobs that were created before this code change (previous version)?
Do we need a migration?

@simon3z Do you mean old calls placed on the queue?
even if there is nothing the empties the queue prior to upgrade I would just let those fail. Honestly I don't think it is worth the time.

@moolitayer
Copy link
Author

moolitayer commented Mar 6, 2017

@kbrock could you please review?
I need to fix the tests, sorry

@moolitayer moolitayer force-pushed the remove_serialization branch from af9fff6 to fd8a3ac Compare March 6, 2017 18:14
@moolitayer
Copy link
Author

@kbrock safe to review now

@@ -153,12 +153,12 @@ def scan_job_create(entity)
check_policy_prevent(:request_containerimage_scan, entity, :raw_scan_job_create)
end

def raw_scan_job_create(entity)
def raw_scan_job_create(target_class, target_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps name the arg "target_class_name".
YMMV but when I hear "class" I always expect a class object.

@@ -174,7 +174,7 @@ def check_policy_prevent(policy_event, event_target, cb_method)
:class_name => self.class.to_s,
:instance_id => id,
:method_name => :check_policy_prevent_callback,
:args => [cb_method, event_target],
:args => [cb_method, event_target.class.name, event_target.id],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at other check_policy_prevent functions (host, vm_or_template), they don't take target, just *cb_method and use [*cb_method] here.
You could the same and then scan_job_create would have to do

     check_policy_prevent(:request_containerimage_scan, :raw_scan_job_create, entity.class.name, entity.id)

I like that better because one can understand scan_job_create -> raw_scan_job_create path without looking into the policy methods.
Plus less obstacles to dedup these functions one day... (AFAICT the main diffrence is in building inputs hashes for MiqEvent.raise_evm_event.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @cben here. And, unlike those other poorly documented examples, you could extend your useful comment here to indicate that *cb_method includes the callback method name (symbol) and callback method args.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cben So you are suggesting accepting any input and copying them straight to the queue?

I view the purpose of methods like these to dictate and protect the messages that go onto the queue.

If you are going to document what parameters are going into *cb_method then why not just document it by having actual variables?

But I was having trouble understanding the full implications of comment from @cben - sorry if I totally botched it.

Copy link
Contributor

@cben cben Mar 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view the purpose of methods like these to dictate and protect the messages that go onto the queue.

Good point. The ability to understand scan_job_create -> raw_scan_job_create without thinking about how check_policy_prevent() works might actually be a drawback :-|
However, if you read the following, without knowing what exactly check_policy_prevent() does:

   def scan_job_create(entity)
     check_policy_prevent(:request_containerimage_scan,
                          :raw_scan_job_create, entity.class.name, entity.id)
   end

isn't it "obvious" that a raw_scan_job_create(name, id) callback is being queued?
(What one misses by not looking into check_policy_prevent() is that there is another level of queuing first.)
And if not obvious, could it be improved by naming (e.g. "queue_if_policy_doesnt_prevent" or something)?

P.S. I wish there was a way to actually prevent AR serializion, instead of constrained coding style that "dictates and protects"... I know it's a hard problem.

If you are going to document what parameters are going into *cb_method then why not just document it by having actual variables?

👍 The function is trying to be generic, but could at least take ..., cb_method, *cb_args.

Anyway, feel free to override me. My suggestion was purely cosmetic.
E.g. @simon3z's concern about migration, if it conflicts, is more important.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. I wish there was a way to actually prevent AR serializion, instead of constrained coding style that "dictates and protects"... I know it's a hard problem.

would having a (dev mode?) functionality of preventing args of a type (application record?) from going in the queue work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moolitayer Does check_policy_prevent_callback work?

I think I'm misreading the code before your change:

    cb = {
      :class_name  => self.class.to_s,
      :instance_id => id,
      :method_name => :check_policy_prevent_callback,
      :args        => [cb_method, event_target],
      :server_guid => MiqServer.my_guid
    }

def check_policy_prevent_callback(*action, _status, _message, result)
end

I don't see how this callback can produce an action, status, message, and result. I only see 2 arguments. None of which are event_target. I must be mistaking how callback works.

I expected something closer to :args => [*cb_method, nil, nil, event_target]

Then this PR would proceed by changing event_target into class/id...

Copy link
Author

@moolitayer moolitayer Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbrock I hope I understand your comment

Turns out in ruby:

def strange_example(*a,b,c)
  puts "[#{a}] [#{b}] [#{c}]"
end

> strange_example(1,2,3,4)
[[1, 2]] [3] [4]

> strange_example(1,2)
[[]] [1] [2]

Also I tested this code (which wasn't introduced here).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbrock actually that wasn't the correct answer,
This is where the parameters for the callback are built:

cb_args = miq_callback[:args] + [state, msg, result]

you wrote that line actually :bowtie:

Copy link
Member

@kbrock kbrock Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UGH - totally

(to be fair, I only modified that line. but sure enough, my name is on it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I wan't saying you should have remembered it since 2015 😄 just commenting on the coincidence ⭐

@blomquisg
Copy link
Member

blomquisg commented Mar 6, 2017

Overall this looks good to me. I agree with @cben's comments. If those are addressed, this is good to merge.

I'll give a preemptive approval 😸

Copy link
Contributor

@simon3z simon3z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simon3z Do you mean old calls placed on the queue?
even if there is nothing the empties the queue prior to upgrade I would just let those fail. Honestly I don't think it is worth the time.

@blomquisg are you OK also with the behavior described above? (Related to upgrades)

@moolitayer so how these would fail? How would it be recognizable from other real failures so that if GSS is debugging issues on upgrades (which sadly in general happen) they won't be confused by this?

What you're suggesting is risky approach ("just let them fail") if we don't know exactly how they fail, how to recognize this particular case, etc.

If we're sure the queue is empty on upgrades then we could skip this, although I am not a fan of changing the formats on DB without a migration.

@moolitayer
Copy link
Author

@moolitayer so how these would fail? How would it be recognizable from other real failures so that if GSS is debugging issues on upgrades (which sadly in general happen) they won't be confused by this?

If someone schedules a scan and then shuts down and upgrades at exactly the right moment there will be one failed job with an error from:

return queue_signal(:abort_job, "no image found", "error") unless image

saying "no image found"

Waiting for feedback from @blomquisg for now

@blomquisg
Copy link
Member

@simon3z said:

@blomquisg are you OK also with the behavior described above? (Related to upgrades)

That's a good catch @simon3z!

If someone schedules a scan and then shuts down and upgrades at exactly the right moment there will be one failed job with an error from:

return queue_signal(:abort_job, "no image found", "error") unless image

saying "no image found"

I think this is one possibility. I think it's also possible, depending on how far the processing got, you might see a ArgumentError: wrong number of arguments (2 for 1) if it got to the point of trying to call the callback method raw_scan_job_create with just the entity instead of the class and id. So this could look really bad, even if it is just a simple job failure that needs to be requeued.

The idealist in me says that these should be migrated. It probably largely depends on how frequently these jobs get executed. But, to avoid the customer confusion, it's probably best to migrate.

Copy link
Member

@blomquisg blomquisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @simon3z's comment, changing my review.

@moolitayer
Copy link
Author

Did some digging... after not being able to catch the callback we would need to migrate[1] on the queue for a while I found out that that call is synchronous[2] which was surprising. If that is indeed the case (quite possible I missed something) we have nothing to migrate.

In case I'm wrong, a suggestion to have in mind is to fix this with a one liner in raw_scan_job_create:

target_class, target_id = target_class.class, target_class.id if target_class.is_a?(ContainerNode)

@simon3z Please comment, thanks

[1] afaiu check_policy_prevent_callback called back from automate - from there we only have a direct send

[2]

obj.send(miq_callback[:method_name], *cb_args)

@moolitayer
Copy link
Author

ping @simon3z

@kbrock
Copy link
Member

kbrock commented Mar 9, 2017

@moolitayer for your sample code make sure you change your method signature to support 1 or 2 params:

raw_scan_job_create(target_class, target_id = nil)

@moolitayer
Copy link
Author

@simon3z please take a look at this comment

@simon3z
Copy link
Contributor

simon3z commented Mar 13, 2017

@simon3z please take a look at this comment

@moolitayer so we're just writing these args to the database and never read/use them?

@moolitayer
Copy link
Author

@simon3z right, come to think of it if automate wasn't called yet we have a call on the queue with a bad callback attached to it. Are you ok with the solution suggested above:

+ raw_scan_job_create(target_class, target_id = nil)
+   target_class, target_id = target_class.class, target_class.id if target_class.is_a?(ContainerNode)

@simon3z
Copy link
Contributor

simon3z commented Mar 13, 2017

Are you ok with the solution suggested above:

@moolitayer LGTM 👍

@moolitayer moolitayer force-pushed the remove_serialization branch from fd8a3ac to 548234a Compare March 13, 2017 16:15
@moolitayer moolitayer force-pushed the remove_serialization branch from 548234a to ad6ad1c Compare March 13, 2017 16:35
@miq-bot
Copy link
Member

miq-bot commented Mar 13, 2017

Checked commit moolitayer@ad6ad1c with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
5 files checked, 0 offenses detected
Everything looks good. 🏆

@moolitayer
Copy link
Author

@simon3z @blomquisg ready to merge?

@simon3z
Copy link
Contributor

simon3z commented Mar 14, 2017

LGTM 👍 ready for merge
cc @blomquisg

@@ -153,12 +153,13 @@ def scan_job_create(entity)
check_policy_prevent(:request_containerimage_scan, entity, :raw_scan_job_create)
end

def raw_scan_job_create(entity)
def raw_scan_job_create(target_class, target_id = nil)
target_class, target_id = target_class.class.name, target_class.id if target_class.kind_of?(ContainerImage)
Copy link
Member

@kbrock kbrock Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.
This allows us to be backward compatible and to not require a migration.

@kbrock
Copy link
Member

kbrock commented Mar 16, 2017

LGTM 👍 ready for merge
cc @blomquisg

@blomquisg blomquisg merged commit 3b7fd39 into ManageIQ:master Mar 16, 2017
@blomquisg blomquisg added this to the Sprint 57 Ending Mar 27, 2017 milestone Mar 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants