Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EmbeddedAnsible] Force embedded_ansible role for workflow #19187

Conversation

NickLaMuro
Copy link
Member

The queue_signal method in AnsibleRunnerWorkflow forces the "ems_operations" role when it schedules a queue item, but this class (which inherits from AnsibleRunnerWorkflow) is getting assigned this job from a wrapper job that requires the embedded_ansible role. In addition, the previous job queue an job that is locked to the existing server guid, so it is possible for that server to take the first job, but not the second when it doesn't have both an "embedded_ansible" and "ems_operations" role.

When a server exists that only has the "embedded_ansible" role, it is possible to get into a state where a playbook can be scheduled, but then is never ran because no server matches. This fix simply always uses the "embedded_ansible" role for everything, but tries to only modify the lower level classes to achieve that.

Alternative Solution

There is a "cleaner" solution for this that would instead modify the AnsibleRunnerWorkflow#queue_signal:

to instead just default to "embedded_ansible" instead of "ems_operations". However, since I am not confident that this is only use for "embedded ansible", I favored this method. While much uglier, it felt a bit safer at this stage in the game. I would agree that we should probably not go with this solution long term, but for a quick fix, I would argue this is a bit safer.

Links

Steps for Testing/QA

Provided replication steps in the BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1742839#c7

And this can be easily replicated on a single appliance.

The `queue_signal` method in `AnsibleRunnerWorkflow` forces the
"ems_operations" role when it schedules a queue item, but this class
(which inherits from `AnsibleRunnerWorkflow`) is getting assigned this
job from a wrapper job that requires the `embedded_ansible` role.  In
addition, the previous job queue an job that is locked to the existing
server guid, so it is possible for that server to take the first job,
but not the second when it doesn't have both an "embedded_ansible" and
"ems_operations" role.

When a server exists that only has the "embedded_ansible" role, it is
possible to get into a state where a playbook can be scheduled, but then
is never ran because no server matches.  This fix simply always uses the
"embedded_ansible" role for everything, but tries to only modify the
lower level classes to achieve that.

Fixes:  https://bugzilla.redhat.com/show_bug.cgi?id=1742839
@miq-bot
Copy link
Member

miq-bot commented Aug 22, 2019

Checked commit NickLaMuro@78f9fe7 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. 🍪

@bdunne bdunne requested a review from Fryguy August 23, 2019 13:36
@Fryguy Fryguy self-assigned this Aug 26, 2019
@Fryguy
Copy link
Member

Fryguy commented Aug 26, 2019

The various workflow classes should not default to any role so I don't think this direction works either. The main problem with the current code is that the caller defaults to ems_operations, since it was originally designed for providers. In time, we expect all callers (both provider, embedded ansible or other) to go through these workflow classes, so defaulting a role is not correct.

@NickLaMuro
Copy link
Member Author

The various workflow classes should not default to any role so I don't think this direction works either.

Does not work in what way? I tested against these replication steps:

https://bugzilla.redhat.com/show_bug.cgi?id=1742839#c7

And with the fix in place, the playbook worked as expected. Is there something else that I am not accounting for that you are talking about?

@Fryguy
Copy link
Member

Fryguy commented Aug 26, 2019

When another non-embedded_ansible thing uses this class in the future?

@NickLaMuro
Copy link
Member Author

When another non-embedded_ansible thing uses this class in the future?

@Fryguy okay, sure, but that wouldn't that be made even worse if I were to set this in the super class where the role defaults to "ems_operations" currently:

role = options[:role] || "ems_operations"

So this was an attempt to apply this change where it would have the least number of side effects, specifically address what I think you are asking about above. However, if you think this isn't going far enough, I can see two alternatives:

  • Raise an error for when this is used by a class that shouldn't, though not sure how to accomplish that though...
  • Create another subclass that then is only used by EmbeddedAnsible

That said, along with having more lines of code to accomplish the second option when compared to this current solution, it also seems unnecessary at this point since only EmbeddedAnsible is the only user of this class (from what I can tell):

https://github.com/search?q=org%3AManageIQ+AnsiblePlaybookWorkflow&type=Code

So not sure what else I could do.

@Fryguy
Copy link
Member

Fryguy commented Aug 29, 2019

Going to merge for now to get it working. I think the best approach moving forward is something a bit more robust with having queue_signal more generic and moved into Job itself, but I need to consider the design of that a bit more.

@Fryguy Fryguy merged commit b74fd8d into ManageIQ:master Aug 29, 2019
@Fryguy Fryguy added this to the Sprint 119 Ending Sep 2, 2019 milestone Aug 29, 2019
simaishi pushed a commit that referenced this pull request Sep 25, 2019
…le_runner_embedded_ansible_jobs

[EmbeddedAnsible] Force embedded_ansible role for workflow

(cherry picked from commit b74fd8d)

https://bugzilla.redhat.com/show_bug.cgi?id=1742839
@simaishi
Copy link
Contributor

Ivanchuk backport details:

$ git log -1
commit 9d24747ca9a3027b12de4b1f29ffdb1c56be580e
Author: Jason Frey <[email protected]>
Date:   Thu Aug 29 15:20:00 2019 -0400

    Merge pull request #19187 from NickLaMuro/fix_role_mismatch_for_ansible_runner_embedded_ansible_jobs
    
    [EmbeddedAnsible] Force embedded_ansible role for workflow
    
    (cherry picked from commit b74fd8df9546db3b4bbd1273b59719c7791f9449)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1742839

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants