-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EmbeddedAnsible] Force embedded_ansible role for workflow #19187
[EmbeddedAnsible] Force embedded_ansible role for workflow #19187
Conversation
The `queue_signal` method in `AnsibleRunnerWorkflow` forces the "ems_operations" role when it schedules a queue item, but this class (which inherits from `AnsibleRunnerWorkflow`) is getting assigned this job from a wrapper job that requires the `embedded_ansible` role. In addition, the previous job queue an job that is locked to the existing server guid, so it is possible for that server to take the first job, but not the second when it doesn't have both an "embedded_ansible" and "ems_operations" role. When a server exists that only has the "embedded_ansible" role, it is possible to get into a state where a playbook can be scheduled, but then is never ran because no server matches. This fix simply always uses the "embedded_ansible" role for everything, but tries to only modify the lower level classes to achieve that. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1742839
Checked commit NickLaMuro@78f9fe7 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0 |
The various workflow classes should not default to any role so I don't think this direction works either. The main problem with the current code is that the caller defaults to ems_operations, since it was originally designed for providers. In time, we expect all callers (both provider, embedded ansible or other) to go through these workflow classes, so defaulting a role is not correct. |
Does not work in what way? I tested against these replication steps: https://bugzilla.redhat.com/show_bug.cgi?id=1742839#c7 And with the fix in place, the playbook worked as expected. Is there something else that I am not accounting for that you are talking about? |
When another non-embedded_ansible thing uses this class in the future? |
@Fryguy okay, sure, but that wouldn't that be made even worse if I were to set this in the super class where the role defaults to
So this was an attempt to apply this change where it would have the least number of side effects, specifically address what I think you are asking about above. However, if you think this isn't going far enough, I can see two alternatives:
That said, along with having more lines of code to accomplish the second option when compared to this current solution, it also seems unnecessary at this point since only https://github.com/search?q=org%3AManageIQ+AnsiblePlaybookWorkflow&type=Code So not sure what else I could do. |
Going to merge for now to get it working. I think the best approach moving forward is something a bit more robust with having queue_signal more generic and moved into Job itself, but I need to consider the design of that a bit more. |
…le_runner_embedded_ansible_jobs [EmbeddedAnsible] Force embedded_ansible role for workflow (cherry picked from commit b74fd8d) https://bugzilla.redhat.com/show_bug.cgi?id=1742839
Ivanchuk backport details:
|
The
queue_signal
method inAnsibleRunnerWorkflow
forces the "ems_operations" role when it schedules a queue item, but this class (which inherits fromAnsibleRunnerWorkflow
) is getting assigned this job from a wrapper job that requires theembedded_ansible
role. In addition, the previous job queue an job that is locked to the existing server guid, so it is possible for that server to take the first job, but not the second when it doesn't have both an "embedded_ansible" and "ems_operations" role.When a server exists that only has the "embedded_ansible" role, it is possible to get into a state where a playbook can be scheduled, but then is never ran because no server matches. This fix simply always uses the "embedded_ansible" role for everything, but tries to only modify the lower level classes to achieve that.
Alternative Solution
There is a "cleaner" solution for this that would instead modify the
AnsibleRunnerWorkflow#queue_signal
:to instead just default to "embedded_ansible" instead of "ems_operations". However, since I am not confident that this is only use for "embedded ansible", I favored this method. While much uglier, it felt a bit safer at this stage in the game. I would agree that we should probably not go with this solution long term, but for a quick fix, I would argue this is a bit safer.
Links
Steps for Testing/QA
Provided replication steps in the BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1742839#c7
And this can be easily replicated on a single appliance.