Workflow limit does not limit on AWX 21.6.0 #12991

Open
3 of 9 tasks
Quiwy opened this issue Oct 3, 2022 · 6 comments

Comments

@Quiwy

Quiwy commented Oct 3, 2022

Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.

Bug Summary

Hi, I'm not sure whether this is a bug or the normal behavior, but I can't find the expected behavior in the documentation.

I'm creating a Workflow Job Template with a limit set to a specific host, calling a job template which has a limit set to a group.
The job template is then applied to the whole group (which is very surprising, since a more restricted set was defined when launching the workflow).

This behavior does not occur when the job template has the “prompt on launch” checkbox checked. I guess it's the same problem as #11967, which was closed as a duplicate of #11284. The explanation that best fits my case is the one in this comment: #11284 (comment).
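
To make the reported behavior concrete, here is a minimal Python sketch of what is observed in this report. It is a model of the symptoms only, not AWX source code, and the function name `observed_limit` and its parameters are made up for illustration.

```python
def observed_limit(workflow_limit: str, jt_limit: str, jt_prompt_on_launch: bool) -> str:
    """Model of the behavior reported in this issue (not AWX source code).

    Returns the limit the job template node ends up running with.
    An empty string stands for "no limit", i.e. the whole inventory.
    """
    if jt_prompt_on_launch:
        # With "prompt on launch" checked, the workflow limit is passed down.
        return workflow_limit
    # Otherwise the workflow limit is dropped and the job template's own limit applies.
    return jt_limit


print(observed_limit("host1", "group1", jt_prompt_on_launch=False))  # group1
print(observed_limit("host1", "group1", jt_prompt_on_launch=True))   # host1
```

The first call is the surprising case: the workflow asked for `host1`, but the node runs against all of `group1`.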

AWX version

21.6.0

Select the relevant components

  • UI
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

minikube

Modifications

no

Ansible version

2.15.5

Operating system

Debian 11

Web browser

No response

Steps to reproduce

In case it's not the same issue, here are the steps I followed to reproduce it:

  1. Create a group in an inventory with various hosts.

  2. Create a job template with a limit on the group, with the “prompt on launch” checkbox unchecked.

  3. Launch the job template.

  4. Create a workflow job template with a limit set to whichever host you want.

  5. Set a node calling the job template.
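
For anyone who prefers to reproduce this through the REST API instead of the UI, the steps above could be sketched as the request bodies below. This is an untested sketch: the IDs, names and playbook are placeholders, and the endpoint paths and field names (`limit`, `ask_limit_on_launch`, `unified_job_template`) are what I believe the AWX v2 API uses, so please check them against the API browser of your own instance.

```python
import json

# Hypothetical IDs; replace them with the ones from your own AWX instance.
INVENTORY_ID = 1
PROJECT_ID = 1


def job_template_payload(limit: str, prompt_on_launch: bool) -> dict:
    """Assumed body for POST /api/v2/job_templates/ (step 2)."""
    return {
        "name": "limit-test-jt",      # illustrative name
        "inventory": INVENTORY_ID,
        "project": PROJECT_ID,
        "playbook": "test.yml",       # illustrative playbook
        "limit": limit,
        "ask_limit_on_launch": prompt_on_launch,
    }


def workflow_payload(limit: str) -> dict:
    """Assumed body for POST /api/v2/workflow_job_templates/ (step 4)."""
    return {"name": "limit-test-wf", "limit": limit}


def node_payload(job_template_id: int) -> dict:
    """Assumed body for POST /api/v2/workflow_job_templates/<id>/workflow_nodes/ (step 5)."""
    return {"unified_job_template": job_template_id}


print(json.dumps(job_template_payload("group1", prompt_on_launch=False), indent=2))
```

Only the payloads are built here; sending them is left out on purpose so the sketch does not depend on a particular HTTP client or authentication setup.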

Actual results

Looks like it's limited to localhost

  1. This is where it starts to seem strange to me: the job template was applied to the limit defined in the job template, and not the one defined in the workflow template.

Expected results

However, if I check the “prompt on launch” checkbox in step 2 and then follow steps 3, 4 and 5, the workflow runs with the limit defined on the workflow job template (which is what I want).

I expect this result whether the checkbox is checked or not, or at least a warning telling me that the limit I entered will not be applied.

Additional information

I found the following comment on another issue, which says:

As noted in the docs, the workflow limit "provide[s] a host pattern to further constrain the list of hosts that will be managed or affected by the workflow. This limit is applied to all job template nodes that prompt for a limit."

I've read the documentation on https://docs.ansible.com/ansible-tower/3.8.0/html/userguide/workflow_templates.html and I can't find this information. Where is it located?

This strange behavior led us to delete half of our application instances when we wanted to delete only one (the one defined in the limit field of the workflow job template). Maybe a clear section in the documentation, or a warning (or something else, I have no idea what) shown when creating the workflow job template, saying that the limit will not be applied, would be best.

I'm not sure this is entirely clear, so let me know if you need more information.

@fosterseth
Member

sounds similar to #8561

@Quiwy
Author

Quiwy commented Oct 14, 2022

sounds similar to #8561

Thank you for your answer @fosterseth

I tried the steps to reproduce from that issue:

STEPS TO REPRODUCE

Create:

  • AWX job_template including a LIMIT (during execution, the job_template respects the LIMIT without "prompt on launch" as expected). The job_type does not matter here.
  • AWX workflow_template without LIMIT and add the job_template (still including the original LIMIT without "prompt on launch") to the workflow_template.

Execute the workflow_template as follows:

  1. workflow_template run without LIMIT, job_template with LIMIT (no prompt on launch): job_template LIMIT respected
  2. set job_template LIMIT checkbox for "prompt on launch"
  3. workflow_template run without LIMIT, job_template with LIMIT (including prompt on launch): job_template LIMIT respected
  4. workflow_template run with LIMIT (change to some value), job_template with LIMIT (including prompt on launch): workflow_template LIMIT respected
  5. change workflow_template LIMIT back to empty and save workflow_template again
  6. workflow_template run without LIMIT (again!), job template with LIMIT (including prompt on launch): LIMIT set in job_template ignored, playbook is executed on all inventory targets. This means that the workflow_template ignores the LIMIT set in the job_template which it did not do before.

And I can't reproduce the behavior of #8561. Maybe I'm doing it the wrong way 🤔.

In my case, the only important point is the "prompt on launch" of the job template.

If I try to summarize in a table:

| Limit on workflow template | Limit on job template | "Prompt on launch" on job template checked | Actual result | Expected limitation result of job template called by workflow template | Seems OK |
|---|---|---|---|---|---|
| host1 | group1 | | action on all group1 | action on host1 | |
| host1 | group1 | | action on host1 | action on host1 | ✔️ |
| group1 | host1 | | action on host1 | action on host1 | ✔️ |
| group1 | host1 | | action on all group1 | action on host1 | |
| group1 | no limit | | action on group1 | action on group1 | ✔️ |
| group1 | no limit | | action on all inventory | action on group1 | ❌ (this is the worst unwanted case) |

I hope it's clearer this way :)
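
In every row of the summary, the expected result is simply the intersection of the two limits. A minimal sketch of that rule, using the Ansible host pattern operator `:&` for intersection; the helper name `combine_limits` is hypothetical, and this describes the behavior asked for in this issue, not what AWX does today.

```python
def combine_limits(workflow_limit: str, jt_limit: str) -> str:
    """Expected limit for a job template node launched from a workflow.

    An empty string means "no limit". When both limits are set, the result
    is their intersection, written with Ansible's ":&" pattern operator.
    Caveat: this naive join is only safe when jt_limit is a single pattern
    (no ":" or "," inside), otherwise the combined pattern changes meaning.
    """
    if not workflow_limit:
        return jt_limit
    if not jt_limit:
        return workflow_limit
    return f"{workflow_limit}:&{jt_limit}"


print(combine_limits("host1", "group1"))  # host1:&group1
print(combine_limits("group1", ""))       # group1
```

With `host1` being a member of `group1`, the pattern `host1:&group1` matches only `host1`, which is the expected result in the first rows of the summary.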

@dustinmhorvath

I'm basically still running into the issues from #9377. My experience is that limits surrounding workflows are EXTREMELY flaky and can't be trusted without copious testing.

I can do the following:

  1. Create workflow with prompt-limit=yes, limit is empty
  2. Add job template node with prompt-limit=yes, limit=someGroup
  3. Verify workflow visualizer shows job template node with the selected limit
  4. Execute workflow as depicted

This workflow will run against the ENTIRE inventory. The limits that the UI will show you are a bald-faced lie directly to your face. It simply will execute differently than you've explicitly told it, full stop.

What's more, trying to configure limits via the API is equally dodgy. I can create a workflow using the API with limit: "", and the actual limit will be "". BUT, if I do literally the exact same thing via the UI, if I go into the workflow and hit "edit" -> "save", that limit: "" becomes limit: NULL, which behaves magically, entirely differently.

NONE of this hidden magic is communicated to the user without actually talking to the API directly and comparing the values before and after hitting "edit -> save", because nothing changes at all in the UI; it's just changing magic hidden values behind the scenes and interpreting them in completely opposite ways.
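
A rough sketch of that before/after comparison: the helper below only classifies the `limit` field of a workflow job template as returned by the API. The name `describe_limit` is made up, and it assumes you have saved the JSON body yourself (for example from a GET on `/api/v2/workflow_job_templates/<id>/`, which is my assumption about the endpoint).

```python
import json


def describe_limit(workflow_json: str) -> str:
    """Classify the `limit` field of a workflow job template API response."""
    limit = json.loads(workflow_json)["limit"]
    if limit is None:
        # JSON null: the value reported above after "edit" -> "save" in the UI.
        return "null"
    if limit == "":
        # Empty string: the value reported above when created via the API.
        return "empty string"
    return f"limit set to {limit!r}"


before = '{"name": "my-workflow", "limit": ""}'    # illustrative response bodies
after = '{"name": "my-workflow", "limit": null}'
print(describe_limit(before), "->", describe_limit(after))
```

Running this on the response saved before and after a UI save makes the otherwise invisible change visible.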

I would consider this an extremely huge bug, which has threatened to bring down big chunks of our infrastructure at unwanted times, and has only avoided disaster in our environment so far because I know that AWX is a filthy liar and its UI can't be trusted worth a damn. And so I watch scheduled jobs like a hawk to ensure it's not doing the complete opposite of what its template explicitly claims it will do.

My advice to new users of AWX (who might be finding this thread) would be to avoid using prompted limits in any job templates involved in a workflow template entirely, because the behavior is unreliable and unpredictable. I know that might be annoying to hear, because that's literally the whole point of having workflow templates, is that the inventories and limits of the nodes in the workflow are different.

@dustinmhorvath

Ultimately imo, this whole limit: null business needs to be completely abandoned, because it gives the user two different states, hidden from them, that behave wildly differently, and arbitrarily. The way that limit: "" is interpreted should simply be extended to cover the existing use-cases, with a simple and straightforward description of how it behaves.

The previous issue on this topic !3223 worries about limits behaving differently in AWX than in ansible-core, which is a silly comparison. You should worry about the job template limit being respected as much as you can, since that's analogous to a playbook executing, but a workflow template has little to do with Ansible at all, it's purely an abstraction in AWX, and can be handled in any sane way that covers end-users' use-cases.

@anxstj
Contributor

anxstj commented Feb 10, 2023

I agree that limits in workflow templates are flaky and potentially dangerous.

From my perspective, there are 4 sources for the limit's value (in the order of precedence):

  • job template default (possible values: string or "")
  • job template node prompt (possible values: string or "")
  • workflow template default (possible values: string or null)
  • workflow template prompt (possible values: string or "")

An empty string ("") means that no limit will be applied and a null means that a limit of lower precedence will be applied.

At a workflow template prompt a user can't set the limit back to null which is needed to apply the limit of lower precedence, e.g. the job template default value. Instead, leaving the prompt empty would remove the limit entirely and execute the job on all nodes.

Instead of differentiating between "" and null the user needs a flag (check box) that enables/disables the limit box. And if the prompt for the limit is disabled, then that flag would always be disabled. That way the user wouldn't have the chance to enter a limit that isn't respected at all.
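
The precedence model above can be sketched in a few lines of Python. This is only an illustration of the description in this comment, not AWX code: `resolve_limit` is a hypothetical name, and it assumes the four sources are listed above from lowest to highest precedence, so the function takes them highest first.

```python
from typing import Optional, Sequence


def resolve_limit(sources: Sequence[Optional[str]]) -> str:
    """Resolve the effective limit from sources ordered highest precedence first.

    None (JSON null) means "defer to the next lower source";
    "" means "explicitly no limit", which stops the search.
    """
    for value in sources:
        if value is not None:
            return value
    return ""  # nothing set anywhere: no limit


# Order: workflow prompt, workflow default, node prompt, job template default.
# Prompt left empty: "" wins and the job runs on all hosts.
print(repr(resolve_limit(["", None, None, "group1"])))    # ''
# If the prompt could be null, the job template default would apply.
print(repr(resolve_limit([None, None, None, "group1"])))  # 'group1'
```

The first call shows the problem: an empty prompt cannot express "fall back to the lower limit", only "no limit at all".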

@raptaml

raptaml commented Apr 25, 2024

#9377 (comment)
is way too dangerous IMHO.
Parent workflow should always enforce limit to child jobs if set. No matter if child asks for it.
