Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: adjust incorrectly passed keywords with exclude-strings argument #15721

Merged

Conversation

Sasa993
Copy link
Contributor

@Sasa993 Sasa993 commented Dec 20, 2024

SUMMARY

This PR addresses an issue with how the --exclude-strings argument is passed to ansible-runner during the cleanup task.

Issue

When --exclude-strings is passed with quotes, such as:

  • --exclude-strings="awx_1, awx_2"
  • --exclude-strings="awx_1 awx_2"

The argument is treated as a single string, and ansible-runner receives it as: ["awx_1 awx_2"] or ["awx_1, awx_2"].
This causes the exclusion logic to fail and as a result, directories or files specified in --exclude-strings are not excluded from cleanup.

Impact

Due to this bug, the ansible-runner worker cleanup command removes directories and files that are intended to be excluded by the --exclude-strings argument.
In our case, this bug results in running jobs being killed by the cleanup task triggered by another job on the same execution node.
I suspect this issue may have contributed to the "JSON error" problems that have been frequently reported. (like 14693, 15657, 15122)

Expected Behavior:

When --exclude-strings is passed without quotes and the = sign, like: --exclude-strings awx_1 awx_2, ansible-runner correctly parses it into a list of separate strings: ["awx_1", "awx_2"].
This allows the cleanup process to properly exclude the specified directories or files.

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • CLI
AWX VERSION
24.6.2.dev205+g3d7a4b20f1
ADDITIONAL INFORMATION

@Sasa993 Sasa993 changed the title Fix incorrectly passed keywords with exclude-strings arg to ansible-r… Fix incorrectly passed keywords with exclude-strings arg Dec 20, 2024
@AlanCoding
Copy link
Member

AlanCoding commented Dec 20, 2024

Thanks for figuring this out. Offering some quick confirmation of the basic idea here:

$ mkdir /tmp/awx_1234; mkdir /tmp/awx_4321; mkdir /tmp/awx_2345
$ ansible-runner worker cleanup --file-pattern=/tmp/awx_* --exclude-strings="1234, 2345" --grace-period=0
Removed 3 private data dir(s) in pattern /tmp/awx_*
(changed: True)

Removing all 3 directories is unintended here, because they were passed in the --exclude. Your fix:

$ mkdir /tmp/awx_1234; mkdir /tmp/awx_4321; mkdir /tmp/awx_2345
$ ansible-runner worker cleanup --file-pattern=/tmp/awx_* --exclude-strings 1234 2345 --grace-period=0
Removed 1 private data dir(s) in pattern /tmp/awx_*
(changed: True)

Out of the 3 folders, this commands excludes 2 of them, which is intended. It should still delete the folder not in the --exclude which is the 4321 folder. That works as intended.

My only very minor comment / request-for-change would be from looking at your test output:

AssertionError: assert 'cleanup --re...utable=podman' == 'cleanup --re...utable=podman'
  
  - cleanup --remove-images="quay.invalid/foo/bar:latest quay.invalid/foo/bar:devel" --image-prune --process-isolation-executable=podman
  ?                        ^^                                                      -
  + cleanup --remove-images quay.invalid/foo/bar:latest quay.invalid/foo/bar:devel --image-prune --process-isolation-executable=podman
  ?    

IMO, a better standard pattern would be to retain the quotes, but for each arg. Getting rid of the = also seems necessarily. This won't make any sense until I form the command here:

$ mkdir /tmp/awx_1234; mkdir /tmp/awx_4321; mkdir /tmp/awx_2345
$ ansible-runner worker cleanup --file-pattern "/tmp/awx_*" --exclude-strings "1234" "2345" --grace-period=0
Removed 1 private data dir(s) in pattern /tmp/awx_*
(changed: True)

This still gives us the desired behavior of deleting the 4321 directory. But for all arguments, we ditch the = and add ", the important nuance is that for multil-args, we put " around each positional entry. And actually, the * won't parse correctly if you don't have quotes, so I think keeping quotes is still best as a universal policy.

Consequence of bug

The cleanup job needs to use the --exclude to prevent the private_data_dir of actively running jobs from being deleted while that job is running. With >1 job running on a given node, this would seem to break the exclusion, while still allowing the cleanup to run, meaning a private_data_dir for a running job may (incorrectly) get deleted, obviously disrupting the job.

As you mention "JSON error", I want to talk this out that we're talking about errors in jobs that just happen to be running at the same time the cleanup command runs, because of disruptions from their private_data_dir being deleted. That is generally what I would expect from this.

@AlanCoding
Copy link
Member

I am looking to contribute test content for this fix in #15722

I did test it with your patch and it fixes it 🎉

You're fine to fix up your very small diff here and get it merged, and I can merge the tests after a rebase.

Copy link
Member

@AlanCoding AlanCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes needed to merge this:

  1. Modify code to quote items in option individually
  2. Update test_image_cleanup_scenario test to remove = from the expectation
  3. From the PR template, edit your PR title to identify it as a bug fix (I can do this)

@Sasa993 Sasa993 changed the title Fix incorrectly passed keywords with exclude-strings arg Bugfix: adjust incorrectly passed keywords with exclude-strings argument Dec 21, 2024
@Sasa993
Copy link
Contributor Author

Sasa993 commented Dec 21, 2024

Changes needed to merge this:

1. Modify code to quote items in `option` individually

2. Update `test_image_cleanup_scenario` test to remove `=` from the expectation

3. From the PR template, edit your PR title to identify it as a bug fix (I can do this)

Thank you for your quick review.

You're right, we need to keep those quotes - sorry for overlooking that.
I have made the changes you requested, please review again.

@Sasa993 Sasa993 requested a review from AlanCoding December 21, 2024 09:15
@Sasa993 Sasa993 force-pushed the adjust_args_in_exclude_strings_for_cleanup branch from acdea01 to aca056c Compare December 21, 2024 09:15
@thedoubl3j
Copy link
Member

started CI, waiting on @AlanCoding to clear his change requested. might be a bit for this one to land @Sasa993 since most of us are going into the US holiday season/shutdown but great stuff.

Copy link
Member

@AlanCoding AlanCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge order comments:

@AlanCoding AlanCoding merged commit 7835e39 into ansible:devel Jan 2, 2025
26 checks passed
@AlanCoding
Copy link
Member

I now believe this bug may have been introduced in #15037 @chrismeyersfsu @TheRealHaoLiu I've already marked you as reviewers, so just have a look at quote use there and here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants