Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use inotify to wait for ansible-runner pid file creation #20666

Merged
merged 5 commits into from
Oct 7, 2020

Conversation

agrare
Copy link
Member

@agrare agrare commented Oct 7, 2020

When using ansible-runner start a daemon is forked off and runs the playbook in the background allowing the caller to continue one. It then provides a method to check if the playbook is still running via the ansible-runner is-alive call.

This call checks for the existence of the private_data_dir/pid file and sends a kill(2) syscall to check that the process is alive. Once the daemon exits the pid file is deleted and ansible-runner is-alive returns 1

If ansible-runner is-alive is called before the daemon is forked then the pid file hasn't been created yet and is-alive returns that the process isn't running which, while technically true, doesn't help a caller who wants to wait until the daemon has started and exited.

Since ansible-runner is-alive checks for the pid file it isn't safe to issue the is-alive check until after this file has been created if the purpose of calling is-alive is to check if the daemon has exited.

This can be done very efficiently by using inotify/fs_notify to catch filesystem events using watches. We setup a filesystem watch on the /tmp/ansible-runner tempdir looking for the pid file and as soon as it is created break out and return to the caller.

This makes all subsequent ansible-runner is-alive calls safe since it is impossible for the call to be issued before the daemon has started running.

Cross Repo Tests: ManageIQ/manageiq-cross_repo-tests#187

@agrare agrare added the wip label Oct 7, 2020
@agrare agrare force-pushed the ansible_runner_inotify_pid_file branch from 57743dd to dc72b0f Compare October 7, 2020 16:04
@agrare agrare force-pushed the ansible_runner_inotify_pid_file branch 2 times, most recently from 4ec7161 to 89af2a2 Compare October 7, 2020 16:09
@agrare agrare force-pushed the ansible_runner_inotify_pid_file branch from 89af2a2 to 90fdb09 Compare October 7, 2020 17:29
@agrare agrare force-pushed the ansible_runner_inotify_pid_file branch from 90fdb09 to 60d2f2a Compare October 7, 2020 17:31
@Fryguy
Copy link
Member

Fryguy commented Oct 7, 2020

I'd like to verify this works on a Mac (or at least what the behavior of inotify is)

@Fryguy
Copy link
Member

Fryguy commented Oct 7, 2020

On a Mac:

Traceback (most recent call last):
	3: from foo.rb:29:in `<main>'
	2: from foo.rb:2:in `wait_for'
	1: from /Users/jfrey/.rubies/ruby-2.6.6/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require'
/Users/jfrey/.rubies/ruby-2.6.6/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- rb-inotify (LoadError)
	11: from foo.rb:29:in `<main>'
	10: from foo.rb:2:in `wait_for'
	 9: from /Users/jfrey/.rubies/ruby-2.6.6/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:34:in `require'
	 8: from /Users/jfrey/.rubies/ruby-2.6.6/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:130:in `rescue in require'
	 7: from /Users/jfrey/.rubies/ruby-2.6.6/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:130:in `require'
	 6: from /Users/jfrey/.gem/ruby/2.6.6/gems/rb-inotify-0.10.1/lib/rb-inotify.rb:2:in `<top (required)>'
	 5: from /Users/jfrey/.rubies/ruby-2.6.6/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:65:in `require'
	 4: from /Users/jfrey/.rubies/ruby-2.6.6/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb:65:in `require'
	 3: from /Users/jfrey/.gem/ruby/2.6.6/gems/rb-inotify-0.10.1/lib/rb-inotify/native.rb:3:in `<top (required)>'
	 2: from /Users/jfrey/.gem/ruby/2.6.6/gems/rb-inotify-0.10.1/lib/rb-inotify/native.rb:9:in `<module:INotify>'
	 1: from /Users/jfrey/.gem/ruby/2.6.6/gems/rb-inotify-0.10.1/lib/rb-inotify/native.rb:28:in `<module:Native>'
/Users/jfrey/.gem/ruby/2.6.6/gems/ffi-1.13.1/lib/ffi/library.rb:273:in `attach_function': Function 'inotify_init' not found in [libc.dylib] (FFI::NotFoundError)

@Fryguy
Copy link
Member

Fryguy commented Oct 7, 2020

The listen gem bridges rb-inotify (linux) and rb-fsevent (mac) (and technically windows and others as well)...we may want to just use the listen gem.

@NickLaMuro
Copy link
Member

@Fryguy well, my installation of ansible-runner is borked on my machine, but I can at least confirm it doesn't raise an error when trying to use the code from #20667

Not sure if you wanted to test it yourself.

@NickLaMuro
Copy link
Member

This doesn't look good:

https://travis-ci.com/github/ManageIQ/manageiq/jobs/396503769#L720-L723

Both previous builds failed with this error.

@agrare agrare force-pushed the ansible_runner_inotify_pid_file branch from 9270cdd to 456eb20 Compare October 7, 2020 18:29
lib/ansible/runner.rb Outdated Show resolved Hide resolved
@Fryguy
Copy link
Member

Fryguy commented Oct 7, 2020

@agrare Do we do any other ansible runner stuff in other repos? If so, can we cross-repo-test those?

@miq-bot
Copy link
Member

miq-bot commented Oct 7, 2020

Checked commits agrare/manageiq@60d2f2a~...3aa562a with ruby 2.6.3, rubocop 0.69.0, haml-lint 0.28.0, and yamllint
3 files checked, 0 offenses detected
Everything looks fine. ⭐

@agrare agrare changed the title [WIP] Use INotify to wait for ansible-runner pid file Use INotify to wait for ansible-runner pid file Oct 7, 2020
@agrare
Copy link
Member Author

agrare commented Oct 7, 2020

Works on Linux and Mac OSX and is passing specs, taking out of WIP and running cross-repo tests

@miq-bot miq-bot removed the wip label Oct 7, 2020
@agrare agrare changed the title Use INotify to wait for ansible-runner pid file Use inotify to wait for ansible-runner pid file Oct 7, 2020
@agrare agrare changed the title Use inotify to wait for ansible-runner pid file Use inotify to wait for ansible-runner pid file creation Oct 7, 2020
@agrare
Copy link
Member Author

agrare commented Oct 7, 2020

This doesn't look good:

https://travis-ci.com/github/ManageIQ/manageiq/jobs/396503769#L720-L723

Both previous builds failed with this error.

Yeah we have specs with do expect(AwesomeSpawn).to receive(etc... so had to stub out our filesystem wait since ansible-runner isn't actually being run

@agrare
Copy link
Member Author

agrare commented Oct 7, 2020

Okay cross-repo tests are green

@Fryguy Fryguy merged commit 83cf8b2 into ManageIQ:master Oct 7, 2020
@agrare agrare deleted the ansible_runner_inotify_pid_file branch October 7, 2020 19:36
@Fryguy
Copy link
Member

Fryguy commented Oct 7, 2020

We my want to prefer #20667 for backport purposes, but i'm ok with introducing a new gem on backport. @agrare @NickLaMuro Thoughts?

@NickLaMuro
Copy link
Member

My 2cents would be for using #20667 for the backport since there is no gem, and I think all of our tests were done without testing this in containers. I was able to test #20667 directly on the manageiq/latest-jansa image, and nothing that was introduced caused issues there, or in the specs. Zero threads required, just slightly less elegant and clunky, but arguably less places to introduce instabilities.

While I do agree this is probably the #Right™ solution long term, I think there is a lot added to the product to justify it being a backport.

@agrare
Copy link
Member Author

agrare commented Oct 7, 2020

Yeah I think #20667 is probably safer for backport, maybe updated to check the pid file and raise an exception on timeout like this is doing?

@Fryguy
Copy link
Member

Fryguy commented Oct 7, 2020

@NickLaMuro You'll probably have to make it a direct-to-jansa PR

@NickLaMuro
Copy link
Member

maybe updated to check the pid file...

Commented on this part specifically as to why I don't want to do this: #20667 (comment)

Adds another race condition possibility, and the worst case with this is we wait the full 10 seconds if the artifacts dir is never created.

You'll probably have to make it a direct-to-jansa PR

yup, I will reopen as a new PR.

@simaishi
Copy link
Contributor

simaishi commented Oct 8, 2020

Backported to jansa via #20670

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants