[WIP] [Ansible::Runner] wait on artifacts/ to exist #20667

NickLaMuro · 2020-10-07T16:38:19Z

Wait for the artifacts/ directory to be created before returning a response object.

This ensures that calling .running? returns false only once ansible-runner has actually ended, and not before it has even had a chance to start.

Thanks to Jason for finding this bug.

Note: This should only really be a problem in a docker/podified environment, where launching the process and waiting for a response happens within the same background job. This isn't something that can happen on an Appliance, since it will have an inherent delay when monitoring the result via a new background job.

TODO

Maybe raise an error if we get through the entire loop...
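One way the TODO could look, as a minimal sketch rather than the PR's actual code: a polling helper that returns as soon as the directory exists and raises once all attempts are used up. The `wait_on(dir, attempts:, interval:)` signature and the `ArtifactsTimeoutError` class are hypothetical; the 100 attempts of 0.1 seconds mirror the loop discussed in review.

```ruby
require "tmpdir"

# Hypothetical error class: the TODO only says "maybe raise an error".
class ArtifactsTimeoutError < StandardError; end

# Poll until `dir` exists, checking every `interval` seconds, at most `attempts` times.
# Returns `dir` as soon as it exists; raises ArtifactsTimeoutError if it never appears.
def wait_on(dir, attempts: 100, interval: 0.1)
  attempts.times do
    return dir if Dir.exist?(dir)

    sleep(interval)
  end
  raise ArtifactsTimeoutError, "#{dir} was not created after #{attempts * interval} seconds"
end

# Usage: simulate ansible-runner creating artifacts/ a little later.
Dir.mktmpdir do |base_dir|
  artifacts = File.join(base_dir, "artifacts")
  writer = Thread.new { sleep(0.3); Dir.mkdir(artifacts) }
  wait_on(artifacts) # returns once the background thread has created the directory
  writer.join
end
```

Raising, instead of silently falling through, means a runner that never starts surfaces as an error rather than as a response whose .running? is false from the start.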

Links

An arguably worse alternative to "Use inotify to wait for ansible-runner pid file creation" #20666

Steps for Testing/QA

I used the following test script in Docker to reproduce this:

```
$ docker run --rm -it manageiq/manageiq:latest-jansa /bin/bash
[root@5f4c439ca069 vmdb]# cd /var/www/miq/vmdb
[root@5f4c439ca069 vmdb]# curl -O https://raw.githubusercontent.com/ansible/test-playbooks/master/sleep.yml
[root@5f4c439ca069 vmdb]# cat script.rb
require 'pathname'

class Rails
  def self.root
    Pathname.new("/var/www/miq/vmdb")
  end
end

class Vmdb
  module Logging
  end
end

require 'awesome_spawn'
require 'ansible/runner'
require 'ansible/content'
require 'ansible/runner/response'
require 'ansible/runner/response_async'
require 'tmpdir'
require 'active_support/all'

response = Ansible::Runner.run_async({}, {}, "/var/www/miq/vmdb/hello_world.yml")
puts response.base_dir
puts response.running?

200.times do
  puts response.running?
end
[root@5f4c439ca069 vmdb]# ruby -I lib script.rb
/tmp/ansible-runner20201007-120-2d8yu1
false
false
true
true
...
true
true
false
false
false
false
```

A good run returns true values first, then only false values.
A bad run (where it isn't working correctly) returns some false values, followed by some true values, and then false values.

The idea is that true means the process is running. If we get false before receiving any true values, we cannot rely on .running? as an indicator of whether the ansible-runner process was ever running.

The example run above shows it not working properly; a correct run would not include the first two false values.
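The good/bad distinction above can be checked mechanically. This is a hypothetical helper, not part of Ansible::Runner: it collapses repeated values and accepts only "true, then (optionally) false". It assumes that a run which never reports true (for example, a playbook that finishes before the first check) should also count as bad.

```ruby
# Hypothetical helper: classify the sequence of .running? values printed by the test script.
#   good run: true, true, ..., false, false   (starts running, then stops)
#   bad run:  false, ..., true, ..., false    (reported "not running" before it started)
def good_run?(observations)
  # Collapse consecutive duplicates, e.g. [true, true, false] -> [true, false]
  phases = observations.chunk_while { |a, b| a == b }.map(&:first)
  # A good run must begin with true and may switch to false at most once.
  phases == [true] || phases == [true, false]
end

good_run?([true, true, false, false])        # => true
good_run?([false, false, true, true, false]) # => false
```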

NickLaMuro · 2020-10-07T16:41:32Z

The only advantage to this PR I see is that it will be easier to backport, since there isn't a gem dependency. However, I think that #20666 probably makes sense for the long term.

Fryguy · 2020-10-07T17:29:14Z

Do we want to wait on the artifacts dir or the pid file? I think the latter is more accurate, though it's hard to tell.

NickLaMuro · 2020-10-07T17:44:30Z

@Fryguy maybe more accurate, but it also seems more ephemeral: it could appear and disappear within a single sleep(0.1) interval, so we might miss it.

I figured it would be better to catch something that remains after the run, instead of something that could go away because of an error.
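The concern about missing a short-lived file can be illustrated with a simulated poller (no real filesystem involved). This is a sketch under stated assumptions: times are integer milliseconds, the poller samples every 100 ms (matching sleep(0.1)), and the appear/disappear timings are made up purely for illustration; nothing here claims the pid file actually disappears that quickly.

```ruby
# Simulated polling: a poller checks at t = 0, interval, 2*interval, ...
# A path that exists only during [appears_at, disappears_at) is seen only if
# some sample time falls inside that window. Times are integer milliseconds.
def seen_by_poller?(appears_at:, disappears_at:, interval: 100, attempts: 100)
  (0...attempts).any? do |k|
    t = k * interval
    t >= appears_at && t < disappears_at
  end
end

# Hypothetical timings, for illustration only:
ephemeral  = seen_by_poller?(appears_at: 120, disappears_at: 180)             # gone before t=200
persistent = seen_by_poller?(appears_at: 120, disappears_at: Float::INFINITY) # remains after the run
```

Here `ephemeral` is false because no sample lands between 120 ms and 180 ms, while `persistent` is true because the 200 ms sample finds it. A marker that stays on disk after the run cannot slip between samples.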


miq-bot · 2020-10-07T18:30:28Z

Checked commit NickLaMuro@764d71d with ruby 2.6.3, rubocop 0.69.0, haml-lint 0.28.0, and yamllint
1 file checked, 0 offenses detected
Everything looks fine. 👍

Fryguy · 2020-10-07T19:38:05Z

We may want to use this one for backport...let's discuss in #20666

Fryguy · 2020-10-07T19:47:02Z

lib/ansible/runner.rb

```diff
+
+      def wait_on(dir)
+        100.times do
+          Dir.exist?(dir)
```


I think you wanted break if Dir.exist? ?

you are correct... how did I miss that...

Anyway, thanks, will change in the new PR.

Edit: This is what I get for having to re-write something I tested in-place as a POC... without the raise though, it ✌️ technically ✌️ had the same effect...
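The effect of the missing `break` can be seen by counting polls. In this sketch the sleep is omitted so it runs instantly, and both method names are hypothetical: without `break`, the result of `Dir.exist?` is discarded and the loop always runs every iteration. With the sleep in place, that means always waiting the full duration, which is why it "technically" still gave the directory time to appear.

```ruby
require "tmpdir"

# Without break: Dir.exist? is evaluated but its result is ignored,
# so every iteration runs even when the directory already exists.
def polls_without_break(dir, attempts: 100)
  count = 0
  attempts.times do
    count += 1
    Dir.exist?(dir) # result ignored: this is the bug
  end
  count
end

# With break: stop as soon as the directory shows up.
def polls_with_break(dir, attempts: 100)
  count = 0
  attempts.times do
    count += 1
    break if Dir.exist?(dir)
  end
  count
end

Dir.mktmpdir do |dir|
  polls_without_break(dir) # => 100
  polls_with_break(dir)    # => 1
end
```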

agrare · 2020-10-07T19:53:23Z

> @Fryguy maybe more accurate, but seems like also more ephemeral, so it could exist then not within the time period of sleep(0.1), so we might miss it.

👍 As long as artifacts/ is guaranteed to be created after pid

NickLaMuro · 2020-10-07T19:56:36Z

> 👍 As long as artifacts/ is guaranteed to be created after pid

/me pulls up runner code base to look for proof...

agrare · 2020-10-07T20:21:15Z

The listen gem reports them in the same set of updates:

```
/tmp/ansible-runner20201007-63398-188ezai/daemon.log
/tmp/ansible-runner20201007-63398-188ezai/pid, /tmp/ansible-runner20201007-63398-188ezai/artifacts/result/command
```

But I think inotify reported pid first; let me double-check.

agrare · 2020-10-07T20:27:54Z

rb-inotify gives a better view of the events in order:

```
[:create]: daemon.log
[:create]: pid
[:modify]: pid
[:isdir, :create]: artifacts
```

so 👍 from me
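The ordering argument can be checked against the rb-inotify output quoted above. The event list below is copied from that comment as (flags, name) pairs; the `created_before?` helper is hypothetical and only looks at events whose flags include `:create`.

```ruby
# Event order as reported by rb-inotify in the comment above: [flags, name].
EVENTS = [
  [[:create], "daemon.log"],
  [[:create], "pid"],
  [[:modify], "pid"],
  [[:isdir, :create], "artifacts"]
].freeze

# Hypothetical helper: true if `first` is created before `second` in the event stream.
def created_before?(events, first, second)
  creations = events.select { |flags, _name| flags.include?(:create) }.map(&:last)
  a = creations.index(first)
  b = creations.index(second)
  !a.nil? && !b.nil? && a < b
end

created_before?(EVENTS, "pid", "artifacts") # => true
```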

NickLaMuro · 2020-10-07T20:37:16Z

@agrare okay...

tl;dr: Yes, it is created after...

Long answer...

So, to start, the pidfile is created here:

https://github.com/ansible/ansible-runner/blob/1194519d275bf99ad296b07e2bc8ce33e37a7c34/ansible_runner/__main__.py#L882

(The pidfile is created the same way in 1.4.6, even though the link above points at devel.)

That library is python-daemon, which in turn uses the lockfile library. I could link to a lot of library code, but to save some brain cells: the TimeoutPIDLockFile used here comes from lockfile, and python-daemon uses it to create the lock file when the daemon context is entered.

On the ansible-runner side, the definition of the artifacts_dir variable is initialized here:

https://github.com/ansible/ansible-runner/blob/stable/1.4.x/ansible_runner/__main__.py#L285-L289

The code that creates the artifacts/ directory is called at the top level of the run() method in ansible_runner/runner.py:

https://github.com/ansible/ansible-runner/blob/stable/1.4.x/ansible_runner/runner.py#L89-L99

When the runner's daemon context is eventually entered here:

https://github.com/ansible/ansible-runner/blob/stable/1.4.x/ansible_runner/__main__.py#L559

it will first create the pidfile via the python-daemon library and lock it, and only after that call the code that actually creates the artifacts directory as part of ansible-runner's .run().
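A small Ruby analogue of the ordering described above may make the guarantee easier to see. This is not ansible-runner or python-daemon code: `start_like_runner` is a hypothetical method, `File#flock` stands in for lockfile's TimeoutPIDLockFile, and the pid file removal at the end is an assumed stand-in for the lock being released when the daemon context exits. The point is only the order: pid file written and locked first, artifacts/ created inside that context, and artifacts/ still present afterwards.

```ruby
require "tmpdir"

# Hypothetical analogue of: enter daemon context (create + lock pid file), then run() creates artifacts/.
def start_like_runner(base_dir)
  pid_path = File.join(base_dir, "pid")
  File.open(pid_path, File::RDWR | File::CREAT, 0o644) do |pidfile|
    pidfile.flock(File::LOCK_EX) # stand-in for TimeoutPIDLockFile acquiring the lock
    pidfile.write(Process.pid.to_s)
    pidfile.flush
    # The "daemon context" is now entered; run() creates artifacts/ only after this point.
    Dir.mkdir(File.join(base_dir, "artifacts"))
  end
  # Assumed cleanup on context exit: the pid file goes away, artifacts/ remains.
  File.delete(pid_path)
end

Dir.mktmpdir do |base_dir|
  start_like_runner(base_dir)
  Dir.exist?(File.join(base_dir, "artifacts")) # => true
  File.exist?(File.join(base_dir, "pid"))      # => false
end
```

Under this ordering, any observer that sees artifacts/ can infer the pid file was already written, which is what makes waiting on artifacts/ a safe substitute for waiting on pid.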

NickLaMuro requested review from Fryguy and gtanzillo as code owners October 7, 2020 16:38

miq-bot added the wip label Oct 7, 2020

NickLaMuro mentioned this pull request Oct 7, 2020

Use inotify to wait for ansible-runner pid file creation #20666

Merged

NickLaMuro force-pushed the ansible-runner-wait-on branch from 1cb36ed to 764d71d Compare October 7, 2020 18:30

NickLaMuro closed this Oct 7, 2020

Fryguy reviewed Oct 7, 2020


NickLaMuro mentioned this pull request Oct 8, 2020

[JANSA][Ansible::Runner] wait on artifacts/ to exist #20670

Merged
