-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit periodic keepalive events from Worker
#1191
Conversation
* handy for some testing scenarios
* it already shouldn't be possible to splat a keepalive after the eof event, but double-safe is good
* could matter for some test scenarios
* clean up kwargs handling so we don't transmit a worker-only arg to an older runner that might choke on it
* keeps main `runner` happy, but we still don't actually send it
I put up nitzmahone#1 to suggest a test. You can take it or leave it regarding the patch, but this definitely needs an integration test like that. |
New test for keepalive setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't try running it - but the code makes sense. LGTM pending approval from someone that actually tested it.
* new CLI arg and envvar for `Worker` mode to optionally emit regular keepalive events; fixes issues with container runtimes that assume long-silent stdout == hung process --------- Co-authored-by: Alan Rominger <[email protected]> (cherry picked from commit fd9d67a)
* new CLI arg and envvar for `Worker` mode to optionally emit regular keepalive events; fixes issues with container runtimes that assume long-silent stdout == hung process --------- Co-authored-by: Alan Rominger <[email protected]> (cherry picked from commit fd9d67a)
I can install https://github.com/ansible/ansible-runner/releases/tag/2.3.2 with Ubuntu 18.04, which is Python 3.6, but with following runtime error message:
Seems like changes from https://github.com/ansible/ansible-runner/pull/1191/files#diff-81c3df3f520280c8548b6499f287446e289a81ecad30288e0c561d3aa95456c5R1 require Python >= 3.7, am I correct? |
fixes #1187
This approach is less "fancy" than some others, but also doesn't require yet another interposing output handle.
Open issues:
short-circuit handling: Ideally there'd be a short-circuit in the
Process
layer for this, or any other "malformed"/bogus event, but as-is,Process
will bomb out with a KeyError on any dictionary missinguuid
orcounter
. Ideally, these keepalives could be as simple as a newline, dot, or{}
, but unless/until we can be sure that an olderProcess
isn't going to barf on them, we have to have a larger payload to keep it happy.other consumers: In cases where the event actually gets persisted by
Process
, will anything else be upset by the empty0-0.json
event it writes as? (EDIT: consensus was to go for a "fail-fast" approach instead)unit tests: there are a number of potential output corruption races this attempts to prevent, but simulating them with unit tests to ensure they stay "prevented" would probably be a Good Thing. (EDIT: added crazy way-subsecond keepalive tests to try and flush out problems and races here)
tests: Don't want to gold-plate this until we're sure we like this approach. Also, the transmit/worker/process tests need some refactoring to be able to introduce delays so the keepalives have a chance to fire, and also to ensure that they don't fire when they're not supposed to.
This specifically is not trying to solve the generic "I want to do periodic random stuff in runner" issue- it seems like that would be better handled in a future asyncio layer where such things are trivial to do without extra threading overhead.