-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a state machine for long ansible operations #17759
Add a state machine for long ansible operations #17759
Conversation
4e51332
to
eb82037
Compare
if pid.nil? | ||
queue_signal(:error) | ||
else | ||
context[:ansible_runner_pid] = pid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup I'm aware 😄 thanks though @mkanoor
fb094ac
to
c36784e
Compare
@gtanzillo @Ladas this is ready to review, I just need to update it to match @Ladas 's changes for running async playbooks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good... Still reviewing
9f331cc
to
9710fcd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👍
@agrare |
@agrare Using MiqTask also allows these commands to be called remotely using the REST API. There is already a whole interface built around the MiqTask. |
@mkanoor A job automatically create an miq_task, https://github.com/ManageIQ/manageiq/blob/master/app/models/job.rb#L32
I don't see anything extra in that state machine around tasks besides setting the started_on time for the task, what is missing that would prevent it from being monitored for completion?
We aren't intending for this to be called directly from the REST API, rather by provider operations methods
I'm confused, are you suggesting that we use MiqTask instead of Job? |
@agrare So there is a Job as well as a MiqTask in the mix, the Job is the internal implementation and the MiqTask is the external view of it. @bzwei can you comment where you create the MiqTask, I see you update the attributes here manageiq/app/models/manageiq/providers/embedded_ansible/automation_manager/playbook_runner.rb Line 22 in 1893e95
If we can reuse some of the code between the 2 state machines it would help |
@mkanoor there is a task created for all jobs so I'm wondering what about that isn't enough?
The task is created here in the base Job class: https://github.com/ManageIQ/manageiq/blob/master/app/models/job.rb#L32 |
@agrare That sounds good can you update some of the properties on the message so we can monitor it, we might be getting something back from the runner, that might make sense for the user. I think we had to update the started_time and updated time. |
Most properties are already updated whenever the job is updated https://github.com/ManageIQ/manageiq/blob/master/app/models/job.rb#L44-L46
|
@agrare Is there a piece of code around the Job that Automate would be calling, is there a class similar to |
@mkanoor that's not really the purpose of this PR, this is intended to be used by providers in their ops methods. If we want to expose this to automate I'm okay with doing a follow-up PR but I think that is out of scope of this. |
This adds a Job state machine for long running async ansible operations.
367544f
to
5c306ae
Compare
@agrare @mkanoor Once this is ready we maybe able to share it to be used by automate to take advantage of Ansible native support for runner instead of forcing to create a job template. @agrare Does the class name have to be |
Nope it can be named anything we like :) I was keeping consistent with I'd rather not call it |
Taking out of WIP now that @Ladas's async changes are merged |
|
if started_on + options[:timeout] < Time.now.utc | ||
response.stop | ||
|
||
queue_signal(:abort, "Playbook has been running longer than timeout", "error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could log a warning here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is already logged by process_abort
and looks like this:
ERROR -- : MIQ(ManageIQ::Providers::AnsibleOperationWorkflow#process_abort) job aborting, Playbook has been running longer than timeout
context[:ansible_runner_return_code] = result.return_code | ||
context[:ansible_runner_stdout] = result.parsed_stdout | ||
|
||
set_status("Playbook failed", "error") if result.return_code != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also log it? Or the thing is, the response object is the only one holding the result (we are deleting the data from the filesystem). So we need to show the failure somewhere.
I would probably start with log error of the whole response and we can see if I can extract, response.error_message and response.traceback/full_error in a generic way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right this wouldn't log anything, I'll log the full parsed_stdout if it fails for now. What I'd like to do is set the status to a useful error message if we can get at that in a meaningful way but this might have to be a post_playbook operation that has to be done specifically by the provider author not generically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👍
Lets take the logs/output to another PR, we'll need to make sure we can debug what went wrong from log, up to the nice notification in the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome, thanks @agrare!
311a0ce
to
37acbdd
Compare
Checked commits agrare/manageiq@f70974f~...37acbdd with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0 |
This adds a Job-based state machine for long-running asynchronous ansible operations.
This state machine can be used directly by providers just by passing in the same arguments you would pass to
Ansible::Runner.run
but instead toManageIQ::Providers::AnsibleOperationWorkflow.create_job
. You can also control the timeout and the poll interval as options to.create_job
.If there are any pre-playbook setup steps or post-playbook cleanup steps there are hooks in the state machine that can be implemented by deriving from this class and implementing them there.