Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queue ServiceAnsiblePlaybook#execute for the embedded_ansible role #19055

Merged
merged 2 commits into from
Jul 26, 2019

Conversation

carbonin
Copy link
Member

Before this change the actual playbook run could happen on any
appliance with the automate role. We need the playbook run to happen
specifically on the embedded_ansible appliance so that we can ensure
the repo update/clone process will run successfully.

cc @gmcculloug @tinaafitz I'm not sure if this can be done in any better way. I wasn't able to find a higher level place to change how this was getting queued. It seems like the actual call stack for execute ends somewhere in automate engine and drb so I feel like that would be harder to tackle than this.

But if someone knows that there's somewhere else I can make this change, then I'm open for suggestions.

Copy link
Member

@NickLaMuro NickLaMuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions/concerns, but my guess is they just need clarification for my stupid self and this probably is fine as is.

:role => "embedded_ansible"
}

task_id = MiqTask.generic_action_with_callback(task_opts, queue_opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is worth holding up the merge of this PR, but this looks very similar to what is available through EmbeddedAnsible::CrudCommon:

def queue(zone, instance_id, method_name, args, action, auth_user)
task_opts = {
:action => action,
:userid => auth_user || "system"
}
queue_opts = {
:args => args,
:priority => MiqQueue::HIGH_PRIORITY,
:class_name => name,
:method_name => method_name,
:role => "embedded_ansible",
:zone => zone
}
queue_opts[:instance_id] = instance_id if instance_id
MiqTask.generic_action_with_callback(task_opts, queue_opts)
end

So it would nice if there was a clean way to share this code without having to dig into the provider code to do it. But again, I think this is fine as a surgical fix and I think trying to DRY this up can be done at a later date.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I guess my question for this new implementation is what is now happening with this line (which exists in _execute now:

new_job = ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.create_job(jt, decrypt_options(opts))

Is that just going to cause the same issue that we are seeing now, just with another queue item in between it?

(Note: You can just expand the code in the diff below to see what I talking about, just shared a permalink since I can't comment on code that hasn't changed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I don't think it's really worth DRYing this particular bit up. I think this is kinda how the task/queue stuff is meant to be used. Plus I think there's a fair chance that the args here could change separately from the provider CRUD stuff.

I was thinking that there could be a use case for creating an MiqTask helper to do the wait_for_taskid and raise if the status is no-good, but we're not doing that in crud_common so ... shrug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that just going to cause the same issue that we are seeing now, just with another queue item in between it?

No, at this point in the thread of execution we will queue new items for states in the job, but those are all queued for the same server here:

def queue_signal(*args, deliver_on: nil)
role = options[:role] || "ems_operations"
priority = options[:priority] || MiqQueue::NORMAL_PRIORITY
MiqQueue.put(
:class_name => self.class.name,
:method_name => "signal",
:instance_id => id,
:priority => priority,
:role => role,
:zone => zone,
:task_id => guid,
:args => args,
:deliver_on => deliver_on,
:server_guid => MiqServer.my_server.guid,
)
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I see now. It is a bit farther down then the preview showed:

:server_guid => MiqServer.my_server.guid,

I was seeing the *args and was trying to find where it was being passed in from 😩

Okay, cool, looks good to me.

@NickLaMuro NickLaMuro self-requested a review July 24, 2019 20:47
Copy link
Member

@NickLaMuro NickLaMuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fat fingered the re-review request... not really knowing what it did.

Still good. 🐑 🇮🇹

}

task_id = MiqTask.generic_action_with_callback(task_opts, queue_opts)
task = MiqTask.wait_for_taskid(task_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should wait for task here, because that is effectively a blocking call (sleep loop, if I recall). I expected a new state that uses queue based waiting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation didn't have a queue call at all so the blocking implementation is the same as it was previously.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put another way, there's a check_complete method, and we should probably check for the task completion there, else bail for the next check

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage the job itself was queued wasnt it? That was my intention at least, and then the "externally running" job was supposed to be checked in check_complete

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage the job itself was queued wasnt it?

That's what I was trying to figure out, but got lost in automate. Here's puts caller.join("\n") at the start of the old #execute method:

/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `public_send'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:333:in `block in object_send'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:352:in `ar_method'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:362:in `ar_method'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:331:in `object_send'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (3 levels) in expose'
/home/ncarboni/Source/manageiq/app/models/user.rb:283:in `with_user'
/home/ncarboni/.gem/ruby/2.5.1/bundler/gems/manageiq-automation_engine-09c8bbdf6762/lib/miq_automation_engine/engine/miq_ae_method_service/miq_ae_service_model_base.rb:182:in `block (2 levels) in expose'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1630:in `perform_without_block'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1590:in `perform'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1674:in `block (2 levels) in main_loop'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1670:in `loop'
/home/ncarboni/.rubies/ruby-2.5.1/lib/ruby/2.5.0/drb/drb.rb:1670:in `block in main_loop'

So up to this point the service was queued for execution, but for the automate role. I now need the job itself to run on the appliance with the embedded_ansible role.

ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job.create_job calls run on the passed job template which in turn creates the workflow and calls signal the first time, but the base signal implementation just calls the method in the same process and AnsibleRunnerWorkflow#queue_signal is set up to run the method on the same server here.

So to make a long story short, I'm pretty sure this is what we need to do 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my expectation of what create_job did was wrong, but I still contend that this should be async. In good with merge for now though and talking about that for follow-up since it's technically the same before/after as you've said

@@ -14,6 +14,25 @@ def preprocess(action, add_options = {})
end

def execute(action)
task_opts = {
:action => "Executing Ansible Playbook",
:userid => "system"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be the current user who ordered the service or the service owner?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? I wasn't sure if this was relevant as the user didn't seem to be used, but I can change it. How to I get the userid for the user who ordered the service (I would assume that's the better option of the two)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should actually stay as system, because it's just an internal placement mechanism, not something we need user context for. I think the "Executing Ansible Playbook" action name is what makes that confusing and we should change that to "Launching Ansible Job" or something.

carbonin added 2 commits July 26, 2019 10:53
Before this change the actual playbook run could happen on any
appliance with the automate role. We need the playbook run to happen
specifically on the embedded_ansible appliance so that we can ensure
the repo update/clone process will run successfully.
@carbonin carbonin force-pushed the queue_service_playbook_runs branch from b08b2f9 to 640a8a5 Compare July 26, 2019 14:54
@miq-bot
Copy link
Member

miq-bot commented Jul 26, 2019

Checked commits carbonin/manageiq@11d517d~...640a8a5 with ruby 2.4.6, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 0 offenses detected
Everything looks fine. 🍪

@carbonin
Copy link
Member Author

Okay, @Fryguy I think I made the changes you wanted for this first pass. I'll work on a followup that makes this async by saving off the task id into the service options and using the check_completed method.

@Fryguy Fryguy merged commit d855e23 into ManageIQ:master Jul 26, 2019
@Fryguy Fryguy added this to the Sprint 117 Ending Aug 5, 2019 milestone Jul 26, 2019
simaishi pushed a commit that referenced this pull request Jul 26, 2019
Queue ServiceAnsiblePlaybook#execute for the embedded_ansible role

(cherry picked from commit d855e23)
@simaishi
Copy link
Contributor

Ivanchuk backport details:

$ git log -1
commit 1ffdecca73ac3e6f13f71644e41b23cc4b8ac017
Author: Jason Frey <[email protected]>
Date:   Fri Jul 26 17:32:07 2019 -0400

    Merge pull request #19055 from carbonin/queue_service_playbook_runs
    
    Queue ServiceAnsiblePlaybook#execute for the embedded_ansible role
    
    (cherry picked from commit d855e23f1a6f3f5891f7a01d2c285c050b2618b1)

@carbonin carbonin deleted the queue_service_playbook_runs branch August 16, 2019 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants