-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make spawn an option again #15379
Make spawn an option again #15379
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least, I would like to see some comments surrounding the "why" for the EmbeddedAnsible worker.
I see the work around that you did here as a step back from what we are trying to accomplish here, and if it #hadToBeThisWay™, at the very least, some more explanation surrounding the why would be nice. Couldn't find anything meaningful in the git history or on github.
app/models/miq_worker.rb
Outdated
@@ -345,9 +345,32 @@ def start_runner | |||
Process.detach(pid) | |||
pid | |||
end | |||
alias_method :start_runner_via_fork, :start_runner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit that changes this and the code around is confusing and I don't understand why it is needed or how it works. Seems like we should change the caller instead of jumping through hoops in the interface to make this work, but I admit that I still am unsure what you are fixing/trying to accomplish in that commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that I have figured out the "why" for this specific work around, but I still don't like what we are doing.
Instead of the embedded ansible
worker being a separate process like everything else, it is a Thread
off the main MiqServer::Monitor
process. That seems like it is all sorts of wrong, and doesn't translate to a containerized environment at all.
I can't find any rational as to why we (well... you) made this worker behave so differently, just that it was done (can't even find a PR for it, just a commit: 18ffca9 ).
Regardless of why the worker is like this, the code for this runner seems to be doing complete against a forking/spawning model, meaning it doesn't do either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right. It's different. It will need to be changed with re-arch. The idea was that we didn't actually need to have a process to monitor another process so it was done in a thread. start_runner
was supposed to be for subclasses to implement in any way those wanted, spawn, fork, thread, etc.
Let me review this again in the morning to see if we can do this cleaner. I agree, it's messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'm not sure what happened to that commit, let me follow the bouncing ball...
- the worker was added in Embedded Ansible role worker #13551
- it was changed to not use a heartbeating thread during the 10 minute ansible setup in Rely on 10 minute starting_timeout instead of a heartbeating thread #14053
- the thread knowledge was moved out of the worker/runner and to the
start_runner
, where the fork logic lives in Move embedded ansible worker thread up to start_runner #14256 (the PR containing 18ffca9)
lib/workers/bin/run_single_worker.rb
Outdated
@@ -52,7 +56,12 @@ | |||
worker_class = worker_class.constantize | |||
worker_class.before_fork | |||
unless options[:dry_run] | |||
worker = worker_class.create_worker_record | |||
worker = if options[:guid] | |||
worker_class.find_by(:guid => options[:guid]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I wrote this, I can be critical of it ;)
Maybe we should do a find_by!
, or something similar so that we error out if we can't find the worker record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha, you can be critical of other things too... Otherwise, what's the point? I know you're kidding but wanted to make sure you were "fine" with throwing 🍅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes, I'll look at changing it
224c9ca
to
8ecf792
Compare
@@ -18,6 +19,7 @@ def start_runner | |||
end | |||
end | |||
end | |||
alias_method :start_runner_via_spawn, :start_runner_via_fork |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carbonin please review this part. It's kinda icky that this worker needs to know that it has to override both of these methods with a threaded runner but it seems like the simplest solution to get this feature on master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an excuse to not include @carbonin 's input, but my two cents for how this should be done is leave start_runner
as the method that is overwritten by this class, and slightly change up how it is implemented in the main MiqWorker
class:
def start_runner
+ ENV['MIQ_SPAWN_WORKERS'].to_s.downcase == "true" ? start_runner_via_spawn : start_runner_via_fork
+ end
+
+ def start_runner_via_fork
self.class.before_fork
pid = fork(:cow_friendly => true) do
self.class.after_fork
@@ -346,8 +346,25 @@ def start_runner
pid
end
+ def self.build_command_line(guid)
+ command_line = "#{Gem.ruby} #{runner_script} --heartbeat --guid=#{guid} #{self.name}"
+ ENV['APPLIANCE'] ? "nice #{nice_increment} #{command_line}" : command_line
+ end
+
+ def self.runner_script
+ script = ManageIQ.root.join("lib/workers/bin/run_single_worker.rb")
+ raise "script not found: #{script}" unless File.exist?(script)
+ script
+ end
+
+ def start_runner_via_spawn
+ self.pid = Kernel.spawn(self.class.build_command_line(guid), :out => "/dev/null", :err => [Rails.root.join("log", "evm.log"), "a"])
+ Process.detach(pid)
+ pid
+ end
+
def start
self.pid = start_runner
This way, any changes that are necessary to the normal workflow are done in the edgecase classes, and attempted to be accounted for in the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, I like that better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it too...minor tweak to it:
ENV['MIQ_SPAWN_WORKERS'] ? start_runner_via_spawn : start_runner_via_fork
You don't need to check for an explicit value in the ENV var, just its presence.
app/models/miq_worker.rb
Outdated
def start | ||
self.pid = start_runner | ||
self.pid = ENV['MIQ_SPAWN_WORKERS'].to_s.downcase == "true" ? start_runner_via_spawn : start_runner_via_fork |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see you had implemented it already, but you can drop the .to_s.downcase == "true"
bit...it's not necessary
8ecf792
to
a3e0587
Compare
Use run_single_worker script for worker spawning Embedded ansible worker overrides start_runner to implement a thread based worker implementation.
When running single workers in the context of a worker monitor, the worker records are usually already created by the monitor process by the time the worker is forked/spawned. Allowing the lib/worker/bin/run_single_worker.rb script to pass a guid allows it to use those records instead of creating new ones. If this option isn't used, the already created ones exist along side the other worker records, and cause evm:stop to not function properly.
a3e0587
to
3f3a823
Compare
Ok, fixed the cops and addressed all the things. Thanks for the suggestions @NickLaMuro and @Fryguy |
Checked commits jrafanie/manageiq@6d3362f~...3f3a823 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0 |
This is nearly a no-op for the existing forking setup, but allows you to spawn workers instead via an environment variable.
We had to add some complexity for subclasses that implementstart_runner
because we don't want to bypass their implementation.EDIT: Embedded ansible worker was changed to implement both of the
start_runner fork/spawn methods to ensure the threaded implementation is
used.
MIQ_SPAWN_WORKERS=true bin/rake evm:start
=> Kernel.spawnMIQ_SPAWN_WORKERS=false bin/rake evm:start
=> forkbin/rake evm:start
=> forkNote, when using @NickLaMuro's singular_worker script, the second commit is needed to avoid having millions of worker rows.