Fork workers no more 🔥 🔥 #16130

jrafanie · 2017-10-05T18:41:29Z

Remove conditional allowing spawn instead of fork and default to fork.
Remove various code supporting fork.

Background on WHY:

Fork, while creating processes much faster for us, relies on programs
being copy-on-write(CoW) friendly to take advantage of shared memory.

To do this, your program needs to preload as much of it's code ahead of
time and avoid writing to memory locations on OS pages containing objects
shared with other processes.

Ruby's garbage collector is naive in how it allocates objects on the
ruby heap pages as both old and new objects can be located on the same
OS page. Because most objects die young, it's wise to keep old objects on
separate OS pages from new objects. If a parent or child process
allocates or frees memory on the OS page containing shared memory objects,
the whole OS page is copied, including the shared objects. This happens
frequently if your heap is fragmented with old and new objects.

In addition, ruby, as of 2.2, has the notion of young and old objects
and the garbage collector can be more efficient by only traversing young
objects on a minor GC since young objects generally die young.
Unfortunately, the age field is used to track this and is directly on the
object header. This means that after surviving 3 GCs, an object is 'touched'
to mark it's age as 'old'. This causes any shared memory on the same
OS page to be copied with this object. This was mitigated by
https://github.com/ko1/nakayoshi_fork, but only for objects created before
you fork. Any new objects created after fork could be allocated on an OS
page coresident with shared objects, causing the whole OS page to be copied.

Ultimately, the shared memory for ManageIQ processes was often less than
15% within minutes of the processes starting and this number continued
to drop as more and more shared memory locations were copied on a
"neighboring" write.

This means that fork was only buying us faster process creation.

As we move towards running ManageIQ in containers, we also need to move
towards running processes in isolation via command lines.

We have begun separating out dependencies and selectively loading them
via bundler groups, which drops memory usage for non-fork processes but
makes fork less efficient since less and less code is preloaded and
shared.

Finally, fork is not implemented in every platform, such as Windows, or even
ruby, jRuby for example, so it was preventing usage of ManageIQ in those
ecosystems.

With all this said, it's time to say goodbye to fork.

chrisarcand · 2017-10-05T18:49:27Z

Awesome!

Fryguy · 2017-10-05T18:55:02Z

app/models/miq_worker.rb

-    else
-      start_runner_via_fork
-    end
-  end


Does this mess up @carbonin 's containers fork?

Yeah, but don't worry about that. The rebase is going to be a disaster either way 😆

womp womp

Fryguy · 2017-10-05T18:55:53Z

Gemfile

@@ -41,7 +41,6 @@ gem "manageiq-api-client",            "~>0.1.0",       :require => false
 gem "manageiq-network_discovery",     "~>0.1.2",       :require => false
 gem "mime-types",                     "~>2.6.1",       :path => "mime-types-redirector"
 gem "more_core_extensions",           "~>3.3"
-gem "nakayoshi_fork",                 "~>0.0.3"  # provides a more CoW friendly fork (GC a few times before fork)


Is this still useful for other uses of fork, such as the parallel gem?

I doubt it. The only place that cared about trying to eek out more shared memory was worker creation. We can keep it but this line was not added for users of parallel .

Fryguy · 2017-10-05T18:59:09Z

lib/workers/bin/run_single_worker.rb

@@ -55,7 +55,7 @@
 require File.expand_path("../../../config/environment", __dir__)

 worker_class = worker_class.constantize
-worker_class.before_fork
+preload_for_worker_role if respond_to?(:preload_for_worker_role)


what is respond_to being called on here. Should it be worker_class.respond_to??

Or rather

worker_class.preload_for_worker_role if worker_class.respond_to?(:preload_for_worker_role)

NickLaMuro · 2017-10-05T19:17:03Z

5 ✨. Would read again.

Remove conditional allowing spawn instead of fork and default to fork. Remove various code supporting fork. Background on WHY: Fork, while creating processes much faster for us, relies on programs being copy-on-write(CoW) friendly to take advantage of shared memory. To do this, your program needs to preload as much of it's code ahead of time and avoid writing to memory locations on OS pages containing objects shared with other processes. Ruby's garbage collector is naive in how it allocates objects on the ruby heap pages as both old and new objects can be located on the same OS page. Because most objects die young, it's wise to keep old objects on separate OS pages from new objects. If a parent or child process allocates or frees memory on the OS page containing shared memory objects, the whole OS page is copied, including the shared objects. This happens frequently if your heap is fragmented with old and new objects. In addition, ruby, as of 2.2, has the notion of young and old objects and the garbage collector can be more efficient by only traversing young objects on a minor GC since young objects generally die young. Unfortunately, the age field is used to track this and is directly on the object header. This means that after surviving 3 GCs, an object is 'touched' to mark it's age as 'old'. This causes any shared memory on the same OS page to be copied with this object. This was mitigated by https://github.com/ko1/nakayoshi_fork, but only for objects created before you fork. Any new objects created after fork could be allocated on an OS page coresident with shared objects, causing the whole OS page to be copied. Ultimately, the shared memory for ManageIQ processes was often less than 15% within minutes of the processes starting and this number continued to drop as more and more shared memory locations were copied on a "neighboring" write. This means that fork was only buying us faster process creation. As we move towards running ManageIQ in containers, we also need to move towards running processes in isolation via command lines. We have begun separating out dependencies and selectively loading them via bundler groups, which drops memory usage for non-fork processes but makes fork less efficient since less and less code is preloaded and shared. Finally, fork is not implemented in every platform, such as Windows, or even ruby, jRuby for example, so it was preventing usage of ManageIQ in those ecosystems. With all this said, it's time to say goodbye to fork.

jrafanie · 2017-10-05T19:47:41Z

lib/workers/bin/run_single_worker.rb

@@ -55,7 +55,7 @@
 require File.expand_path("../../../config/environment", __dir__)

 worker_class = worker_class.constantize
-worker_class.before_fork
+worker_class.preload_for_worker_role if worker_class.respond_to?(:preload_for_worker_role)


fixed, thanks @Fryguy

This might soon not be needed since I think the only thing that actually makes use of this is the UIWorker, and that is for the UIConstants, if I am not mistaken. Might be something else that I am missing though.

Scratch that, guess we configure the secret token in that line...

https://github.com/ManageIQ/manageiq/blob/874cad6/app/models/mixins/miq_web_server_worker_mixin.rb#L25-L27

Nevermind... 😞

Sorry... grepping and responding as I find new stuff... sorry for the spam, but found something else:

https://github.com/ManageIQ/manageiq/blob/874cad6/lib/vmdb/initializer.rb#L13

Does that mean we do this twice anyway?

(Also, for what it is worth, this code isn't hidden somewhere else in our other repos it seems: https://github.com/search?q=org%3AManageIQ+preload_for_worker_role&type=Code )

Does that mean we do this twice anyway?

@NickLaMuro Good catch. We gate that if you're a Rails::Server. We can probably better consolidate these in followup PRs

We can probably better consolidate these in followup PRs

Yup, for sure, and wasn't trying to suggest we do that in this PR. The before_fork stuff was just something I noticed when I original was getting the run_single_worker script going, so thought I would make a mention of it since you are working on it here.

miq-bot · 2017-10-05T19:58:33Z

Checked commit jrafanie@c602d0c with ruby 2.3.3, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 0 offenses detected
Everything looks fine. 🍰

Fixes a bug introduced by ManageIQ#16130

…no_more" Reverting for now since worker_options for spawn were not properly being passed to the run_single_worker script, specifically the ems_id. This reverts commit f885db2, reversing changes made to b7a986d.

Revert "Merge pull request #16130 from jrafanie/fork_workers_no_more"

Fixes a bug first discovered after ManageIQ#16130 was merged. That PR was subsequently reverted. We'll remove fork support once spawn has complete parity without bugs.

Make systemd optional on systemd enabled systems Api/web service worker needs ui-classic as it can try to read an existing UI session, which can contain serialized classes from ui-classic. Previously, we tried removing fork here: ManageIQ#16130 It was reverted here: ManageIQ#16154 Some of the followups needed to fix the original problems including passing down ems_id to per ems workers, resolved in: ManageIQ#16199 and ManageIQ#18648 At this point, things should just work.

jrafanie requested review from Fryguy and gtanzillo October 5, 2017 18:41

jrafanie added core/workers enhancement labels Oct 5, 2017

miq-bot added the dependencies label Oct 5, 2017

Fryguy reviewed Oct 5, 2017

View reviewed changes

chessbyte assigned Fryguy Oct 5, 2017

jrafanie force-pushed the fork_workers_no_more branch from 5157bbb to c602d0c Compare October 5, 2017 19:44

jrafanie commented Oct 5, 2017

View reviewed changes

Fryguy merged commit f885db2 into ManageIQ:master Oct 6, 2017

Fryguy added this to the Sprint 71 Ending Oct 16, 2017 milestone Oct 6, 2017

jrafanie added a commit to jrafanie/manageiq that referenced this pull request Oct 6, 2017

Pass worker_options as env vars to the runner

aaafb4d

Fixes a bug introduced by ManageIQ#16130

jrafanie mentioned this pull request Oct 6, 2017

[WIP] Do not discard worker options - fixes fork removal PR #16140

Closed

jrafanie deleted the fork_workers_no_more branch October 6, 2017 20:48

carbonin added a commit that referenced this pull request Oct 9, 2017

Merge pull request #16154 from jrafanie/fork_is_back_for_now

9bd5f91

Revert "Merge pull request #16130 from jrafanie/fork_workers_no_more"

jrafanie mentioned this pull request Oct 13, 2017

Make spawn pass worker options again #16199

Merged

jrafanie mentioned this pull request Nov 25, 2019

Drop support for forking workers, use spawn by default #19556

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fork workers no more 🔥 🔥 #16130

Fork workers no more 🔥 🔥 #16130

jrafanie commented Oct 5, 2017

chrisarcand commented Oct 5, 2017

Fryguy Oct 5, 2017

carbonin Oct 5, 2017

NickLaMuro Oct 5, 2017

Fryguy Oct 5, 2017

jrafanie Oct 5, 2017

Fryguy Oct 5, 2017 •

edited

Loading

Fryguy Oct 5, 2017

NickLaMuro commented Oct 5, 2017

jrafanie Oct 5, 2017

NickLaMuro Oct 5, 2017

NickLaMuro Oct 5, 2017

NickLaMuro Oct 5, 2017 •

edited

Loading

jrafanie Oct 5, 2017

NickLaMuro Oct 5, 2017

miq-bot commented Oct 5, 2017

Fork workers no more 🔥 🔥 #16130

Fork workers no more 🔥 🔥 #16130

Conversation

jrafanie commented Oct 5, 2017

chrisarcand commented Oct 5, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Fryguy Oct 5, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NickLaMuro commented Oct 5, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NickLaMuro Oct 5, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

miq-bot commented Oct 5, 2017

Fryguy Oct 5, 2017 •

edited

Loading

NickLaMuro Oct 5, 2017 •

edited

Loading