-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fork workers no more 🔥 🔥 #16130
Fork workers no more 🔥 🔥 #16130
Conversation
Awesome! |
else | ||
start_runner_via_fork | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mess up @carbonin 's containers fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but don't worry about that. The rebase is going to be a disaster either way 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
womp womp
@@ -41,7 +41,6 @@ gem "manageiq-api-client", "~>0.1.0", :require => false | |||
gem "manageiq-network_discovery", "~>0.1.2", :require => false | |||
gem "mime-types", "~>2.6.1", :path => "mime-types-redirector" | |||
gem "more_core_extensions", "~>3.3" | |||
gem "nakayoshi_fork", "~>0.0.3" # provides a more CoW friendly fork (GC a few times before fork) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still useful for other uses of fork, such as the parallel gem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it. The only place that cared about trying to eek out more shared memory was worker creation. We can keep it but this line was not added for users of parallel .
lib/workers/bin/run_single_worker.rb
Outdated
@@ -55,7 +55,7 @@ | |||
require File.expand_path("../../../config/environment", __dir__) | |||
|
|||
worker_class = worker_class.constantize | |||
worker_class.before_fork | |||
preload_for_worker_role if respond_to?(:preload_for_worker_role) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is respond_to being called on here. Should it be worker_class.respond_to?
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rather
worker_class.preload_for_worker_role if worker_class.respond_to?(:preload_for_worker_role)
5 ✨. Would read again. |
Remove conditional allowing spawn instead of fork and default to fork. Remove various code supporting fork. Background on WHY: Fork, while creating processes much faster for us, relies on programs being copy-on-write(CoW) friendly to take advantage of shared memory. To do this, your program needs to preload as much of it's code ahead of time and avoid writing to memory locations on OS pages containing objects shared with other processes. Ruby's garbage collector is naive in how it allocates objects on the ruby heap pages as both old and new objects can be located on the same OS page. Because most objects die young, it's wise to keep old objects on separate OS pages from new objects. If a parent or child process allocates or frees memory on the OS page containing shared memory objects, the whole OS page is copied, including the shared objects. This happens frequently if your heap is fragmented with old and new objects. In addition, ruby, as of 2.2, has the notion of young and old objects and the garbage collector can be more efficient by only traversing young objects on a minor GC since young objects generally die young. Unfortunately, the age field is used to track this and is directly on the object header. This means that after surviving 3 GCs, an object is 'touched' to mark it's age as 'old'. This causes any shared memory on the same OS page to be copied with this object. This was mitigated by https://github.com/ko1/nakayoshi_fork, but only for objects created before you fork. Any new objects created after fork could be allocated on an OS page coresident with shared objects, causing the whole OS page to be copied. Ultimately, the shared memory for ManageIQ processes was often less than 15% within minutes of the processes starting and this number continued to drop as more and more shared memory locations were copied on a "neighboring" write. This means that fork was only buying us faster process creation. As we move towards running ManageIQ in containers, we also need to move towards running processes in isolation via command lines. We have begun separating out dependencies and selectively loading them via bundler groups, which drops memory usage for non-fork processes but makes fork less efficient since less and less code is preloaded and shared. Finally, fork is not implemented in every platform, such as Windows, or even ruby, jRuby for example, so it was preventing usage of ManageIQ in those ecosystems. With all this said, it's time to say goodbye to fork.
5157bbb
to
c602d0c
Compare
@@ -55,7 +55,7 @@ | |||
require File.expand_path("../../../config/environment", __dir__) | |||
|
|||
worker_class = worker_class.constantize | |||
worker_class.before_fork | |||
worker_class.preload_for_worker_role if worker_class.respond_to?(:preload_for_worker_role) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks @Fryguy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might soon not be needed since I think the only thing that actually makes use of this is the UIWorker
, and that is for the UIConstants
, if I am not mistaken. Might be something else that I am missing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scratch that, guess we configure the secret token in that line...
Nevermind... 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry... grepping and responding as I find new stuff... sorry for the spam, but found something else:
https://github.com/ManageIQ/manageiq/blob/874cad6/lib/vmdb/initializer.rb#L13
Does that mean we do this twice anyway?
(Also, for what it is worth, this code isn't hidden somewhere else in our other repos it seems: https://github.com/search?q=org%3AManageIQ+preload_for_worker_role&type=Code )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean we do this twice anyway?
@NickLaMuro Good catch. We gate that if you're a Rails::Server
. We can probably better consolidate these in followup PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably better consolidate these in followup PRs
Yup, for sure, and wasn't trying to suggest we do that in this PR. The before_fork
stuff was just something I noticed when I original was getting the run_single_worker
script going, so thought I would make a mention of it since you are working on it here.
Checked commit jrafanie@c602d0c with ruby 2.3.3, rubocop 0.47.1, and haml-lint 0.20.0 |
Fixes a bug introduced by ManageIQ#16130
Revert "Merge pull request #16130 from jrafanie/fork_workers_no_more"
Fixes a bug first discovered after ManageIQ#16130 was merged. That PR was subsequently reverted. We'll remove fork support once spawn has complete parity without bugs.
Fixes a bug first discovered after ManageIQ#16130 was merged. That PR was subsequently reverted. We'll remove fork support once spawn has complete parity without bugs.
Make systemd optional on systemd enabled systems Api/web service worker needs ui-classic as it can try to read an existing UI session, which can contain serialized classes from ui-classic. Previously, we tried removing fork here: ManageIQ#16130 It was reverted here: ManageIQ#16154 Some of the followups needed to fix the original problems including passing down ems_id to per ems workers, resolved in: ManageIQ#16199 and ManageIQ#18648 At this point, things should just work.
Make systemd optional on systemd enabled systems Api/web service worker needs ui-classic as it can try to read an existing UI session, which can contain serialized classes from ui-classic. Previously, we tried removing fork here: ManageIQ#16130 It was reverted here: ManageIQ#16154 Some of the followups needed to fix the original problems including passing down ems_id to per ems workers, resolved in: ManageIQ#16199 and ManageIQ#18648 At this point, things should just work.
Make systemd optional on systemd enabled systems Api/web service worker needs ui-classic as it can try to read an existing UI session, which can contain serialized classes from ui-classic. Previously, we tried removing fork here: ManageIQ#16130 It was reverted here: ManageIQ#16154 Some of the followups needed to fix the original problems including passing down ems_id to per ems workers, resolved in: ManageIQ#16199 and ManageIQ#18648 At this point, things should just work.
Make systemd optional on systemd enabled systems Api/web service worker needs ui-classic as it can try to read an existing UI session, which can contain serialized classes from ui-classic. Previously, we tried removing fork here: ManageIQ#16130 It was reverted here: ManageIQ#16154 Some of the followups needed to fix the original problems including passing down ems_id to per ems workers, resolved in: ManageIQ#16199 and ManageIQ#18648 At this point, things should just work.
Make systemd optional on systemd enabled systems Api/web service worker needs ui-classic as it can try to read an existing UI session, which can contain serialized classes from ui-classic. Previously, we tried removing fork here: ManageIQ#16130 It was reverted here: ManageIQ#16154 Some of the followups needed to fix the original problems including passing down ems_id to per ems workers, resolved in: ManageIQ#16199 and ManageIQ#18648 At this point, things should just work.
Background on WHY:
Fork, while creating processes much faster for us, relies on programs
being copy-on-write(CoW) friendly to take advantage of shared memory.
To do this, your program needs to preload as much of it's code ahead of
time and avoid writing to memory locations on OS pages containing objects
shared with other processes.
Ruby's garbage collector is naive in how it allocates objects on the
ruby heap pages as both old and new objects can be located on the same
OS page. Because most objects die young, it's wise to keep old objects on
separate OS pages from new objects. If a parent or child process
allocates or frees memory on the OS page containing shared memory objects,
the whole OS page is copied, including the shared objects. This happens
frequently if your heap is fragmented with old and new objects.
In addition, ruby, as of 2.2, has the notion of young and old objects
and the garbage collector can be more efficient by only traversing young
objects on a minor GC since young objects generally die young.
Unfortunately, the age field is used to track this and is directly on the
object header. This means that after surviving 3 GCs, an object is 'touched'
to mark it's age as 'old'. This causes any shared memory on the same
OS page to be copied with this object. This was mitigated by
https://github.com/ko1/nakayoshi_fork, but only for objects created before
you fork. Any new objects created after fork could be allocated on an OS
page coresident with shared objects, causing the whole OS page to be copied.
Ultimately, the shared memory for ManageIQ processes was often less than
15% within minutes of the processes starting and this number continued
to drop as more and more shared memory locations were copied on a
"neighboring" write.
This means that fork was only buying us faster process creation.
As we move towards running ManageIQ in containers, we also need to move
towards running processes in isolation via command lines.
We have begun separating out dependencies and selectively loading them
via bundler groups, which drops memory usage for non-fork processes but
makes fork less efficient since less and less code is preloaded and
shared.
Finally, fork is not implemented in every platform, such as Windows, or even
ruby, jRuby for example, so it was preventing usage of ManageIQ in those
ecosystems.
With all this said, it's time to say goodbye to fork.