Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fork workers no more 🔥 🔥 #16130

Merged
merged 1 commit into from
Oct 6, 2017
Merged

Conversation

jrafanie
Copy link
Member

@jrafanie jrafanie commented Oct 5, 2017

  • Remove conditional allowing spawn instead of fork and default to fork.
  • Remove various code supporting fork.

Background on WHY:

Fork, while creating processes much faster for us, relies on programs
being copy-on-write(CoW) friendly to take advantage of shared memory.

To do this, your program needs to preload as much of it's code ahead of
time and avoid writing to memory locations on OS pages containing objects
shared with other processes.

Ruby's garbage collector is naive in how it allocates objects on the
ruby heap pages as both old and new objects can be located on the same
OS page. Because most objects die young, it's wise to keep old objects on
separate OS pages from new objects. If a parent or child process
allocates or frees memory on the OS page containing shared memory objects,
the whole OS page is copied, including the shared objects. This happens
frequently if your heap is fragmented with old and new objects.

In addition, ruby, as of 2.2, has the notion of young and old objects
and the garbage collector can be more efficient by only traversing young
objects on a minor GC since young objects generally die young.
Unfortunately, the age field is used to track this and is directly on the
object header. This means that after surviving 3 GCs, an object is 'touched'
to mark it's age as 'old'. This causes any shared memory on the same
OS page to be copied with this object. This was mitigated by
https://github.com/ko1/nakayoshi_fork, but only for objects created before
you fork. Any new objects created after fork could be allocated on an OS
page coresident with shared objects, causing the whole OS page to be copied.

Ultimately, the shared memory for ManageIQ processes was often less than
15% within minutes of the processes starting and this number continued
to drop as more and more shared memory locations were copied on a
"neighboring" write.

This means that fork was only buying us faster process creation.

As we move towards running ManageIQ in containers, we also need to move
towards running processes in isolation via command lines.

We have begun separating out dependencies and selectively loading them
via bundler groups, which drops memory usage for non-fork processes but
makes fork less efficient since less and less code is preloaded and
shared.

Finally, fork is not implemented in every platform, such as Windows, or even
ruby, jRuby for example, so it was preventing usage of ManageIQ in those
ecosystems.

With all this said, it's time to say goodbye to fork.

@chrisarcand
Copy link
Member

Awesome!

else
start_runner_via_fork
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mess up @carbonin 's containers fork?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but don't worry about that. The rebase is going to be a disaster either way 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

womp womp

@@ -41,7 +41,6 @@ gem "manageiq-api-client", "~>0.1.0", :require => false
gem "manageiq-network_discovery", "~>0.1.2", :require => false
gem "mime-types", "~>2.6.1", :path => "mime-types-redirector"
gem "more_core_extensions", "~>3.3"
gem "nakayoshi_fork", "~>0.0.3" # provides a more CoW friendly fork (GC a few times before fork)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still useful for other uses of fork, such as the parallel gem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it. The only place that cared about trying to eek out more shared memory was worker creation. We can keep it but this line was not added for users of parallel .

@@ -55,7 +55,7 @@
require File.expand_path("../../../config/environment", __dir__)

worker_class = worker_class.constantize
worker_class.before_fork
preload_for_worker_role if respond_to?(:preload_for_worker_role)
Copy link
Member

@Fryguy Fryguy Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is respond_to being called on here. Should it be worker_class.respond_to??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather

worker_class.preload_for_worker_role if worker_class.respond_to?(:preload_for_worker_role)

@NickLaMuro
Copy link
Member

5 ✨. Would read again.

Remove conditional allowing spawn instead of fork and default to fork.
Remove various code supporting fork.

Background on WHY:

Fork, while creating processes much faster for us, relies on programs
being copy-on-write(CoW) friendly to take advantage of shared memory.

To do this, your program needs to preload as much of it's code ahead of
time and avoid writing to memory locations on OS pages containing objects
shared with other processes.

Ruby's garbage collector is naive in how it allocates objects on the
ruby heap pages as both old and new objects can be located on the same
OS page.  Because most objects die young, it's wise to keep old objects on
separate OS pages from new objects.  If a parent or child process
allocates or frees memory on the OS page containing shared memory objects,
the whole OS page is copied, including the shared objects.  This happens
frequently if your heap is fragmented with old and new objects.

In addition, ruby, as of 2.2, has the notion of young and old objects
and the garbage collector can be more efficient by only traversing young
objects on a minor GC since young objects generally die young.
Unfortunately, the age field is used to track this and is directly on the
object header.  This means that after surviving 3 GCs, an object is 'touched'
to mark it's age as 'old'.  This causes any shared memory on the same
OS page to be copied with this object. This was mitigated by
https://github.com/ko1/nakayoshi_fork, but only for objects created before
you fork.  Any new objects created after fork could be allocated on an OS
page coresident with shared objects, causing the whole OS page to be copied.

Ultimately, the shared memory for ManageIQ processes was often less than
15% within minutes of the processes starting and this number continued
to drop as more and more shared memory locations were copied on a
"neighboring" write.

This means that fork was only buying us faster process creation.

As we move towards running ManageIQ in containers, we also need to move
towards running processes in isolation via command lines.

We have begun separating out dependencies and selectively loading them
via bundler groups, which drops memory usage for non-fork processes but
makes fork less efficient since less and less code is preloaded and
shared.

Finally, fork is not implemented in every platform, such as Windows, or even
ruby, jRuby for example, so it was preventing usage of ManageIQ in those
ecosystems.

With all this said, it's time to say goodbye to fork.
@jrafanie jrafanie force-pushed the fork_workers_no_more branch from 5157bbb to c602d0c Compare October 5, 2017 19:44
@@ -55,7 +55,7 @@
require File.expand_path("../../../config/environment", __dir__)

worker_class = worker_class.constantize
worker_class.before_fork
worker_class.preload_for_worker_role if worker_class.respond_to?(:preload_for_worker_role)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks @Fryguy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might soon not be needed since I think the only thing that actually makes use of this is the UIWorker, and that is for the UIConstants, if I am not mistaken. Might be something else that I am missing though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that, guess we configure the secret token in that line...

https://github.com/ManageIQ/manageiq/blob/874cad6/app/models/mixins/miq_web_server_worker_mixin.rb#L25-L27

Nevermind... 😞

Copy link
Member

@NickLaMuro NickLaMuro Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry... grepping and responding as I find new stuff... sorry for the spam, but found something else:

https://github.com/ManageIQ/manageiq/blob/874cad6/lib/vmdb/initializer.rb#L13

Does that mean we do this twice anyway?

(Also, for what it is worth, this code isn't hidden somewhere else in our other repos it seems: https://github.com/search?q=org%3AManageIQ+preload_for_worker_role&type=Code )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we do this twice anyway?

@NickLaMuro Good catch. We gate that if you're a Rails::Server. We can probably better consolidate these in followup PRs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably better consolidate these in followup PRs

Yup, for sure, and wasn't trying to suggest we do that in this PR. The before_fork stuff was just something I noticed when I original was getting the run_single_worker script going, so thought I would make a mention of it since you are working on it here.

@miq-bot
Copy link
Member

miq-bot commented Oct 5, 2017

Checked commit jrafanie@c602d0c with ruby 2.3.3, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 0 offenses detected
Everything looks fine. 🍰

@Fryguy Fryguy merged commit f885db2 into ManageIQ:master Oct 6, 2017
@Fryguy Fryguy added this to the Sprint 71 Ending Oct 16, 2017 milestone Oct 6, 2017
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Oct 6, 2017
@jrafanie jrafanie deleted the fork_workers_no_more branch October 6, 2017 20:48
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Oct 9, 2017
…no_more"

Reverting for now since worker_options for spawn were not properly being
passed to the run_single_worker script, specifically the ems_id.

This reverts commit f885db2, reversing
changes made to b7a986d.
carbonin added a commit that referenced this pull request Oct 9, 2017
Revert "Merge pull request #16130 from jrafanie/fork_workers_no_more"
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Oct 24, 2017
Fixes a bug first discovered after ManageIQ#16130 was merged.
That PR was subsequently reverted.  We'll remove fork support once spawn
has complete parity without bugs.
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Oct 25, 2017
Fixes a bug first discovered after ManageIQ#16130 was merged.
That PR was subsequently reverted.  We'll remove fork support once spawn
has complete parity without bugs.
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Nov 25, 2019
Make systemd optional on systemd enabled systems

Api/web service worker needs ui-classic as it can try to read an existing UI session, which
can contain serialized classes from ui-classic.

Previously, we tried removing fork here: ManageIQ#16130
It was reverted here: ManageIQ#16154

Some of the followups needed to fix the original problems including passing down
ems_id to per ems workers, resolved in:
ManageIQ#16199 and
ManageIQ#18648

At this point, things should just work.
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Nov 25, 2019
Make systemd optional on systemd enabled systems

Api/web service worker needs ui-classic as it can try to read an existing UI session, which
can contain serialized classes from ui-classic.

Previously, we tried removing fork here: ManageIQ#16130
It was reverted here: ManageIQ#16154

Some of the followups needed to fix the original problems including passing down
ems_id to per ems workers, resolved in:
ManageIQ#16199 and
ManageIQ#18648

At this point, things should just work.
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Nov 25, 2019
Make systemd optional on systemd enabled systems

Api/web service worker needs ui-classic as it can try to read an existing UI session, which
can contain serialized classes from ui-classic.

Previously, we tried removing fork here: ManageIQ#16130
It was reverted here: ManageIQ#16154

Some of the followups needed to fix the original problems including passing down
ems_id to per ems workers, resolved in:
ManageIQ#16199 and
ManageIQ#18648

At this point, things should just work.
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Nov 25, 2019
Make systemd optional on systemd enabled systems

Api/web service worker needs ui-classic as it can try to read an existing UI session, which
can contain serialized classes from ui-classic.

Previously, we tried removing fork here: ManageIQ#16130
It was reverted here: ManageIQ#16154

Some of the followups needed to fix the original problems including passing down
ems_id to per ems workers, resolved in:
ManageIQ#16199 and
ManageIQ#18648

At this point, things should just work.
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Nov 25, 2019
Make systemd optional on systemd enabled systems

Api/web service worker needs ui-classic as it can try to read an existing UI session, which
can contain serialized classes from ui-classic.

Previously, we tried removing fork here: ManageIQ#16130
It was reverted here: ManageIQ#16154

Some of the followups needed to fix the original problems including passing down
ems_id to per ems workers, resolved in:
ManageIQ#16199 and
ManageIQ#18648

At this point, things should just work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants