
Speed up with multi-threading? #174

Closed
danfinlay opened this issue Jun 25, 2014 · 21 comments

@danfinlay

Hi there, I've got a pretty large app I'm working on, and I'm getting pretty long build times (~7000ms at times). I was wondering if you thought it might be practical to use child_process.fork() or the cluster module to split the build process across all of a computer's processor cores.

This blog post is a pretty good introduction to the methods. I'm not averse to trying to implement this myself, but I'd like some help grasping the Broccoli architecture first.

Where, for example, is the individual file processing assigned?

@stefanpenner
Contributor

Unfortunately, only workloads that are actually CPU intensive would be helped. Currently the vast majority of the time (unless you are using ruby-sass or similar) is spent in pure IO. This is due to a data-loss quickfix for OS X, and solving it is on the roadmap.

Additionally, as of right now there is no way (without introducing some native extension) to utilize multiple user-land threads in a single Node process. I suspect you used the word "multithreading" by mistake and actually meant multi-process.

@danfinlay
Author

What about using a source map to only edit portions of the output that are updated? (server-mode only)

@stefanpenner
Contributor

Your idea isn't bad, but it ultimately wouldn't help enough right now, or it would require a different approach than the current architecture. The biggest problem, IO sensitivity, has a known solution; see:

6b0a9d3

Honestly, this accounts for more than 90% of all time spent in Broccoli; once that is restored, Broccoli will again be incredibly fast.

If you have energy to help with that effort, it would yield fantastic returns.

@danfinlay
Author

Do you have a link to a description of the data loss quickfix for osx? I'm not familiar with the problem.

Also, if it's being worked on, I can't wait! Is that branch already somewhat working? Worth trying out in development?

@stefanpenner
Contributor

@FlySwatter I believe no one has time currently, but @rjackson and @joliss hope to work on it when they have time. I am sure they would be happy if someone could drive it to completion soon.

Additionally, once that is resolved, I suspect your class of optimization will once again be important.

@stefanpenner
Contributor

I would also like to push Broccoli to a background process; I would love it if Broccoli did this automatically.

@danfinlay
Author

For the sake of others who might be willing to pitch in, could you explain / link to the problem in more detail?

@chnn

chnn commented Jun 25, 2014

I'm also curious to learn more about this and would like to help out if at all possible.

@stefanpenner
Contributor

A quick GH search reveals: #88

@chnn

chnn commented Jun 25, 2014

thanks

@joliss
Member

joliss commented Jun 26, 2014

Do you have a link to a description of the data loss quickfix for osx? I'm not familiar with the problem.

We had to stop using hardlinks; it's described here: https://github.com/broccolijs/broccoli/blob/master/docs/hardlink-issue.md

We're most likely going to switch to using symlinks, as the commit Stef linked alludes to. Coming soon.

@joliss
Member

joliss commented Jun 26, 2014

Re the original question: I wrote down some thoughts on the problems with parallelizing at http://www.solitr.com/blog/2014/02/broccoli-first-release/, in the section "No Parallelism".

@joliss
Member

joliss commented Jun 26, 2014

I'm getting pretty long build times (~7000ms at times)

For what it's worth, this seems generally acceptable for an initial build, but obviously unacceptable for incremental rebuilds. (I'm not clear on which one it is for you.)

We clearly want to keep working on performance; this is super important. Parallelizing seems like a workaround (with limited gains), and I really want to solve the fundamental problems instead.

@rwjblue
Member

rwjblue commented Jun 26, 2014

FYI - I spoke with @FlySwatter in IRC and was able to reduce rebuild times from 7s to 1s by removing unused folders from vendor/.

@rwjblue
Member

rwjblue commented Jun 26, 2014

Problem

Currently, for mergeTrees we are copying all files for every input tree into the destination directory. This breaks down fairly quickly with either a large number of files or a small number of very large files.

@joliss and I (with help from many others) have been working on a good solution to this fundamental problem.

Large Files

I am actively working to move from copying to symlinking, which by itself will remove the large-file speed impact (making a symlink is constant time regardless of source file size) but does not directly address the slowdown for large numbers of files (although it would likely be a bit better).

Large Numbers of Files

The way we plan to handle the large file count issue is by symlinking root directories directly when the directory does not already exist in the destination directory. This reduces the need to stat all files in each input tree and allows symlinking a much smaller number of directories, since the vast majority of merges do not use the same directories (especially true for vendor/ in Ember CLI).

Timeline

This is a hard one, but I'm hoping to have the rough changes made and ready for detailed review by @joliss early next week.

Plugin Impact

Changing mergeTrees to use symlinks has fairly large implications for other plugins. We have to change node-walk-sync to recurse into symlinks. Thankfully, the majority of plugins that inherit from broccoli-filter can simply update their version dependency, but many other plugins that may be doing their own directory traversal will have to ensure that they also traverse symlinks.

There are likely other considerations (as far as symlinking impact), feel free to discuss other impact here.

@kumavis

kumavis commented Jun 30, 2014

If unwatched vendor trees were causing such long build times, this PR may deserve more attention.

@stefanpenner
Contributor

more updates: broccolijs/broccoli-merge-trees#11

lots of great perf!

@stefanpenner
Contributor

I suspect this can be closed for now. In most apps the bottleneck is in the slow filters themselves, and it is rarely an issue on incremental builds.

Some future work can be done to maybe "fast-boot" in perfect conditions, but this seems like a lower priority than improving some slow filters like esnext.

@rwjblue rwjblue closed this as completed Dec 19, 2014
@stefanpenner
Contributor

I suspect that once those concepts are thoroughly exhausted, multi-process might need to be investigated.

@insidewhy

@FlySwatter sighjs is a build system much like Broccoli that can also delegate tasks to multiple CPUs.

@danfinlay
Author

Nice to know!
