
Speed up with multi-threading? #174

Closed
danfinlay opened this issue Jun 25, 2014 · 21 comments

@danfinlay

Hi there, I've got a pretty large app I'm working on, and I'm getting pretty long build times (~7000ms at times). I was wondering if you thought it might be practical to use child_process.fork() or the cluster module to split the build process across all of a computer's processor cores.

This blog post is a pretty good introduction to the methods. I'm not averse to trying to implement this myself, but I'd like some help grasping the Broccoli architecture first.

Where, for example, is the individual file processing assigned?

@stefanpenner
Contributor

Unfortunately, only workloads that are actually CPU intensive would be helped. Currently the vast majority of the time (unless you are using ruby-sass or similar) is spent in pure IO. This is due to a data-loss quickfix for OS X, and solving it is on the roadmap.

Additionally, as of right now there is no way (without introducing some native extension) to utilize multiple user-land threads in a single Node process. I suspect you used the word "multithreading" by mistake and actually meant multi-process.

@danfinlay
Author

What about using a source map to only edit portions of the output that are updated? (server-mode only)

@stefanpenner
Contributor

Your idea isn't bad, but it ultimately wouldn't help enough right now, or it would require a different approach than the current architecture. The biggest problem, IO sensitivity, has a known solution; see:

6b0a9d3

Honestly, this accounts for more than 90% of all time spent in Broccoli; once that is restored, Broccoli will again be incredibly fast.

If you have energy to help with that effort, it would yield fantastic returns.

@danfinlay
Author

Do you have a link to a description of the data loss quickfix for osx? I'm not familiar with the problem.

Also, if it's being worked on, I can't wait! Is that branch already somewhat working? Worth trying out in development?

@stefanpenner
Contributor

@FlySwatter I believe no one has time currently, but @rjackson and @joliss hope to work on it when they have time. I am sure they would be happy if someone could drive it to completion soon.

Additionally, once that is resolved, I suspect your class of optimization will once again be important.

@stefanpenner
Contributor

I would also like to push Broccoli to a background process; I would love it if Broccoli did this automatically.

@danfinlay
Author

For the sake of others who might be willing to pitch in, could you explain / link to the problem in more detail?

@chnn

chnn commented Jun 25, 2014

I'm also curious to learn more about this and would like to help out if at all possible.

@stefanpenner
Contributor

A quick GH search reveals: #88

@chnn

chnn commented Jun 25, 2014

thanks

@joliss
Member

joliss commented Jun 26, 2014

Do you have a link to a description of the data loss quickfix for osx? I'm not familiar with the problem.

We had to stop using hardlinks; it's described here: https://github.com/broccolijs/broccoli/blob/master/docs/hardlink-issue.md

We're most likely going to switch to using symlinks, as the commit Stef linked alludes to. Coming soon.

@joliss
Member

joliss commented Jun 26, 2014

Re the original question: I wrote down some thoughts on the problems with parallelizing at http://www.solitr.com/blog/2014/02/broccoli-first-release/, in the section "No Parallelism".

@joliss
Member

joliss commented Jun 26, 2014

I'm getting pretty long build times (~7000ms at times)

For what it's worth, this seems generally acceptable for an initial build, but obviously unacceptable for incremental rebuilds. (I'm not clear on which one it is for you.)

We clearly want to keep working on performance; this is super important. Parallelizing seems like a workaround (with limited gains), and I really want to solve the fundamental problems instead.

@rwjblue
Member

rwjblue commented Jun 26, 2014

FYI - I spoke with @FlySwatter in IRC and was able to reduce rebuild times from 7s to 1s by removing unused folders from vendor/.

@rwjblue
Member

rwjblue commented Jun 26, 2014

Problem

Currently, for mergeTrees we are copying all files for every input tree into the destination directory. This breaks down fairly quickly with either a large number of files or a small number of very large files.

@joliss and I (with help from many others) have been working on a good solution to this fundamental problem.

Large Files

I am actively working to move from copying to symlinking, which by itself will remove the large-file speed impact (making a symlink is constant time regardless of source file size) but does not directly address the slowdown for large numbers of files (although it would likely be a bit better).

Large Numbers of Files

The way we plan to handle the large file count issue is by symlinking root directories directly when the directory does not already exist in the destination directory. This reduces the need to stat all files in each input tree and allows symlinking a much smaller number of directories, since the vast majority of merges do not use the same directories (especially true for vendor/ in Ember CLI).

Timeline

This is a hard one, but I'm hoping to have the rough changes made and ready for detailed review by @joliss early next week.

Plugin Impact

Changing mergeTrees to use symlinks has fairly large implications for other plugins. We have to change node-walk-sync to recurse into symlinks. Thankfully, the majority of plugins that inherit from broccoli-filter can simply update their version dependency, but many other plugins that may be doing their own directory traversal will have to ensure that they also traverse symlinks.

There are likely other considerations (as far as symlinking impact), feel free to discuss other impact here.

@kumavis

kumavis commented Jun 30, 2014

If unwatched vendor trees were causing such long build times, this PR may deserve more attention.

@stefanpenner
Contributor

more updates: broccolijs/broccoli-merge-trees#11

lots of great perf!

@stefanpenner
Contributor

I suspect this can be closed for now. In most apps the bottleneck is in the slow filters themselves, and it is rarely an issue on incremental builds.

Some future work can be done to maybe "fast-boot" in perfect conditions, but this seems like a lower priority than improving some slow filters like esnext.

@rwjblue rwjblue closed this as completed Dec 19, 2014
@stefanpenner
Contributor

I suspect that once those concepts are thoroughly exhausted, multi-process might need to be investigated.

@insidewhy

@FlySwatter sighjs is a build system much like Broccoli that can also delegate tasks to multiple CPUs.

@danfinlay
Author

Nice to know!
