
Optimize with thread loader and terser #27014

Merged
merged 26 commits into elastic:master
Dec 18, 2018

Conversation

mistic
Member

@mistic mistic commented Dec 12, 2018

This PR is the result of revisiting an idea explored in #20749 with HappyPack at the very start of the DLL work experiments (at that time the results were not good), and also tried by @jbudz in #21994.

However, this change implements and explores things that we hadn't tried before, and they are giving us a lot of benefits.

Changes:

  • Add thread-loader to the base optimizer for both babel-loader and ts-loader, and also add it to the babel-loader in the dll compiler
  • Add cache-loader for the babel-loader in the dll compiler
  • Create a single thread-loader pool, warm it up in the base optimizer constructor, and share it with the dll compiler
  • Replace uglifyJsPlugin with terser-webpack-plugin, as the latter is the one used and supported by webpack since version 4.26.0.
  • Remove compress from terser-webpack-plugin in the base_optimizer. We had already done this in the past, but were keeping compression only because of the dead-code removal check in React DevTools; we can keep doing that, but only for the DynamicDllCompiler. (A rough config sketch follows this list.)
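
For illustration only, here is a minimal sketch of what such a loader chain and minimizer setup could look like; the pool name, worker count, and option values below are placeholders rather than the actual Kibana base optimizer code:

```js
// Hedged, illustrative webpack fragment — pool name, worker count and option
// values are placeholders, not the merged Kibana base optimizer config.
const TerserPlugin = require('terser-webpack-plugin');

const threadLoaderPool = {
  // a single shared pool, reused by the base optimizer and the dll compiler
  name: 'optimizer-thread-loader-pool',
  workers: 3,        // capped; see the discussion about a max further down
  poolTimeout: 2000, // lowered outside dev so idle workers are not kept around
};

module.exports = {
  module: {
    rules: [
      {
        test: /\.js$/,
        use: [
          { loader: 'thread-loader', options: threadLoaderPool },
          { loader: 'babel-loader', options: { cacheDirectory: true } },
        ],
      },
    ],
  },
  optimization: {
    minimizer: [
      new TerserPlugin({
        parallel: true,
        sourceMap: false,
        terserOptions: {
          // compress stays off in the base optimizer; it is kept only in the
          // DynamicDllCompiler because of the React DevTools dead-code check
          compress: false,
          mangle: true,
        },
      }),
    ],
  },
};
```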

Results:

The results are from a machine with 8 hardware threads; however, I guess the numbers should be similar for machines with 2 or more hardware threads.

  • ~ -45% time when optimizing in dev (or running from source)
  • ~ -55% time running the optimizer from the distributable
  • ~ +0.3–1s running the optimizer in dev when starting the server with no changes since the last optimization. This happens due to the overhead of spinning up the thread-loader worker pool (even when warming it up). However, no time increase was detected in subsequent rebuilds during watch. (A small warmup sketch follows this list.)
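
thread-loader exposes a warmup function for exactly this case; a minimal sketch of warming up a shared pool (pool options and preloaded module names here are illustrative, not the actual Kibana setup):

```js
// Minimal warmup sketch — pool options and module list are illustrative.
const threadLoader = require('thread-loader');

const pool = {
  name: 'optimizer-thread-loader-pool',
  workers: 3,
  poolTimeout: 2000,
};

// Boots the worker processes and preloads the listed modules into them, so
// the first real compilation does not pay the full startup cost. This warmup
// is the source of the small one-time overhead mentioned above.
threadLoader.warmup(pool, ['babel-loader', 'ts-loader']);
```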

Overall I just felt like the benefits are awesome!

@mistic mistic added the review, Team:Operations, v7.0.0 and v6.6.0 labels Dec 12, 2018
@mistic mistic self-assigned this Dec 12, 2018
@mistic mistic requested review from spalger and jbudz December 12, 2018 04:43
@elasticmachine
Contributor

Pinging @elastic/kibana-operations

@elasticmachine
Contributor

💚 Build Succeeded

@mistic
Member Author

mistic commented Dec 12, 2018

@jbudz @spalger I've updated workerNodeArgs to contain the exact same arguments set in NODE_OPTIONS. I believe this way it will be more accurate and generic for any other NODE_OPTIONS that users are already applying to the main process.
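
Roughly, the idea looks like the following sketch (not the actual Kibana code); thread-loader forwards workerNodeArgs as Node.js arguments when it forks each worker:

```js
// Sketch only: forward whatever the user already set in NODE_OPTIONS to each
// thread-loader worker process via the workerNodeArgs option.
const workerNodeArgs = (process.env.NODE_OPTIONS || '')
  .split(/\s+/)
  .filter(Boolean); // e.g. ['--max-old-space-size=2048']

const pool = {
  name: 'optimizer-thread-loader-pool',
  workerNodeArgs, // passed to node when thread-loader spawns its workers
};
```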

@elastic elastic deleted a comment from elasticmachine Dec 12, 2018

@elasticmachine
Contributor

💚 Build Succeeded

Contributor

@spalger spalger left a comment


LGTM, seems to work great, but I do think we should apply a max to the process count.
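
For example, one way to apply such an upper bound (the cap value and helper name below are assumptions for illustration, not the code that was merged):

```js
// Illustrative worker-count cap — the helper name and the cap of 4 are
// assumptions, not the merged implementation.
const os = require('os');

function getAvailableWorkers(maxWorkers = 4) {
  // leave one cpu for the main process and never exceed the upper bound
  const cpus = os.cpus().length;
  return Math.max(1, Math.min(cpus - 1, maxWorkers));
}
```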

@elasticmachine
Contributor

💚 Build Succeeded

@tylersmalley
Contributor

This is great, and I am very much looking forward to this change. However, before we merge we need to understand the total memory overhead of an optimize process in production. While we don't provide memory requirements, there are a lot of users with 2GB instances that they expect Kibana to optimize within. Understanding how the changes here affect those users will help us decide how to go forward, including whether we need to further reduce the max threads due to per-thread overhead.

@mistic
Member Author

mistic commented Dec 18, 2018

@tylersmalley I just ran some tests on 2 DigitalOcean droplets, one of them without the thread-loader work present in this PR and the other one with it. Both instances were also merged with the branch containing the work for #27259 in order to be able to run and create production bundles from source.

Droplet hardware: 8GB/4vCPUs/60GB
Droplet OS: CentOS 7.5 x64
Default heap for Node 10 on this hardware: 1526 MB
Command used to measure memory (field = Mem: used): free -lmw -s 1

Note: the following memory stats represent the max memory used by the droplet while running the referenced command. Both instances only had the OS, the free command, and Kibana running.

node scripts/kibana --env.name=production (running from source with production bundles)

Non-Threaded Max Memory: 2724 MB
Non-Threaded Optimization Time: 231.06 seconds

Threaded Max Memory: 2751 MB
Threaded Optimization Time: 98.44 seconds

./bin/kibana (running from distributable)

Non-Threaded Max Memory: 2342 MB
Non-Threaded Optimization Time: 110.28 seconds

Threaded Max Memory: 1617 MB
Threaded Optimization Time: 34.18 seconds

I think from this data we can conclude this PR will improve both the optimization time and the memory footprint over current master a LOT, instead of making them worse 🎉 😄

@elasticmachine
Contributor

💚 Build Succeeded

@tylersmalley
Contributor

This is very exciting @mistic, and thanks for benchmarking - I also did a quick test on a 2GB VM and see ~200MB less memory overhead when optimizing (down from 1.2GB).

🎉

@spalger
Contributor

spalger commented Dec 18, 2018

Woah, I'm kinda shocked by that result. More processes take up less memory... wouldn't have guessed!

@mistic mistic force-pushed the optimize-with-thread-loader branch from 3025f1d to 1e8538f on December 18, 2018 16:46
@tylersmalley
Contributor

tylersmalley commented Dec 18, 2018

I wonder if the memory improvement came from improvements done in Terser - UglifyJS was historically bad with memory management.

@mistic
Member Author

mistic commented Dec 18, 2018

Yeah, I agree with you @tylersmalley, and remember that Terser is also enabled with the parallel option, which means that an improvement in the minimizer itself gets multiplied by the number of workers too.
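
As a hedged sketch of that point, the same calculated worker count can drive both the thread-loader pool and Terser's parallel option, so a per-worker improvement in the minimizer scales with that count (names and numbers below are illustrative, not the merged config):

```js
// Illustrative only: share one calculated worker count between the
// thread-loader pool and Terser's parallel option.
const os = require('os');
const TerserPlugin = require('terser-webpack-plugin');

const workerCount = Math.max(1, Math.min(os.cpus().length - 1, 4));

const threadLoaderPool = {
  name: 'optimizer-thread-loader-pool',
  workers: workerCount,
};

const minimizer = new TerserPlugin({
  parallel: workerCount, // terser runs in workerCount parallel processes
  terserOptions: { compress: false, mangle: true },
});
```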


@elasticmachine
Contributor

💚 Build Succeeded

@mistic mistic merged commit efdf63d into elastic:master Dec 18, 2018
mistic added a commit to mistic/kibana that referenced this pull request Dec 18, 2018
* multi thread expensive loaders

* revert styles

* feat(NA): added thread-loader and cache-loader to base optimizer and dll compiler.

* feat(NA): added cache-loader and thread-loader to the optimizer and dll compiler.

* feat(NA): use new terser plugin instead of old unmaintained uglifyjs webpack plugin.

* refact(NA): remove unused configs from base optimizer and dll compiler.

* fix(NA): available cpu calculated number.

* docs(NA): fix comment about what we are doing in prod on base_optimizer config.

* docs(NA): explain why we are setting memory into each thread loader worker.

* fix(NA): add dev only loaders to the thread-loader warmup config.

* refact(NA): change name from babelCacheDir to babelLoaderCacheDir.

* fix(NA): logic for calculating available cpus.

* feat(NA): pass NODE_OPTIONS along for the optimizer forked process and also for the optimizer workers.

* feat(NA): remove terser webpack plugin compression from base_optimizer and only run it on dll compiler.

* chore(NA): update function to calculate available cpus for works.

* fix(NA): apply upperbound to the number of workers we can have on thread-loader.

* fix(NA): decrease the max number of thread pool workers. refact(NA): use the same calculated number of cpus to use on parallel tasks on thread loader pool config for terser parallel. refact(NA): lower down the poolTimeout on non dev mode. refact(NA): change devOnlyModules to nonDistributableOnlyModules on warmupThreadLoader config.

* chore(NA): update yarn lock deps after merging with master.
mistic added a commit that referenced this pull request Dec 18, 2018
@mistic
Member Author

mistic commented Dec 18, 2018

6.x: d53a9a2
