Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Support module concatenation (scope hoisting) #1104

Closed
fathyb opened this issue Mar 30, 2018 · 11 comments
Closed

[RFC] Support module concatenation (scope hoisting) #1104

fathyb opened this issue Mar 30, 2018 · 11 comments
Labels
Milestone

Comments

@fathyb
Copy link
Contributor

fathyb commented Mar 30, 2018

  • a.js:
import {foo} from './b'

console.log('foo', foo())
  • b.js:
import barFactory from 'bar-module'

export const foo = () => 'hello'
export const bar = () => barFactory
  • bar-module:
export default 'bar'

Build

  • bundle-parcel.js (simplified)
require = (function (modules, cache, entry) {
  // prelude.js
})({
  0: [
    function(require, module, exports) {
      const foo = require('./b').foo;

      console.log('foo', foo())
    },
    {'./b': 1}
  ],
  1: [
    function(require, module, exports) {
      const barFactory = require('bar-module').default;

       exports.foo = () => 'hello'
       exports.bar = () => barFactory
    },
    {'bar-module': 2}
  ],
  2: [
    function(require, module, exports) {
      exports.default = 'bar'
    },
    {}
  ]
}, {}, [0])
  • hypothetical bundle-module-concat.js using module concatenation :
void function () {
  // bar-module
  const $module$bar_module$default = 'bar'

  // b.js
  const barFactory = $module$bar_module$default

  const $module$b_js$foo = () => 'hello'
  const $module$b_js$bar = () => barFactory

  // a.js
  const foo = $module$b_js$foo

  console.log('foo', foo())
}()

Minification using babili 0.1.3 (prettified)

  • bundle-parcel.js :
    • Input size: 2.36 kB (gzip: 870 B)
    • Minified size: 803 B (gzip: 463 B)
    • Saved: 65.97% (gzip: 46.78%)
require = function(a, b, c) {
    function d(c, f) {
        function g(a) {
            return d(g.resolve(a))
        }

        function h(b) {
            return a[c][1][b] || b
        }
        if (!b[c]) {
            if (!a[c]) {
                var i = "function" == typeof require && require;
                if (!f && i) return i(c, !0);
                if (e) return e(c, !0);
                var j = new Error("Cannot find module '" + c + "'");
                throw j.code = "MODULE_NOT_FOUND", j
            }
            g.resolve = h;
            var k = b[c] = new d.Module(c);
            a[c][0].call(k.exports, g, k, k.exports)
        }
        return b[c].exports
    }
    var e = "function" == typeof require && require;
    d.isParcelRequire = !0, d.Module = function(a) {
        this.id = a, this.bundle = d, this.exports = {}
    }, d.modules = a, d.cache = b, d.parent = e;
    for (var f = 0; f < c.length; f++) d(c[f]);
    return d
}({
    0: [function(a) {
        const b = a("./b").foo;
        console.log("foo", b())
    }, {
        "./b": 1
    }],
    1: [function(a, b, c) {
        const d = a("bar-module").default;
        c.foo = () => "hello", c.bar = () => d
    }, {
        "bar-module": 2
    }],
    2: [function(a, b, c) {
        c.default = "bar"
    }, {}]
}, {}, [0]);
  • bundle-module-concat.js :
    • Input size: 297 B (gzip: 164 B)
    • Minified size: 54 B (gzip: 74 B)
    • Saved: 81.82% (gzip: 54.88%)
void function(){
  console.log('foo',(()=>'hello')())
}()

Using Prepack

console.log("foo", "hello");

Implemented by

Pro

  • Faster runtime startup
  • Smaller bundles
  • Easier to debug stacktraces
  • Implements tree-shaking by design and probably in a better way using minifiers (side-effects)

Cons

  • A real pain to implement correctly given the current state of the ecosystem (es6/cjs, see Webpack bailouts)
  • Makes parallelism more difficult

Some questions :

  • How would we expose this to the user? Only on prod build? Behind a CLI option?
  • How would we implement it technically?
  • Is it worth it?
@fathyb fathyb added 💬 RFC Request For Comments 🙋‍♀️ Feature labels Mar 30, 2018
@devongovett devongovett added this to the v1.8.0 milestone Apr 1, 2018
@devongovett
Copy link
Member

Great write up! Related: #392

@DeMoorJasper
Copy link
Member

DeMoorJasper commented Apr 1, 2018

Seems like minification and concatenation should happen in the packager for this feature?
What if we would run the packagers in the workers, than the performance issue would sort of be resolved?
Or perhaps we could create the concatenated code pieces inside JSAsset than it'll propably barely impact performance at all?

This definitely seems like an easier and more reliable option than writing a custom treeshaker.

@devongovett
Copy link
Member

I think much of this can be done at the module level inside a worker. If we rename all variables in the top-level scope of a module in a predictable way like @fathyb described (e.g. $module$bar_module$default - maybe we'd want a hash of the module's filename in there or something), this can be done without context to the rest of the modules. Then the packager just needs to literally concatenate them all together.

We'd need to take special care to handle this inside a module, which should refer to the module the same way as module.exports. Otherwise it should be safe.

Here are the cases where this would not be possible:

  1. When HMR is enabled. In that case, we rely on each module being in its own function so we can re-execute just the modules that changed. If we only enable this in production mode, no problem.
  2. When loading an external module, e.g. code splitting. In this case, the require function checks all loaded bundles to find a module. e.g. if you require something that is in the parent bundle, it will go to the previous one and check there. I think this should be possible to detect - we can just leave the require call in that case rather than replacing with a module variable name.
  3. There are probably more. See webpack's optimization bailouts. Some of them I'm not sure about - I think we can handle CommonJS not just ESM, but not sure. Would like to understand why they implemented it this way.

Some cases where module concatenation alone is not enough:

  1. External side effects. Some modules might have exports that aren't used but still have side effects like assigning to globals, etc. Webpack 4 supports the sideEffects: false flag in package.json. See here. Big libraries like Angular and RxJS already support it. In some cases, I don't think minifiers are smart enough to detect all side effect free code, so this allows package authors to explicitly tell the compiler.

  2. Related, but computation within a module to generate its exports. e.g. module.exports = add(2, 3). In this case, the add function and the exports variable have to be included because the minified doesn't know if the add function might have side effects. But if we know that the export isn't used, and the function has no side effects we can remove it completely.

Aside from the above issues, I think this would be a great start on our way to tree shaking! 🎉

@devongovett
Copy link
Member

I started a branch for this: scope-hoist.

Obviously still a lot to do but the basics seem to work. It's implemented as a babel transform, and it happens entirely in the worker process. There is currently a new replacement JS packager that does the concatenation and final variable replacements to link modules together. This may get more complicated and get merged back into the main JSPackager, not sure yet.

As an example, here is what happens with the following input:

import leftpad from 'leftpad';

function add(a, b) {
  return a + b;
}

export default leftpad(add(1, 2) + 4, 10);
console.log(module.exports);

Output:

(function () {
var $3$exports = {};
$3$exports = function (str, width, char) {
  char = char || "0";
  str = str.toString();
  while (str.length < width) str = char + str;
  return str;
};
'use strict';

var $1$exports = {};
Object.defineProperty($1$exports, "__esModule", {
  value: true
});
var $1$var$_leftpad = $3$exports;
var $1$var$_leftpad2 = $1$var$_interopRequireDefault($1$var$_leftpad);

function $1$var$_interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { default: obj }; }

function $1$var$add(a, b) {
  return a + b;
}

$1$exports.default = (0, $1$var$_leftpad2.default)($1$var$add(1, 2) + 4, 10);

console.log($1$exports);
})();

When you use eval, we need to bail out of the variable renaming.

import leftpad from 'leftpad';

function add(a, b) {
  return a + b;
}

eval('exports.foo = "bar"');

export default leftpad(add(1, 2) + 4, 10);
console.log(module.exports);

Output:

(function () {
var $3$exports = {};
$3$exports = function (str, width, char) {
  char = char || "0";
  str = str.toString();
  while (str.length < width) str = char + str;
  return str;
};
var $1$exports = function () {
  var exports = this;
  var module = {
    exports: this
  };
  Object.defineProperty(exports, "__esModule", {
    value: true
  });
  var _leftpad = $3$exports;

  var _leftpad2 = _interopRequireDefault(_leftpad);

  function _interopRequireDefault(obj) {
    return obj && obj.__esModule ? obj : {
      default: obj
    };
  }

  function add(a, b) {
    return a + b;
  }

  eval('exports.foo = "bar"');

  exports.default = (0, _leftpad2.default)(add(1, 2) + 4, 10);

  console.log(module.exports);
  return module.exports;
}.call({});
})();

Still not too bad. Already better than webpack, which doesn't support tree shaking for common js modules at all, and when eval is involved, bails out even further back to the normal module wrappers.

Where this gets problematic is code splitting. We really need to be able to expose modules so they can be required from different bundles. I'm not sure yet how to do that. Maybe we can detect those modules that are required across bundles, and expose them somehow.

Let me know if you have ideas on how to implement those things, and feel free to work off of this branch if you want to play around!

@vigneshshanmugam
Copy link

vigneshshanmugam commented Apr 2, 2018

When HMR is enabled. In that case, we rely on each module being in its own function so we can re-execute just the modules that changed. If we only enable this in production mode, no problem.

This would still be possible to do if we separate the affected modules and create its own scope which would be identical to the initial bundle. I believe webpack does something similar to this.

@devongovett
Copy link
Member

This would still be possible to do if we separate the affected modules and create its own scope which would be identical to the initial bundle. I believe webpack does something similar to this.

Yeah but in effect this is the same as not scope hoisting at all

@vigneshshanmugam
Copy link

Totally agree, But should it matter a lot in development mode? Its better than ignoring scope hoisting for the whole development pipeline.

@fathyb
Copy link
Contributor Author

fathyb commented Apr 3, 2018

Good job @devongovett! The implementation is way simpler than I originally thought.

I'm currently trying to tune babel/minify to produce smaller bundles (eg. trim $3$exports unused elements) but we currently useuglify-es.
I feel like babel/minify might be a better fit given that we internally rely on babel and we could share the AST with it, or let the use choose. The idea is that we will probably need to extend and/or contribute to the minifier we use for a better result.

@devongovett
Copy link
Member

Yeah the problem is that currently we do minification on a per asset basis, but in order to get smaller sizes we'll need to do that over the entire bundle together (in the packager). Otherwise, the minifier won't be able to do dead code elimination correctly since it won't have the other assets to see what variables are used. Since this will happen across the process boundary, we probably won't be able to share the AST anyway.

Happy to explore babel-minify as an alternative to uglify. Last time we compared, babel-minify was significantly slower and more buggy, but worth a try again as things may have improved. Want to try it out and compare?

@devongovett
Copy link
Member

One interesting finding: minifiers can't eliminate dead babel-compiled ES modules because they have side effects by default.

Input:

export default function leftpad() {
  return 4;
}

Output:

var $3$exports = {};
Object.defineProperty($3$exports, "__esModule", {
  value: true
});
$3$exports.default = $3$var$leftpad;
function $3$var$leftpad() {
  return 4;
}

The $3$exports variable cannot be eliminated because of the Object.defineProperty call, and even without that uglify still doesn't remove the variable unless the default setter is also removed. babel-minify seems to do better at this, but still doesn't drop if the Object.defineProperty is there. Perhaps that can be solved by switching to loose mode for the module transform since then the __esModule property is set as a normal property rather than using Object.defineProperty . Not sure what the other ramifications of that change would be.

@devongovett
Copy link
Member

Hmm, seems like uglify-es isn't even being maintained anymore actually, according to the comments here: webpack-contrib/uglifyjs-webpack-plugin#262, mishoo/UglifyJS#2908 (comment), mishoo/UglifyJS#3010 (comment). Guess it's time to look at switching to babel-minify by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants