module: improve require() performance #10789
Not going to approve yet until after we get some good smoketesting done on this but really happy to see this.
Module._extensions = {};
Module._cache = Object.create(null);
Module._pathCache = Object.create(null);
Module._extensions = Object.create(null);
So far, this is the only bit that may be concerning. I'm hugely +1 on making these changes, but there's obvious risk here. We'll want to smoke test this thoroughly /cc @nodejs/citgm
Yeah, I started citgm, but it looks like there may be some infrastructure issues: pretty much every citgm machine had trouble downloading various npm packages (not always the same ones).
}
var cacheKey = request + '\x00' +
    (paths.length === 1 ? paths[0] : paths.join('\x00'));
var entry = Module._pathCache[cacheKey];
It's too bad that we can't use the github emoji thumbsup on individual changes ;-)
Rebased. CI again: https://ci.nodejs.org/job/node-test-pull-request/6464/ /cc @nodejs/ctc Any interest in this?
As far as I can tell, none of the failures in the CITGM runs seem to be related to the changes in this PR.
cc/ @nodejs/citgm
Overall I'm LGTM on this as a semver major.
      });
    } else {
      process.nextTick(LOOP);
    }
  }
};
Out of interest, how much of a perf gain did we get from this one?
For the async `realpath()`? I didn't measure it explicitly because it's not used for module lookup, I merely made the changes to mirror the `realpathSync()` changes for consistency.
This change will probably affect APM code that depends on the module loader internals for monkey patching. Paging @watson @hayes @matthewloring.
@ofrobots How? The new behavior for |
My understanding was |
@ofrobots It will only return such new values if opting into such behavior by passing a 'true' value as a third argument ( |
Ack. Again, just trying to increase visibility for APM folks (who aren't covered by CITGM) who might be subtly dependent (or not). |
PR-URL: nodejs#10789 Reviewed-By: James M Snell <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Michael Dawson <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Benjamin Gruenbaum <[email protected]>
nullCheck() implicitly converts the argument to string when checking the value, so this commit avoids any unnecessary additional (Buffer) conversions to string.
Replacing the path separator-finding regexp with a custom function results in a measurable improvement in performance.
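To illustrate the kind of change the commit message above describes, here is a minimal sketch of a hand-written separator search next to a regexp-based one. The function names are hypothetical and are not taken from the PR; the sketch only assumes that `/` is the separator on POSIX and that both `/` and `\` count on Windows.

```javascript
'use strict';

// Character codes for the two separators recognised on Windows.
const FORWARD_SLASH = 47; // '/'
const BACKSLASH = 92;     // '\\'

// Hypothetical helpers (names are assumptions, not the PR's code).
// Each returns the index of the next separator at or after `start`,
// or -1 when there is none.
function nextSeparatorPosix(path, start) {
  return path.indexOf('/', start);
}

function nextSeparatorWin32(path, start) {
  for (let i = start; i < path.length; ++i) {
    const ch = path.charCodeAt(i);
    if (ch === FORWARD_SLASH || ch === BACKSLASH)
      return i;
  }
  return -1;
}

// Regexp-based equivalent of the Windows variant, for comparison.
function nextSeparatorRegExp(path, start) {
  const re = /[\\/]/g;
  re.lastIndex = start; // begin matching at `start`
  const match = re.exec(path);
  return match === null ? -1 : match.index;
}
```

Both Windows variants return the same indices. Whether the hand-written loop is actually faster depends on the engine version, so it is worth benchmarking before relying on it.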
hasOwnProperty() is known to be slow, do a direct lookup on a "clean" object instead.
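The compatibility risk discussed in this thread comes from how a "clean" (null-prototype) object behaves. The following is a small sketch of that behaviour using a stand-in object, not the real `Module._extensions`; the variable names are illustrative only.

```javascript
'use strict';

// A "clean" map: no prototype, so no inherited keys such as 'toString'.
const extensions = Object.create(null);
extensions['.js'] = function loadJs() {};

// A direct lookup is safe because nothing is inherited.
const hasJs = extensions['.js'] !== undefined;
const hasToString = extensions['toString'] !== undefined;

// The flip side: the object has no hasOwnProperty() method of its own,
// so calling extensions.hasOwnProperty('.js') would throw a TypeError.
const methodMissing = typeof extensions.hasOwnProperty === 'undefined';

// Callers that still need an own-property check can borrow the method
// from Object.prototype instead of calling it on the map itself.
const ownCheck = Object.prototype.hasOwnProperty.call(extensions, '.js');
```

Borrowing the method works for plain objects and null-prototype objects alike, so userland code written this way does not depend on which kind of object core exposes.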
Using a more "direct" method of function calling yields better performance.
By avoiding JSON.stringify() and simply joining the strings with a delimiter that does not appear in paths, we can improve cached require() performance by at least 50%. Additionally, this commit removes the last source of permanent function deoptimization (const) for _findPath().
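As a rough illustration of the key format described above, the sketch below builds a delimiter-joined key in the same way as the diff snippet earlier in this thread. The helper names are hypothetical, and the JSON variant is only an assumed shape for the former key, shown for contrast.

```javascript
'use strict';

// Hypothetical helper mirroring the expression shown in the diff:
// the request and the search paths are joined with NUL ('\x00'),
// a character that does not appear in file system paths.
function makeCacheKey(request, paths) {
  return request + '\x00' +
    (paths.length === 1 ? paths[0] : paths.join('\x00'));
}

// Assumed shape of a JSON-based key, for comparison only.
function makeJsonKey(request, paths) {
  return JSON.stringify({ request: request, paths: paths });
}

const singlePathKey = makeCacheKey('foo', ['/app/node_modules']);
const multiPathKey = makeCacheKey('foo', ['/a', '/b']);
const jsonKey = makeJsonKey('foo', ['/a', '/b']);
```

Code that inspects such keys should treat the exact format as an internal detail: a substring match on the module name works with either format, while parsing the key as JSON only works with the old one.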
This commit consists of two changes:
* Avoids returning request/id *just* for the debug() output
* Returns `null` instead of an empty array for the list of paths
Hey, this broke Ghost - can someone from @nodejs/citgm tell me how come CITGM didn't catch it? (Maybe Ghost isn't that popular - but it broke https://www.npmjs.com/package/require-dir with 600k downloads). |
@benjamingr require-dir is not directly tested in CITGM. Can you be more specific about what was broken? |
@targos sure, I think we could have done a better job in communicating these changes to |
@benjamingr This was already fixed in |
@benjamingr Besides, you cannot really deprecate the change made to |
@mscdex allow me to clarify - this is about what could have been done better on our side - not about userland being weird. When I saw (and signed off on) this change - I saw the comment by @jasnell and your discussion about the The conclusion might be to add more packages to CITGM, or to add deprecation warnings in the future before these sorts of changes, or to guide users in the docs not to rely on prototypes of objects used as maps. I'm not sure what we should have done differently - but in practice I'm having a hard time from a user point of view (upgrading a system from Node 4 to 8) which I did not anticipate as a project member. According to the issue @TimothyGu linked to, this happened to several users and not just me. This is more about how we handle future changes - if you'd prefer it if I took these to the CTC (or main) repo for that sort of discussion that's also fine :) I'd rather focus on how to prevent these sorts of breakages in the future rather than the specific issue here - the |
We couldn't have done that because we weren't aware that |
@targos I realize that :) Is there a process to add a package to CITGM or affect which packages are there? I think if we go through packages that are popular but not dependencies of other popular packages - we should find a bunch of useful packages to add to CITGM (Ghost being an example). |
@targos Thanks! As a preference of citgm - in this case - would I add |
I think |
@benjamingr It is really hard for anyone to understand the impact that a change like this will have on the ecosystem. This is unfortunate, and I do not think there is a true solution, as the ecosystem is growing and more packages are being added. We can only protect against things we broke in the past by adding them to citgm. On this specific issue, Ghost should have updated its own dependencies, as this was already fixed. I do not see how adding require-dir to CITGM would have fixed this problem, as Ghost depended on an older version. I think adding Ghost would be very hard, so the best approach is to enable greenkeeper or something similar there.
(Note in the specific case require-dir fixed this after we broke it) I agree that there is no way we can solve this for the general case - but if we had a downloads/dependents threshold for automatically suggesting CITGM packages I think it would go a long way towards ensuring backwards compatibility or at least give the community time to adapt. |
We currently test ~100 modules, I suspect to make this worthwhile we'd need to be testing a larger percentage of the hundreds of thousands of modules on npm. The limiting factor isn't (yet) the time taken to run on our platforms, it's the manual work needed to triage test failures. |
@gibfahn I'll try and get this issue some warmth and love on the 19th when I'm onboarding first time contributors. |
Benchmark results for the changes in this PR:
First off, I recognize the `module` module is "locked" so I understand there is a possibility only some or none of these commits may be accepted. Either way I have initially marked this as semver-major because of a couple of changes, if they end up not being an issue I will happily remove the semver-major label:

The first is mscdex/io.js@eb3b717 which changes the format of the key used in `Module._pathCache`. Instead of using JSON, a simple delimiter is used between the string values being used. This commit probably provides the single largest performance increase by itself, however I am not sure just how many people are making assumptions about the format of the cache keys themselves. A quick search on github at least reveals that there are people directly accessing `Module._pathCache`, but they seem to just be iterating over the properties and deleting them when they include some substring (typically a module name the user is wanting to "uncache", in addition to deleting from `require.cache`).

Secondly, there are the changes in mscdex/io.js@3d8528d, which may cause issues if anyone is both directly accessing any of those objects and using `obj.hasOwnProperty()`.

All of the other changes should be backwards compatible AFAIK.
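The "uncache" pattern mentioned in the description can be sketched roughly as follows. The helper name and the stand-in cache are hypothetical; real code would pass `require.cache` or the internal `Module._pathCache`, whose key format is not a public contract.

```javascript
'use strict';

// Hypothetical helper: delete every key that contains `needle` and
// return how many entries were removed. Object.keys() works on
// null-prototype objects as well, and a substring match makes no
// assumption about the key format beyond it containing the name.
function purgeBySubstring(cache, needle) {
  let removed = 0;
  for (const key of Object.keys(cache)) {
    if (key.includes(needle)) {
      delete cache[key];
      ++removed;
    }
  }
  return removed;
}

// Stand-in for an internal cache (an assumption for illustration).
const fakeCache = Object.create(null);
fakeCache['my-module\x00/app/node_modules'] =
  '/app/node_modules/my-module/index.js';
fakeCache['other\x00/app/node_modules'] =
  '/app/node_modules/other/index.js';

const removedCount = purgeBySubstring(fakeCache, 'my-module');
```

Because the helper neither parses the keys nor calls methods on the cache object itself, it should behave the same with JSON-formatted keys, delimiter-joined keys, and null-prototype objects.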
CI: https://ci.nodejs.org/job/node-test-pull-request/5845/
CITGM: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/523/
Checklist
`make -j4 test` (UNIX), or `vcbuild test` (Windows) passes
Affected core subsystem(s)