-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
async_hooks: improve resource stack performance #33575
Conversation
Removes some of the performance overhead that came with `executionAsyncResource()` by storing async resources that are managed by JS and those managed by C++ separately, and instead caching the result of `executionAsyncResource()` with low overhead to avoid multiple calls into C++. This particularly improves performance when async hooks are not being used. $ ./node benchmark/compare.js --new ./node --old ./node-master --runs 30 --filter messageport worker | Rscript benchmark/compare.R [00:04:41|% 100| 1/1 files | 60/60 runs | 2/2 configs]: Done confidence improvement accuracy (*) (**) (***) worker/messageport.js n=1000000 payload='object' * 8.85 % ±7.40% ±9.85% ±12.83% worker/messageport.js n=1000000 payload='string' *** 18.56 % ±8.37% ±11.13% ±14.49%
28e6626 (#33131) is what breaks the benchmarks for me. /cc @ronag It looks like the
So I guess this PR makes a tradeoff, but I think I would generally accept a +18 % speedup without async_hooks at the cost of a -15 % perf drop when they are enabled? /cc @Qard |
const jsResourceStack = new SafeMap(); | ||
// Contains either a single key (null) or nothing. If the key is present, | ||
// this points to the current async resource. | ||
const cachedResourceHolder = new SafeMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this single key Map be replaced with a variable slot that can be either null or the resource object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that it needs to be some sort of object that can be easily modified from C++ as well (at least until we have weak references in JS without a flag)… I picked a map because clearing it from C++ never throws exceptions, but it could also be a single-entry Array or object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, yes, this could be a single variable slot, but that might become an issue with memory retention in some weird edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use a symbol property? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Qard Symbol property on what/in place of what? If you’re talking about the map key, that would work just as well, yes. If you’re talking about replacing the map with an object with a single symbol key, that’s harder to clear from C++ than a map, I’d say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, we can remove quite a bit of complexity here once we have weak refs without a flag. That would be the actual correct thing to use here. (The internal WeakRef
we use elsewhere most likely isn’t fast enough, so we’ll need to wait for V8’s one.)
Not sure how I feel about this. I'm 👍 for the performance improvement aspect, but the way this is done makes an already complicated thing even more difficult to follow. I don't have a specific idea on how to do it better at the moment though, so I don't want to block it. I'll go with +0 for now and think on it a bit. |
I just ran it on master and it seems fine? I have not run this before myself but the output seems reasonable?
|
@ronag Here’s what I’m getting:
|
Looks to me like the benchmark does not wait for all the requests to finish. Hence, once it's dispatched all the requests it closes the server and the response is closed, once the Is it possible to make bench.http({
path,
connections,
duration
}, () => {
setTimeout(close, 100); // let requests complete
}); Sorry to hijack this PR. Should we move this to a separate issue? |
@addaleax Can you clarify if this means not being used at all, or being enabled either with |
@rochdev The -15 % is referring to |
8ae28ff
to
2935f72
Compare
So there's two separate things happening in this PR: splitting the resource stack, and caching return values from I'm a bit unclear though on why the previous design with the shared array was so slow. Do you have some insight into that which you could share? |
@Qard Caching the The problem is that the The problem with this PR, and the reason for the perf impact when Ideally, what we’d want is some kind of data structure that can be accessed without significant overhead from both C++ and JS. I’ve thought about that quite a bit, and my personal favourite would be a way to access internal fields of objects from JS (e.g. via V8 extras). However, V8 provides no such built-ins at the time, and there doesn’t appear to be any infrastructure in place to get those. Theoretically, that should be able to come with zero overhead on both sides, though. |
So one thing I’ve been looking at in relation to my MakeCallback trampoline is seeing if all the resource stack stuff could be managed entirely on the JS-side within that trampoline. It’s possible we might not need that stuff to live on the native side at all. 🤔 |
@Qard Yes, that’s also one alternative I’ve thought about. It seems like quite a bit of work to me, but if you can make that work, awesome :) |
I’ll close this in favor of #33801. |
Removes some of the performance overhead that came with `executionAsyncResource()` by storing async resources that are managed by JS and those managed by C++ separately, and instead caching the result of `executionAsyncResource()` with low overhead to avoid multiple calls into C++. In particular, this is useful when using the async_hooks callback trampoline. This particularly improves performance when async hooks are not being used. (This is continuation of nodejs#33575.) Refs: nodejs#33575
Removes some of the performance overhead that came with `executionAsyncResource()` by using the JS resource array only as a cache for the values provided by C++. The fact that we now use an entry trampoline is used to pass the resource without requiring extra C++/JS boundary crossings, and the direct accesses to the JS resource array from C++ are removed in all fast paths. This particularly improves performance when async hooks are not being used. This is a continuation of nodejs#33575 and shares some of its code with it. ./node benchmark/compare.js --new ./node --old ./node-master --runs 30 --filter messageport worker | Rscript benchmark/compare.R [00:06:14|% 100| 1/1 files | 60/60 runs | 2/2 configs]: Done confidence improvement accuracy (*) (**) (***) worker/messageport.js n=1000000 payload='object' ** 12.64 % ±7.30% ±9.72% ±12.65% worker/messageport.js n=1000000 payload='string' * 11.08 % ±9.00% ±11.98% ±15.59% ./node benchmark/compare.js --new ./node --old ./node-master --runs 20 --filter async-resource-vs-destroy async_hooks | Rscript benchmark/compare.R [00:22:35|% 100| 1/1 files | 40/40 runs | 6/6 configs]: Done confidence improvement accuracy (*) async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-local-storage' benchmarker='autocannon' 1.60 % ±7.35% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-resource' benchmarker='autocannon' 6.05 % ±6.57% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='destroy' benchmarker='autocannon' * 8.27 % ±7.50% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-local-storage' benchmarker='autocannon' 7.42 % ±8.22% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-resource' benchmarker='autocannon' 4.33 % ±7.84% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='destroy' benchmarker='autocannon' 5.96 % ±7.15% (**) (***) async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-local-storage' benchmarker='autocannon' ±9.84% ±12.94% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-resource' benchmarker='autocannon' ±8.81% ±11.60% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='destroy' benchmarker='autocannon' ±10.07% ±13.28% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-local-storage' benchmarker='autocannon' ±11.01% ±14.48% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-resource' benchmarker='autocannon' ±10.50% ±13.81% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='destroy' benchmarker='autocannon' ±9.58% ±12.62% Refs: nodejs#33575
Removes some of the performance overhead that came with `executionAsyncResource()` by using the JS resource array only as a cache for the values provided by C++. The fact that we now use an entry trampoline is used to pass the resource without requiring extra C++/JS boundary crossings, and the direct accesses to the JS resource array from C++ are removed in all fast paths. This particularly improves performance when async hooks are not being used. This is a continuation of #33575 and shares some of its code with it. ./node benchmark/compare.js --new ./node --old ./node-master --runs 30 --filter messageport worker | Rscript benchmark/compare.R [00:06:14|% 100| 1/1 files | 60/60 runs | 2/2 configs]: Done confidence improvement accuracy (*) (**) (***) worker/messageport.js n=1000000 payload='object' ** 12.64 % ±7.30% ±9.72% ±12.65% worker/messageport.js n=1000000 payload='string' * 11.08 % ±9.00% ±11.98% ±15.59% ./node benchmark/compare.js --new ./node --old ./node-master --runs 20 --filter async-resource-vs-destroy async_hooks | Rscript benchmark/compare.R [00:22:35|% 100| 1/1 files | 40/40 runs | 6/6 configs]: Done confidence improvement accuracy (*) async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-local-storage' benchmarker='autocannon' 1.60 % ±7.35% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-resource' benchmarker='autocannon' 6.05 % ±6.57% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='destroy' benchmarker='autocannon' * 8.27 % ±7.50% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-local-storage' benchmarker='autocannon' 7.42 % ±8.22% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-resource' benchmarker='autocannon' 4.33 % ±7.84% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='destroy' benchmarker='autocannon' 5.96 % ±7.15% (**) (***) async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-local-storage' benchmarker='autocannon' ±9.84% ±12.94% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-resource' benchmarker='autocannon' ±8.81% ±11.60% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='destroy' benchmarker='autocannon' ±10.07% ±13.28% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-local-storage' benchmarker='autocannon' ±11.01% ±14.48% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-resource' benchmarker='autocannon' ±10.50% ±13.81% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='destroy' benchmarker='autocannon' ±9.58% ±12.62% Refs: #33575 PR-URL: #34319 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gerhard Stöbich <[email protected]> Reviewed-By: Stephen Belanger <[email protected]>
Removes some of the performance overhead that came with `executionAsyncResource()` by using the JS resource array only as a cache for the values provided by C++. The fact that we now use an entry trampoline is used to pass the resource without requiring extra C++/JS boundary crossings, and the direct accesses to the JS resource array from C++ are removed in all fast paths. This particularly improves performance when async hooks are not being used. This is a continuation of #33575 and shares some of its code with it. ./node benchmark/compare.js --new ./node --old ./node-master --runs 30 --filter messageport worker | Rscript benchmark/compare.R [00:06:14|% 100| 1/1 files | 60/60 runs | 2/2 configs]: Done confidence improvement accuracy (*) (**) (***) worker/messageport.js n=1000000 payload='object' ** 12.64 % ±7.30% ±9.72% ±12.65% worker/messageport.js n=1000000 payload='string' * 11.08 % ±9.00% ±11.98% ±15.59% ./node benchmark/compare.js --new ./node --old ./node-master --runs 20 --filter async-resource-vs-destroy async_hooks | Rscript benchmark/compare.R [00:22:35|% 100| 1/1 files | 40/40 runs | 6/6 configs]: Done confidence improvement accuracy (*) async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-local-storage' benchmarker='autocannon' 1.60 % ±7.35% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-resource' benchmarker='autocannon' 6.05 % ±6.57% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='destroy' benchmarker='autocannon' * 8.27 % ±7.50% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-local-storage' benchmarker='autocannon' 7.42 % ±8.22% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-resource' benchmarker='autocannon' 4.33 % ±7.84% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='destroy' benchmarker='autocannon' 5.96 % ±7.15% (**) (***) async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-local-storage' benchmarker='autocannon' ±9.84% ±12.94% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='async-resource' benchmarker='autocannon' ±8.81% ±11.60% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='async' type='destroy' benchmarker='autocannon' ±10.07% ±13.28% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-local-storage' benchmarker='autocannon' ±11.01% ±14.48% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='async-resource' benchmarker='autocannon' ±10.50% ±13.81% async_hooks/async-resource-vs-destroy.js n=1000000 duration=5 connections=500 path='/' asyncMethod='callbacks' type='destroy' benchmarker='autocannon' ±9.58% ±12.62% Refs: #33575 PR-URL: #34319 Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Gerhard Stöbich <[email protected]> Reviewed-By: Stephen Belanger <[email protected]>
Removes some of the performance overhead that came with
executionAsyncResource()
by storing async resources thatare managed by JS and those managed by C++ separately, and
instead caching the result of
executionAsyncResource()
withlow overhead to avoid multiple calls into C++.
This particularly improves performance when async hooks are not
being used.
(I’ll also try to run the
async_hooks
benchmarks, but they are failing locally for me on master – hence only the ones for MessagePorts as another measure.)Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes