-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[emval] Move reference counting to C++ #20447
base: main
Are you sure you want to change the base?
Conversation
This allows to completely optimise away the extra allocation / wrapper when values are local, and generally avoids crossing the JS<>Wasm boundary as often.
This reduces size of class val to a single pointer, and still gets optimised out as an allocation.
This allows to completely eliminate statements like `val v;` as compiler can statically prove it doesn't need _emval_free.
C++ doesn't allow elimination of new/delete pairs, but in C malloc/free pairs can be and are eliminated by LLVM. In theory Clang's __builtin_operator_{new,delete} could also help here, but in my experiments it wasn't optimised out while malloc/free still were. I don't think we care about calling into user's overrides in this class too much, and the optimisation is valuable enough, so just using standard C funcs.
This reverts commit e15a9f7.
014081a
to
333caed
Compare
Marking as draft for now as there's one unresolved issue. |
Hm the remaining issues might be easier to solve after #20383 and another planned subsequent PR. I'll leave this as a draft for now. |
Have you considered using the same array approach on the C++ side for refcount as the JS-side? That is, having a single vector<val_metadata> with refcounts indexed by handle, such that most new handles won't require an allocation? Ah, because in multi-threaded mode you could have handle collisions between threads? Could have an array per thread. |
Yes, I played with it, but that prevents compiler optimisations that allow to completely eliminate allocation/free pairs, which are extremely valuable as most of Embind values are temporary. Compilers are happy to optimise malloc/free pairs because those are simple well-known APIs but they can't see through all the mechanics of something as complex as a |
'fromWireType': (handle) => { | ||
var rv = Emval.toValue(handle); | ||
__emval_decref(handle); | ||
return rv; | ||
}, | ||
'fromWireType': (handle) => Emval.toValue(handle), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I'm missing something obvious, but how do you get away with removing the incref/decref from the C++ toWireType and JS fromWireType? Won't the C++ destructor call free before the JS fromWireType reads Emval.toValue?
C++ is super eager with copying values by default
and asking questions laterinstead of moving, and each of those copies had to cross JS boundary on creation/deletion just to increase/decrease reference count correspondingly.Moving reference count from JS to C++ allows to make all this copying substantially cheaper as it becomes a single integer increment/decrement on the Wasm side which, in most cases, can even be completely eliminated by LLVM.
This does mean that we need to heap-allocate a small object with value's metadata on the C++ side, and I know it can be a controversial change, but:
Note that reference counting and allocation optimisation for local values only applies to speed modes (
-O1
and higher) and not to-Os
where LLVM avoids inlining functions, but microbenchmarks show that even in situations where copies can't be eliminated, this approach is still faster:benchmarks.cpp
Before (timings and JS+Wasm byte sizes for different modes):
After:
The only small slowdown is for initial allocation & freeing in
-Os
mode, but it's typically less common than copying, and it's in exchange for a small code size win, which is, well, point of that mode anyway.