[BasicCABI] Pass compiler-rt 128-bit return values in memory #223
Currently our LLVM Wasm backend returns 128-bit values as two `i64`s in case multivalue is enabled: https://github.com/llvm/llvm-project/blob/a5f576e5961ecc099bd7ccf8565da090edc84b0d/llvm/lib/Target/WebAssembly/WebAssemblyRuntimeLibcallSignatures.cpp#L697-L700

But given that neither emscripten nor wasi-sdk seems to provide a multivalue version of compiler-rt, it looks like this has not been working so far, and the reason we haven't heard complaints is likely that no one was using compiler-rt with multivalue enabled.

Maintaining and providing two different versions of compiler-rt is cumbersome for toolchains, and emscripten already has to provide multiple versions of many libraries (e.g. threaded vs. non-threaded, debug vs. release, exception-enabled vs. disabled, ...).

Also, enabling multivalue return on the several compiler-rt functions that have a 128-bit return value wouldn't affect performance in a meaningful way, given that there are not many of them.

I had a chance to chat offline this morning with several people who contribute here, and it looked like we agreed that there is not much benefit to enabling multivalue return in compiler-rt functions. One thing I'm not sure about is whether we decided to disable 128-bit multivalue returns only for compiler-rt functions or for all user functions as well. This PR currently says we do that only for compiler-rt; please let me know if you think we should do otherwise.
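To make the two return conventions under discussion concrete, here is a minimal sketch in portable C. It only models the signature shapes: `mul_in_memory` writes the 128-bit result through a caller-provided pointer, while `mul_by_value` stands in for a function that would hand back two `i64`s directly. All names (`u128_parts`, `mul_64x64`, `mul_in_memory`, `mul_by_value`) are made up for illustration and are not compiler-rt symbols, and how a C struct return is actually lowered on wasm depends on the ABI in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Portable 64x64 -> 128 multiply built from 32-bit partial products. */
u128_parts mul_64x64(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Sum of the three terms that land in bits 32..63; cannot overflow. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);
    u128_parts r;
    r.lo = (mid << 32) | (p0 & 0xffffffffu);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* Shape 1: result returned in memory through an out-pointer. */
void mul_in_memory(u128_parts *out, uint64_t a, uint64_t b) {
    *out = mul_64x64(a, b);
}

/* Shape 2: result returned directly as two 64-bit values. */
u128_parts mul_by_value(uint64_t a, uint64_t b) {
    return mul_64x64(a, b);
}

/* Both shapes must agree on the value; only the calling convention differs. */
void check_conventions_agree(uint64_t a, uint64_t b) {
    u128_parts m;
    mul_in_memory(&m, a, b);
    u128_parts v = mul_by_value(a, b);
    assert(m.lo == v.lo && m.hi == v.hi);
}
```

A caller linked against a library built with one shape but compiled expecting the other would have mismatched signatures, which is the reason a toolchain would otherwise need to ship two builds of the library.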
Since the general principle we've been discussing is that changing feature flags should never change the ABI, I think it makes sense to disable 128-bit multivalue returns everywhere. If we want, we can continue allowing them in the experimental multivalue ABI, though, since that is a separate ABI. |
In Wasmtime, we've seen a lot of bottlenecks on compiler-rt's More details/investigation by @jameysharp over here: bytecodealliance/wasmtime#4077 (comment) It would be a shame if we precluded returning |
This will also hurt users that do compile their own entire sysroot with experimental mv abi and expect compiler-rt to not have in-memory overhead with i128s. |
Doesn't wasm-opt take care of inlining that function? Or is there some toolchain that doesn't include running wasm-opt on the final binary? |
The Zig toolchain always recompiles the sysroot with the same flags (including wasm extensions) as the rest of the code, and doesn't run |
[1] In case of parameters, `long long double`, `__int128`, and `__float128` are
passed directly as two `i64` values. In case of return values, for
non-compiler-rt functions, they are passed directly as two `i64` values if
[multivalue](https://github.com/WebAssembly/multi-value) is enabled and
I don't think this is true, is it? Enabling multi-value doesn't change the ABI, right?
We do have an experimental multi-value ABI, but it's not official, and we probably don't want to mention it at all here.
Will fix this once we decide what to do with the feature flag / experimental flag / ABI.
So is there room to add a |
This is very unlikely. The toolchain is self-contained, and doesn't spawn external commands.
For apps written in Zig, it can take care of inlining. Not for apps written in C/C++.

    __int128 mul(__int128 a, __int128 b) { return a * b; }

    zig cc -O3 -c --target=wasm32-freestanding -mcpu=baseline+multivalue a.c
Can I ask what Our current plan for clang/llvm is that just turning on the feature shouldn't affect the C ABI at all.
array | indirect | N/A |

[1] In case of parameters, `long long double`, `__int128`, and `__float128` are
passed directly as two `i64` values. In case of return values, for
[1] appears once in the "Parameter" column, so I'm confused about seeing "in case of return values" here. Should that part be in [2]?
Right, it should go to [2]. Given that it is unclear whether we are going to do this, I will fix it later (if we go this route).
LTO is another way to get such functions inlined. Does Zig support that? Otherwise, in general, a huge advantage of wasm over typical targets is exactly that it is very simple to inline calls even without LTO. So even if a toolchain like Zig doesn't want to run an external tool like For those reasons I'd suggest not worrying much about the inlining aspect here. |
These are my current thoughts:

I agree with #223 (comment) that we shouldn't treat compiler-rt and other libraries differently. One big motivator for this PR was that it's hard for toolchains to maintain two different versions of libraries, and it turns out that not only compiler-rt but also other libraries like libc and libc++ have

Unless you are building all your libraries from scratch yourself, which Zig seems to do, disabling multivalue return doesn't regress the status quo for most users, given that neither emscripten nor wasi-sdk currently provides multivalue-returning versions of libraries.

But I also understand the concern that it's suboptimal to preclude the possibility of returning multivalue in the future for better performance. I had assumed the usage of those functions wouldn't affect performance in a meaningful way, but apparently there were cases where it did.

I chatted w/ @dschuff offline and we agreed that while enabling a certain feature can (and usually does) change the output code, it might not be desirable for that to change the ABI, and changing the ABI should depend on a separate flag. And we don't need to distinguish compiler-rt vs. non-compiler-rt or library vs. user code.

We already have an experimental flag that enables multivalue for struct return values:

So our suggestion is, how about making the lowering of 128-bit return values also depend on

Downsides of this approach can be
1. current users of the multivalue-returning feature have to opt in to use that option
2.

I'd like to hear what people think about this way forward (making
As long as
That's better discussed on another thread, but my opinion is that such limits, if any, should be treated as "Implementation Limitations" in the core wasm spec.
If I might throw out another possible alternative, could it perhaps also be possible to write this down in source itself? For example compiler-rt is specific to a toolchain version (or so I believe) which means that even if the default C ABI is to not use multi-value compiler-rt itself could use multi-value. That would enable defining This would also solve a problem of being able to intermix multi-value-returns-and-not. For example the outermost function might want to use multi-value as it's trying to match an exact API "shape" of the wasm module being created, but everything else just wants to interoperate with the system which is likely not going to use multi-value for some time now. Enabling the two ABIs to be present in the same module would help avoid making multi-value an all-or-nothing decision. In such a world I would imagine that |
Is the problem here that the cryptography functions are compiled with the assumption that they have native int128 support? Perhaps they would prefer a different codepath if they knew that int128 was being software-emulated? i.e. is there some
The limitation we want to impose here is not the "limitation" in the core spec sense, which is used for the maximum number of something that the engine implementation supports: What I suggested, the number of items in a struct that will be returned as multivalue, which I assume to be less than 10, is more of an internal parameter that drives the optimization. It's not that the implementation cannot handle 1000 return values; it's just that it's bad for the code quality and we wouldn't want to do that. I think it might be worth mentioning it in this ABI document, but I don't think that should go into the core spec. |
If |
I believe the libcall function such as So in theory we can use whatever ABI we like and I think it is allowed to differ from the normal C ABI. See https://github.com/llvm/llvm-project/blob/12a2bc301fe83eea3b214428827d712c8cfb28a9/compiler-rt/lib/builtins/int_lib.h#L23-L31 |
Using a custom, libcall-specific ABI that avoids touching memory for |
To be clear, are you proposing that this libcall-specific ABI would vary with In addition to the complexity of shipping and selecting such a version of compiler-rt, I think building it could be very tricky since there is no value we could use
I was assuming that in a world with universal support for multi-value (effectively the state of the world now, but I recognize that this is a separate, larger discussion) that the If I understand the constraints correctly, and given that assumption of universal multi-value support regardless of C ABI in use, this wouldn't require maintaining multiple compiler-rt versions nor concerns around selecting the correct version. Is that correct? |
True, once multi-value is universally available we could just assume it when we build compiler-rt. In emscripten we will want to continue to support targets without multivalue (i.e. user of I am still curious to hear from you regarding the importance of optimizing the software emulation of int128: #223 (comment). Are you sure we really need to care about it enough to invent a new ABI that differs from the normal C ABI?
The assumption is that if the The intuition is that WebAssembly would behave the same way. The fact that this is not the case, even with LTO, was not expected https://gist.github.com/jedisct1/dfc18913680c24824595b50b8eb04db3 |
I'm personally not sure about these particulars. @jameysharp might have a better idea, as he was the one doing the original investigation. |
Can you explain what you mean here? Have you seen native platform targets that provide bitcode/LTO-able versions of compiler-rt? As far as I know there are no native targets that provide LTO-able versions of compiler-rt functions. The reason I think this is that in emscripten (where we are pushing the limits of what we make LTO-able) we still haven't found a way to make compiler-rt functions compile as bitcode (i.e. LTO-able). See https://github.com/emscripten-core/emscripten/blob/10827283603d0d8dba0a813827b5ee5f82d1adf5/tools/system_libs.py#L922-L924
Can we lower multivalue away completely? We can do that within a function or across functions in static linking + non-exported/imported functions, but can we do that with imported/exported functions or dynamic linking? |
Before we enable multi-value by default we would need some way to lower it away for users targeting older browsers. If we can't lower it away then that would be one reason not to support the use of multi-value for libcalls purely on the basis of the feature existing (
I haven't, but at least on mips64, riscv64, aarch64 and x86_64, a 128-bit multiplication doesn't call the

    $ for target in mips64-linux riscv64-linux aarch64-linux x86_64-linux; do
        zig cc -S -target "$target" -O3 a.c && ug -q __multi3 a.s && echo "[$target]: __multi3 called";
      done
    (empty)

But on WebAssembly, even if the
Ah I see. Yeah I think code annotations can be one way to support different ABIs for certain performance-critical functions. But if we decide to do this in future, it seems also compatible with the clang option I suggested in #223 (comment). Also given that some people (especially who own the whole toolchain) would want to enable multivalue lowering for all potential candidates and wouldn't want to annotate every function that has multiple or 128-bit return values, having that kind of option can be convenient. |
I think it does make sense in general to allow the calling convention ABI to be determined on a per-function basis, with a clang attribute you can attach to a function declaration that propagates down through the IR via an attribute or some similar mechanism. This is how ABIs work on ARM, which has a similar proliferation of optional features and calling conventions (e.g. you can mark a function with Then we can address the best way to optimize the performance and use of the calling convention and performance of Separately I think we should also just add widening multiply, add-with-carry, and maybe a few other related instructions to wasm. Those have always been fairly uncontroversial but just punted until MV is available everywhere, so maybe that time is now. That's obviously a bit slower process than these toolchain changes could be, so maybe we do both. |
Emscripten EH/SjLj uses invoke wrappers in JS, such as `invoke_vi`, from which user functions are called indirectly, to emulate exceptions and setjmp/longjmp. But if the invoked function returns multiple values or a 128-bit value, the JS invoke wrappers cannot return them as multivalue because JS doesn't support that. So we should not enable multivalue returns for the JS invoke wrappers, nor for the functions called by those JS wrappers, because their signatures have to match the JS wrapper's.

For example, if `func` returns `{i32, i32}` and we have
```ll
invoke {i32, i32} @func() ...
```
LowerEmscriptenEHSjLj will lower it down to something like
```ll
%0 = call { i32, i32 } @"__invoke_{i32.i32}_void"(ptr @func) ...
```
and we should eventually lower both the invoke wrapper (whose name will be changed later to `invoke_vi`) and `func` down to a signature that returns multiple values indirectly through a memory parameter, because JS invoke wrappers do not support multiple return values.

So we need to disable multivalue returns for JS invoke wrappers and functions called by them. I think we have three ways to do that:

1. Make a set, add all functions that are invoked by JS invoke wrappers in LowerEmscriptenEHSjLj, and pass it to the backend using an auxiliary data structure. We already have a precedent for this kind of structure, which is used for Wasm EH: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h We could maybe even add this set to `WasmEHFuncInfo`, given that this is also a way of handling exceptions in Wasm.
   - Pros: Most precise
   - Cons: Auxiliary structure needed
2. Unless we record the invoked functions in LowerEmscriptenEHSjLj as in 1, we don't have a way of precisely knowing the set of invoked functions. But they are all indirectly invoked, so in the backend we can check whether a given function is ever indirectly used (i.e., its pointer is taken) by traversing its `users()` in the IR, and if it is, we don't allow multivalue returns for it.
   - Pros: No auxiliary structure
   - Cons: IR checking overhead. More conservative than 1.
3. Disallow all multivalue returns when Emscripten EH or SjLj is enabled.
   - Pros: Simplest
   - Cons: Most conservative.

This PR is doing 3. While it is the most conservative option and possibly disallows multivalue returns from more functions than needed, I chose it because it is the simplest. Given that hopefully more people will adopt Wasm EH going forward, I don't think many people will use multivalue and Emscripten EH/SjLj together and also want whatever performance benefit multivalue return can bring, since Emscripten EH/SjLj already has a huge performance cost.

This is separate from whether we should make the multivalue return dependent on the multivalue feature or on something else like a clang flag, which is being partly discussed in WebAssembly/tool-conventions#223. Whichever way we decide on that front, we still need to disable multivalue returns in case Emscripten EH/SjLj is used.
I agree with @dschuff that we should push things like widening multiplication or similar things into wasm. There's a related issue: WebAssembly/design#1495. |
No, I agree it's hacky but doable.
I did some experiments that showed performance wins from multivalue function return, but those experiments did not say what an optimal size would be. Basically multivalue return is always faster than returning through memory, but at some point the potential code size bloat would not be worth it any more. Dan's suggestion of 4 seems reasonable to me. |
The multivalue feature of WebAssembly has been standardized for several years now. I think it makes sense to be able to enable it in the feature section by default for our clang/llvm-produced binaries, so that the multivalue feature can be used when necessary within our toolchain and also when running other optimizers (e.g. wasm-opt) after the LLVM code generation.

But some WebAssembly toolchains, such as Emscripten, do not provide both multivalue-returning and non-multivalue-returning versions of libraries. Also, allowing the use of multivalue in the features section does not necessarily mean we generate multivalue returns whenever we can, which is a different code generation / optimization option.

So this makes the lowering of multivalue returns conditional on the use of the 'experimental-mv' target ABI. This ABI is turned off by default and turned on by passing `-Xclang -target-abi -Xclang experimental-mv` to `clang`, or `-target-abi experimental-mv` to `clang -cc1` or `llc`.

But the purpose of this PR is not to tie the multivalue lowering to this specific 'experimental-mv'. 'experimental-mv' is just one multivalue ABI we currently have, and it is still experimental, meaning it is not very well optimized or tuned for performance. (e.g. it does not have the limitation of the max number of multivalue-lowered values, which can be detrimental to performance.) We may change the name of this ABI, or improve it, or add a new multivalue ABI in the future. Also I heard that WASI is planning to add their multivalue ABI soon. So the plan is, whenever any one of the multivalue ABIs is enabled, we enable the lowering of multivalue returns in the backend. We currently have only 'experimental-mv' in the repo, so we only check for that in this PR.

Related past discussions: llvm#82714 WebAssembly/tool-conventions#223 (comment)
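The decision described above can be summarized as a small predicate. This is only a sketch of the policy as stated in this thread, not the LLVM backend code: the type and function names (`target_abi`, `lowering_opts`, `lower_returns_as_multivalue`) are hypothetical, and the Emscripten EH/SjLj condition is taken from the related LLVM PR description quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the options discussed in the thread. */
typedef enum { ABI_DEFAULT, ABI_EXPERIMENTAL_MV } target_abi;

typedef struct {
    bool multivalue_feature;      /* multivalue listed in the target features */
    target_abi abi;               /* selected with -target-abi */
    bool emscripten_eh_or_sjlj;   /* JS invoke wrappers are in use */
} lowering_opts;

/* Returns true when multiple or 128-bit return values would be lowered
 * to multivalue returns instead of being returned through memory. */
bool lower_returns_as_multivalue(lowering_opts o) {
    if (!o.multivalue_feature)
        return false;  /* the feature itself is unavailable */
    if (o.abi != ABI_EXPERIMENTAL_MV)
        return false;  /* the feature alone must not change the ABI */
    if (o.emscripten_eh_or_sjlj)
        return false;  /* JS invoke wrappers cannot return multivalue */
    return true;
}

/* The feature flag on its own leaves the default ABI untouched. */
void check_feature_alone_keeps_abi(void) {
    lowering_opts o = { true, ABI_DEFAULT, false };
    assert(!lower_returns_as_multivalue(o));
}
```

If more multivalue ABIs are added later, the second check would presumably become a test for membership in the set of multivalue ABIs rather than a comparison with a single value.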
I submitted a PR to do what I proposed above: llvm/llvm-project#88492 This makes the lowering of multivalue returns in the backend conditional on the use of the 'experimental-mv' option. But I plan to extend this to other multivalue ABIs as they are added; if WASI adds their multivalue ABI, the backend can also detect that ABI and enable multivalue return lowering.
In case this still matters: I was testing a Rust implementation of RSA, compiled to WebAssembly and then run on Wasmtime on x86-64. What I found was a ~10% improvement in the speed of RSA operations by changing the underlying multi-precision integer library from 64-bit limbs to 32-bit, which fortunately it had a build-time configuration option for. When that library did multiplication, it needed double-width temporaries, so with 64-bit limbs it used 128-bit multiplies implemented by libcalls to I believe, but haven't proven yet, that in Wasmtime we have the ability to write optimization rules which will transform the arithmetic in |
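To illustrate why the limb width matters, here is a minimal sketch of a multi-precision multiply by a single limb using 32-bit limbs. It is not the code of the Rust library mentioned above; the function name `mul_limbs32` and its layout (little-endian limb arrays) are assumptions made for the example. With 32-bit limbs the double-width temporary is a plain `uint64_t`, so no 128-bit multiply helper is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply an n-limb little-endian number `a` by one 32-bit limb `b`.
 * Writes n + 1 limbs to `out`. The double-width temporary is 64 bits wide,
 * which maps to a native i64 multiply rather than a 128-bit libcall. */
void mul_limbs32(const uint32_t *a, size_t n, uint32_t b, uint32_t *out) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + (2^32-1) < 2^64, so this cannot overflow. */
        uint64_t t = (uint64_t)a[i] * b + carry;
        out[i] = (uint32_t)t;   /* low half becomes the result limb */
        carry = t >> 32;        /* high half carries into the next limb */
    }
    out[n] = (uint32_t)carry;
}

/* Small self-check: (2^32 + 2) * 3 = 3 * 2^32 + 6. */
void check_small_product(void) {
    uint32_t a[2] = { 2u, 1u };
    uint32_t out[3];
    mul_limbs32(a, 2, 3u, out);
    assert(out[0] == 6u && out[1] == 3u && out[2] == 0u);
}
```

The same loop written with 64-bit limbs would need a 128-bit temporary for `t`, which is where the libcalls described above come in.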
Seems to be at least relevant for RSA. See WebAssembly/tool-conventions#223 (comment)
@aheejin What is the current status of this PR? |
@sunfishcode We ended up doing llvm/llvm-project#88492 instead. Will close this. |