Deprecate wasm-c-api on big-endian hosts #180
Comments
The host sees Wasm's raw array of bytes, through the We could add a family of read/write instructions that adjust for endianness, as suggested in #164. On the other hand, such functionality is mostly independent of this API. For example, you could readily use the endianness-aware types from the boost library to handle this.
The thing is wabt currently uses "big-endian memory" to improve performance on some older platforms. Instead of storing memory as, say,
as it would on a little-endian host, it stores it as
in other words the entire memory array is byteswapped. This avoids individual byteswaps in loads and stores, at the cost of making memory grows slightly more expensive. Aside from this, memory addressing is as efficient as it's always been: it's just relative addressing after all. It's just that instead of reading a u32 from The cost of byte-swapping a u32 on something like a Motorola 68000 is... a lot, both in code size and performance. And the worst part is that it cannot be optimized away by the compiler: the compiler must always do the byteswaps when reading and writing to memory. Meanwhile, with the relative but backwards addressing trick, we can even use It's worth noting that host code would have to be adapted for the big-endian host either way, since wasm itself is little-endian. There just happen to be 2 ways of doing said adapting.
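The backwards-addressing idea described above can be sketched in C. This is only an illustration under stated assumptions: the helper names (`reverse_bytes`, `load_u8_reversed`, `load_u32_reversed`) are hypothetical, and the index formula `size - addr - 4` is inferred from "the entire memory array is byteswapped" rather than copied from wabt. The shift-and-or form is used so the sketch gives the same result on any host; on a big-endian host it amounts to a single native 32-bit load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse a buffer in place, turning a little-endian
   Wasm memory image into the fully byte-reversed layout (and back). */
static void reverse_bytes(uint8_t *mem, size_t size) {
    for (size_t i = 0; i < size / 2; i++) {
        uint8_t tmp = mem[i];
        mem[i] = mem[size - 1 - i];
        mem[size - 1 - i] = tmp;
    }
}

/* In the reversed layout, the Wasm byte at address `addr` lives at host
   index size - 1 - addr. */
static uint8_t load_u8_reversed(const uint8_t *mem, size_t size, size_t addr) {
    assert(addr < size);
    return mem[size - 1 - addr];
}

/* Load the Wasm u32 at Wasm address `addr` from a reversed memory.
   Wasm bytes addr..addr+3 occupy host indices size-addr-4..size-addr-1,
   most significant byte first, i.e. in big-endian order. */
static uint32_t load_u32_reversed(const uint8_t *mem, size_t size, size_t addr) {
    assert(addr <= size && size - addr >= 4);
    const uint8_t *p = mem + (size - addr - 4);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Note that the address computation is a single subtraction from a base, so no per-access byteswap is needed; the cost moves to wherever the whole array has to be re-laid out.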
So it stores the entire memory in reverse. Interesting. I can see why, but that clearly breaks the memory API, regardless of whether we add auxiliary functions. So I am not really sure what could be done about that?
The current
(It's also possible to spec both, with a |
As far as I'm concerned, it's 1 right now. The API merely gives you a pointer to the memory. That contains a sequence of random bytes. There is no particular interpretation inherent in these bytes. The Wasm code is likely to make certain assumptions, and the host code had better match those. But that's purely a contract between the two, unrelated to the API (and endianness is only a small part of it). If the host reads/writes multiple bytes with the intention of them representing numbers, it is its responsibility to do so in a manner that matches that contract (which is likely based on LE). Something like the boost library can help with making respective accesses agnostic to native host endianness. As for 2, reinterpreting the memory pointer in a way that requires negative indexing would technically be possible, I suppose, but it doesn't seem very natural or desirable to me. It's like leaking an implementation detail of the engine. Also note that it breaks the use of array indexing into the memory, at least on a 32-bit architecture, where offsets need to be unsigned ints, but the use of
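A minimal sketch of what "agnostic to native host endianness" can look like in plain C, without boost. The helper names (`read_u32_le`, `write_u32_le`, `read_u32_le_at`) are hypothetical and not part of wasm-c-api; the assumption is that the host holds a raw byte pointer and size for the memory (for example as returned by `wasm_memory_data` and `wasm_memory_data_size`) and that the Wasm/host contract is little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a u32 from 4 bytes in little-endian order. Works identically on
   little- and big-endian hosts because it never reinterprets the pointer. */
static uint32_t read_u32_le(const uint8_t *p) {
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store a u32 as 4 bytes in little-endian order. */
static void write_u32_le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Bounds-checked variant taking the memory base, its size, and an offset. */
static uint32_t read_u32_le_at(const uint8_t *base, size_t size, size_t off) {
    assert(off <= size && size - off >= 4);
    return read_u32_le(base + off);
}
```

On a little-endian host compilers typically reduce these helpers to a plain load or store; on a big-endian host they cost a byteswap per access, which is the overhead the reversed-memory approach is trying to avoid.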
nah we use either way, the thing is, there is a technical benefit to doing it this way, and it'd be nice if the API could spec it. while at it, having a "high level" memory API, where the consumer asks for bytes/u32's/etc at offsets and letting the engine handle the reads and writes, (e.g. |
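One possible shape for such a "high level" accessor, sketched under loud assumptions: `mem_layout_t`, `mem_view_t`, `mem_read_u32`, and `mem_write_u32` are all hypothetical names invented for illustration, and nothing like them exists in wasm-c-api today. The consumer passes Wasm offsets and the accessor maps them to host indices according to the layout the engine reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how the engine lays out linear memory on the host. */
typedef enum { MEM_FORWARD_LE, MEM_REVERSED } mem_layout_t;

/* Hypothetical view of a linear memory as handed to the host. */
typedef struct {
    uint8_t *base;
    size_t size;
    mem_layout_t layout;
} mem_view_t;

/* Map a Wasm byte address to a host index for the given layout. */
static size_t host_index(const mem_view_t *m, size_t addr) {
    return m->layout == MEM_REVERSED ? m->size - 1 - addr : addr;
}

/* Read the little-endian u32 stored at Wasm offset `off`.
   Returns false if the access would be out of bounds. */
static bool mem_read_u32(const mem_view_t *m, size_t off, uint32_t *out) {
    assert(m && out);
    if (off > m->size || m->size - off < 4) return false;
    uint32_t v = 0;
    for (size_t k = 0; k < 4; k++)
        v |= (uint32_t)m->base[host_index(m, off + k)] << (8 * k);
    *out = v;
    return true;
}

/* Write `v` as a little-endian u32 at Wasm offset `off`. */
static bool mem_write_u32(const mem_view_t *m, size_t off, uint32_t v) {
    assert(m);
    if (off > m->size || m->size - off < 4) return false;
    for (size_t k = 0; k < 4; k++)
        m->base[host_index(m, off + k)] = (uint8_t)(v >> (8 * k));
    return true;
}
```

The byte-at-a-time loop keeps the sketch portable; a real engine would presumably specialize each layout to a native load or store.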
can we have a standardized |
You'll have to be a bit more precise. ;)
defined like so: on big endian platforms, when compiled with for example, wabt would require this mode to be used, because this is how wabt implements linear memory for performance. we personally don't feel like wasm-c-api should hide platform-specific issues (like endianness), but instead figure out the optimal way to handle those within the API. wasm-c-api's current approach with linear memory (linear memory must be represented in little-endian order at all times) is in some cases detrimental to performance.
If I understand your suggestion correctly, then it appears to be outside the scope of the current C API. It's not a flag we can simply introduce in the API; it implies a different mode of operation and code generation for engines. Nor can users just toggle it in the API; it may require recompiling the engine itself in a different configuration. And I'd expect that implementing all the new codegen in JITs would be a substantial undertaking for most engines, with non-trivial implications. If you want to ask all engines to invest in that, then I encourage you to present this to the CG as an actual proposal.
we want the engine to require a specific compile-time configuration, but an engine isn't required to support both modes. in fact most engines should only support one mode. wasm-c-api consumers should support both modes if they care about big-endian support. this is the only thing blocking WebAssembly/wabt#2161 (but anyway, how would we present this to the CG as an actual proposal?)
See also #156, #164, WebAssembly/wabt#1972 and WebAssembly/wabt#2161.
Yes, linear memory as seen from wasm is always little-endian. But how should the host see it?
wasm-c-api currently assumes the host is little-endian too.