Out-of-bounds behaviour #35
If we don't care that much about determinism and/or browser fingerprinting, then making the out-of-bounds behaviour undefined would be the most performant option. Otherwise, I think wrap-around (i.e. modulo the length) is the best tradeoff, as it is very fast to implement on all architectures (just a bitwise AND). Concerning shifts, I proposed in #27 a more general shift that takes 2 inputs (https://github.com/lemaitre/flexible-vectors/blob/master/proposals/flexible-vectors/README.md#lane-shift). It would be somewhat equivalent to splice, but with an integer.
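For a power-of-2 lane count the wrap-around is indeed just a mask; on AArch64 with SVE, for example, a sketch for 16-bit lanes could look like this (the index register `X0` is chosen for illustration, and the lane count is assumed to be a power of 2):

```asm
cnth    x1              // x1 = number of 16-bit lanes in the vector
sub     x1, x1, #1      // lane count - 1, a valid mask only for power-of-2 counts
and     x0, x0, x1      // x0 = index mod lane count
```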
Wrap-around is not always a bitwise-and: all of RISC-V V, Arm SVE, and SimpleV support non-power-of-2 vector lengths.
It's true that those ISAs support non-power-of-2 vector lengths, but the architectural vector length is, AFAICT, always a power of 2, so restricting the native vector length of Wasm registers to a power of 2 is a sensible choice that would make my assertion valid.
Last I checked, an implementation of RISC-V V (and also SVE) whose maximum vector length is not a power of 2 is valid.
That's not true for SVE - the architecture permits implementations whose vector length is 384 bits, for example, so proper vector length-agnostic (VLA) code generation can't make any assumptions about the vector length other than its being a multiple of 128 up to 2048 bits inclusive. In that case wrap-around would require an instruction sequence rather than a single bitwise AND; taking 16-bit elements as an example, with the index in a general-purpose register, a sequence along the following lines is needed.
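A minimal sketch, assuming the lane index arrives zero-extended in `X0` (register choices are illustrative):

```asm
cnth    x1              // x1 = number of 16-bit lanes in the vector
udiv    x2, x0, x1      // x2 = index / lane count
msub    x0, x2, x1, x0  // x0 = index - x2 * lane count, i.e. index mod lane count
```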
Note that there is no single instruction to calculate the remainder of a division. However, it is true that one option is to force all Wasm implementations to constrain the vector length to the largest power of 2 that is less than or equal to the hardware vector length (an ability that is a requirement of the architecture). It's awkward and potentially a waste of hardware resources, but possible.

P.S. Changing the vector length in SVE is risky because a scalable vector register could be saved somewhere on the stack by a function up the call chain. That's why in practice probably nobody is going to do it, unless starting a new process. However, it could work in strictly controlled environments such as Wasm runtimes - they tend to do weird stuff anyway (e.g. the way linear memory bounds checking is implemented).
Good point - this needs to be stated in the spec. With operations that select lanes, I second the option of making out-of-bounds behaviour either platform-specific (see WebAssembly/relaxed-simd#22), which is a softer form of "undefined", or doing some form of truncation. Wasm SIMD tried assigning special meaning to out-of-bounds indices in swizzle, and it does not scale on x86-based platforms (WebAssembly/simd#93).
@penzn While you are at it, the specification text relating to the lane-wise shifts mentions two input vectors.

I just remembered that there was an alternative way to enforce a power-of-2 vector length in an SVE-based implementation - so far my assumption has been that the governing predicates that would be used by the generated instructions would be initialized with the default all-lanes `ptrue` pattern.
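The alternative presumably amounts to initializing those predicates with the `POW2` pattern instead; a minimal sketch for 16-bit lanes (the predicate registers are illustrative):

```asm
ptrue   p0.h, pow2      // active lanes = largest power of 2 not exceeding the 16-bit lane count
ptrue   p1.h            // default ALL pattern (every implemented lane), shown for comparison
```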
Hello,
I am part of a team at Arm that works on various WebAssembly runtimes. We have been reviewing the Wasm proposals, focusing on how they map to the Arm architecture (in particular the 64-bit execution state of the A profile). As a result, we have a couple of questions about the out-of-bounds behaviour of some operations in the flexible vectors proposal:
1. What is the behaviour of the `extract_lane` and `replace_lane` operations for out-of-bounds indices?
2. What is the behaviour of the lane-wise shift operations when the shift amount is larger than the vector length?

Of course, these questions should be answered explicitly by the specification text.
The second question is probably a bit more important because the shift amounts would be unknown at compile time in the general case (they are not immediates). To give a concrete example, here's how the `vec.i64.lshl` operation could be expressed with Scalable Vector Extension (SVE) instructions.
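A possible sketch, assuming result lane i takes input lane i - shift with zeros shifted in (the lane-shift direction is an assumption here), the input vector in `Z0`, the shift amount zero-extended in `X0`, and the output in `Z1`:

```asm
index   z2.d, #0, #1            // z2 = [0, 1, 2, ...] (destination lane indices)
mov     z3.d, x0                // broadcast the lane-shift amount to every lane
sub     z2.d, z2.d, z3.d        // per-lane source index; underflow wraps to a huge value
tbl     z1.d, { z0.d }, z2.d    // out-of-range indices read as zero, giving the zero fill
```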
This sequence uses vector length-agnostic code generation, which could be the approach chosen by an ahead-of-time Wasm compiler - probably not the expected situation, but IMHO it demonstrates the worst case scenario. Inputs are in `Z0` and `W0` respectively, while the output is in `Z1`. Coming back to the second question, the assumptions are that the shift amount is unsigned and that the result vector must be filled with zeros if the shift amount is larger than the vector length (let's ignore the possibility of setting the vector length dynamically for a moment to make the discussion simpler - it's a third tier operation anyway).

The equivalent Neon mapping could be along the following lines.
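A rough Neon sketch under the same assumptions, with the input in `V0`, the lane-shift amount in `W0`, the output in `V1`, and `V2` assumed to be pre-loaded with the byte indices 0..15 (e.g. from a constant pool); the clamp keeps the byte-shift amount within a single byte:

```asm
mov     w2, #2
cmp     w0, #2
csel    w1, w0, w2, lo              // clamp the lane shift to 2 (any larger shift zeroes the whole vector)
lsl     w1, w1, #3                  // lanes -> bytes (8 bytes per i64 lane)
dup     v3.16b, w1                  // broadcast the byte-shift amount
sub     v2.16b, v2.16b, v3.16b      // per-byte source indices; the low lanes underflow out of range
tbl     v1.16b, { v0.16b }, v2.16b  // out-of-range indices produce zero bytes
```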
If the shift amount is actually known statically, it is possible to simplify - e.g. to shift by 1:
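For instance, a shift by 1 lane with zero fill could be a single `EXT` against a zeroed register (a sketch, under the same lane-direction assumption as above):

```asm
movi    v2.2d, #0                   // zero vector supplying the lanes shifted in
ext     v1.16b, v2.16b, v0.16b, #8  // result lane 0 = 0, result lane 1 = input lane 0
```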
A general remark - explicit indices are awkward from the point of view of SVE because the instructions usually rely on predicates (i.e. masks) for the same functionality. This increases compiler complexity for sequences like an index-generating operation and `extract_lane`, which could result in performance issues due to the round trip between predicate and general-purpose registers.