This repository was archived by the owner on Jan 26, 2022. It is now read-only.
unrestricted two-input shuffles #74
Closed
sunfishcode added three commits to sunfishcode/ecmascript_simd that referenced this issue on Oct 6, 2014.
I like this proposal. One question: what's the right behavior if the shuffle selector is invalid, for example out of range?
With a naive VTBL implementation, ARM would write zeroes for out-of-range indices. They could also be treated as don't-care (which lets the compiler find something more efficient, but gives implementation-defined results), or the indices could be defined as being bit-masked to a legal range.
If we end up with the API where shuffle indices are indices into the concatenation of the two input vectors, I'd support bit-masking the indices to a legal range.
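To compare the out-of-range options side by side, here is a minimal scalar reference model in plain JavaScript (4-lane vectors as arrays). The function name shuffle2 and its policy strings are hypothetical. Indices address the concatenation of both inputs, as in the API described in the last comment. The "zero" policy mimics what a naive VTBL lowering would produce, and "mask" keeps only the low 3 bits. The don't-care option is left out because its result is implementation-defined by design.

```javascript
// Hypothetical reference semantics for a two-input, 4-lane shuffle.
// Indices 0-3 select from `a`, indices 4-7 select from `b` (the concatenation a ++ b).
// `policy` decides what happens to an index outside 0-7:
//   "zero": write 0, modeled on a naive ARM VTBL lowering
//   "mask": keep only the low 3 bits, so any integer maps into 0-7
function shuffle2(a, b, indices, policy) {
  const concat = [...a, ...b]; // the 8 candidate lanes
  return indices.map((i) => {
    if (policy === "mask") {
      return concat[i & 7]; // bit-mask into the legal range
    }
    if (policy === "zero") {
      return Number.isInteger(i) && i >= 0 && i < 8 ? concat[i] : 0;
    }
    throw new RangeError(`unknown policy: ${policy}`);
  });
}
```

With a = [1, 2, 3, 4] and b = [5, 6, 7, 8], the mask [0, 5, 9, 3] yields [1, 6, 0, 4] under "zero" and [1, 6, 2, 4] under "mask", since 9 & 7 = 1. Bit-masking keeps every index meaningful without a range check, which is part of its appeal when the indices already address the full concatenation.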
sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue on Oct 22, 2014.
sunfishcode added commits to sunfishcode/ecmascript_simd that referenced this issue on Oct 28, Nov 4, and Nov 5, 2014.
Unrestricted two-input "New-style" shuffles were implemented in #83.
The shuffleMix function takes two inputs, but it is restricted in which elements can be shuffled where. Given examples such as LLVM's __builtin_shufflevector and GCC's __builtin_shuffle, both of which are considered target-independent interfaces, it would be desirable for SIMD.js to provide a similar unrestricted shuffle operation.
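To make the restriction concrete, here is a minimal sketch in plain JavaScript, with 4-lane vectors as arrays since SIMD.js itself is no longer available. It assumes shuffleMix follows the x86 shufps pattern, taking result lanes 0-1 from the first input and lanes 2-3 from the second. That reading is an assumption, not something stated in this issue. The names restrictedMix, unrestrictedShuffle and fitsRestrictedMix are hypothetical.

```javascript
// Hypothetical models of the two shuffle styles; 4-lane vectors are plain arrays.

// Restricted style, ASSUMED to follow the x86 shufps pattern:
// result lanes 0-1 come from `a`, lanes 2-3 come from `b`; each index is 0-3.
function restrictedMix(a, b, x, y, z, w) {
  return [a[x], a[y], b[z], b[w]];
}

// Unrestricted style: each index (0-7) selects from the concatenation a ++ b,
// in the spirit of __builtin_shufflevector / __builtin_shuffle.
function unrestrictedShuffle(a, b, x, y, z, w) {
  const concat = [...a, ...b];
  return [x, y, z, w].map((i) => concat[i]);
}

// True if an unrestricted mask can be rewritten as a single restrictedMix call.
function fitsRestrictedMix(x, y, z, w) {
  return x < 4 && y < 4 && z >= 4 && w >= 4;
}
```

For example, interleaving the low halves (mask 0, 4, 1, 5, which is what unpcklps computes) is expressible only in the unrestricted form, because lane 1 of the result must come from the second input.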
I propose:
With 4096 possible masks for two-input shuffles of 4-element vectors (each of the 4 result lanes can select any of the 8 input lanes, so 8^4 = 4096), it's unappealing to pre-declare names for every possible mask. So I propose that the functions simply accept 4 additional arguments specifying the lanes of the mask, with the significant caveat that if the values aren't constants, the code isn't going to be fast. Here "not going to be fast" means the input vectors are manually stored to the stack and the elements are loaded out one at a time. This problem already exists, because nothing currently prevents non-constant masks from being passed to the shuffleMix function.
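The 4096 count and the non-constant-mask slow path can both be sketched in a few lines. This is an illustration under stated assumptions, not engine code: a Float32Array stands in for the stack slot, and countTwoInputMasks and shuffleSlowPath are hypothetical names.

```javascript
// Each of the 4 result lanes independently selects one of the 8 lanes of
// the two 4-lane inputs, so there are 8 ** 4 = 4096 distinct masks.
function countTwoInputMasks() {
  let count = 0;
  for (let x = 0; x < 8; x++) {
    for (let y = 0; y < 8; y++) {
      for (let z = 0; z < 8; z++) {
        for (let w = 0; w < 8; w++) {
          count++;
        }
      }
    }
  }
  return count;
}

// Scratch memory standing in for the stack slot an implementation would
// spill both input vectors to when the mask is not a compile-time constant.
const scratch = new Float32Array(8);

// Slow path for non-constant masks: store both inputs, then load each
// selected element back one at a time. Indices are assumed to be 0-7.
function shuffleSlowPath(a, b, x, y, z, w) {
  scratch.set(a, 0); // a occupies scratch[0..3]
  scratch.set(b, 4); // b occupies scratch[4..7]
  return Float32Array.of(scratch[x], scratch[y], scratch[z], scratch[w]);
}
```

For instance, shuffleSlowPath(Float32Array.of(1, 2, 3, 4), Float32Array.of(5, 6, 7, 8), 7, 0, 5, 2) returns the lanes 8, 1, 6, 3. Every call pays for eight stores and four scalar loads regardless of the mask, which is why constant masks matter: they let the compiler pick a register-only instruction sequence instead.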
The other interesting question here is implementation burden. Computing shuffle masks on various platforms can be tricky, but I'm hoping we can collect some guidance on how this can be done for each type and each platform.
For example, on x86 any float32x4 shuffle can be done with at most two shufps instructions (plus copies as needed), and it's straightforward to recognize the cases that can use a single shufps, movhlps, movlhps, unpckhps, unpcklps, or register-to-register movss, some of which are quite common. I'm less familiar with ARM, and unfortunately it looks more complex there. That said, the existing shuffleMix already looks fairly complex on ARM.
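The two-shufps bound can be checked with a small planner. The sketch below is a hypothetical model, not how any particular engine lowers shuffles. It emulates shufps on plain arrays (result lanes 0-1 from the first operand, 2-3 from the second), and planShuffle and runPlan are illustrative names. A single shufps suffices when each half of the result draws from one input. Otherwise, two are used:

- For a 2+2 split, both inputs' lanes are gathered into a temporary, which is then permuted.
- For a 3+1 split, a temporary is built that holds the lone minority element next to its partner lane.

The model ignores register allocation (the "copies as needed") and does not look for the cheaper movhlps/unpcklps-style special cases.

```javascript
// Model of x86 SHUFPS on 4-lane arrays: the low two result lanes are picked
// from `dst`, the high two from `src`. The real instruction packs the four
// 2-bit selectors into an 8-bit immediate; here they are a plain array.
function shufps(dst, src, [i0, i1, i2, i3]) {
  return [dst[i0], dst[i1], src[i2], src[i3]];
}

// Plan a two-input shuffle (mask entries 0-7 index the concatenation a ++ b)
// as at most two SHUFPS steps. Each step reads registers by name and writes
// either "t" (a temporary) or "out".
function planShuffle(mask) {
  const src = mask.map((m) => (m < 4 ? "a" : "b")); // input feeding each lane
  const lane = mask.map((m) => m & 3);             // lane within that input

  // One instruction: lanes 0-1 share a source and lanes 2-3 share a source.
  if (src[0] === src[1] && src[2] === src[3]) {
    return [{ to: "out", dst: src[0], src: src[2], sel: lane }];
  }

  const fromA = src.filter((s) => s === "a").length;
  if (fromA === 2) {
    // 2+2 split: gather the (at most two distinct) lanes of each input into
    // t = [a.., a.., b.., b..], then permute t freely with shufps(t, t, ...).
    const pick = (s) => {
      const u = [...new Set(lane.filter((_, i) => src[i] === s))];
      return u.length === 1 ? [u[0], u[0]] : u;
    };
    const aSel = pick("a");
    const bSel = pick("b");
    const sel = lane.map((l, i) => (src[i] === "a" ? aSel.indexOf(l) : 2 + bSel.indexOf(l)));
    return [
      { to: "t", dst: "a", src: "b", sel: [...aSel, ...bSel] },
      { to: "out", dst: "t", src: "t", sel },
    ];
  }

  // 3+1 split: one half of the result comes entirely from the major input;
  // the other half pairs the lone minor lane (p) with a major lane (q).
  const major = fromA === 3 ? "a" : "b";
  const minor = major === "a" ? "b" : "a";
  const p = src.indexOf(minor);
  const q = p ^ 1;
  // t = [major[lane q], major[lane q], minor[lane p], minor[lane p]]
  const first = { to: "t", dst: major, src: minor, sel: [lane[q], lane[q], lane[p], lane[p]] };
  const sel = [...lane];
  sel[p] = 2; // t[2] holds the minor element
  sel[q] = 0; // t[0] holds its major partner
  const second = p >= 2
    ? { to: "out", dst: major, src: "t", sel }
    : { to: "out", dst: "t", src: major, sel };
  return [first, second];
}

// Execute a plan against concrete inputs.
function runPlan(plan, a, b) {
  const regs = { a, b };
  for (const step of plan) {
    regs[step.to] = shufps(regs[step.dst], regs[step.src], step.sel);
  }
  return regs.out;
}
```

With a = [0, 1, 2, 3] and b = [4, 5, 6, 7], the output of a correct plan equals the mask itself, which makes exhaustive checking easy. In this model, every one of the 4096 masks gets a plan of at most two steps, and 1024 of them need only one.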