This repository was archived by the owner on Jan 26, 2022. It is now read-only.

unrestricted two-input shuffles #74

Closed
sunfishcode opened this issue Sep 30, 2014 · 5 comments

@sunfishcode
Member

The shuffleMix function takes two inputs, but is restricted in which elements can be shuffled where. In view of examples like LLVM's __builtin_shufflevector and GCC's __builtin_shuffle, which are both considered target-independent interfaces, it is desirable that SIMD.js provide a similar unrestricted shuffle operation.

I propose:

  • This new unrestricted shuffle should be named "shuffle".
  • The existing shuffle function should be renamed to "swizzle"

With 4096 possible masks for two-input shuffles of 4-element vectors (each of the 4 result lanes selects one of 8 source lanes, so 8^4), it's unappealing to pre-declare names for all possible shuffle masks. Instead, I propose that the functions accept 4 additional arguments that specify the lanes of the mask, with the significant caveat that if the values aren't constants, the code isn't going to go fast. This problem is already present, because nothing currently prevents non-constant masks from being passed into the shuffleMix function. Here, "not going to go fast" means the input vectors are manually stored to the stack and the elements are loaded out one at a time.
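To make the proposed shape concrete, here is a minimal sketch that models 4-lane vectors as plain JavaScript arrays. It is not the SIMD.js API or the polyfill code: the function bodies are illustrative, and the convention that indices 0..3 select from the first input and 4..7 from the second is an assumption (it is only floated later in this thread). Indices are assumed to be valid.

```javascript
// Plain-array model of the proposed operations; not the real SIMD.js API.
// A 4-lane vector is represented as an array of 4 numbers.

// One-input form (the proposed "swizzle"): every lane index selects from `a`.
function swizzle(a, s0, s1, s2, s3) {
  return [a[s0], a[s1], a[s2], a[s3]];
}

// Two-input form (the proposed "shuffle").
// Assumption: indices 0..3 select from `a` and 4..7 select from `b`,
// i.e. they index into a.concat(b).
function shuffle(a, b, s0, s1, s2, s3) {
  const both = a.concat(b);
  return [both[s0], both[s1], both[s2], both[s3]];
}

const a = [1, 2, 3, 4];
const b = [5, 6, 7, 8];

// Interleave the low halves of a and b.
const interleaveLow = shuffle(a, b, 0, 4, 1, 5); // [1, 5, 2, 6]

// Reverse the lanes of a single vector.
const reversed = swizzle(a, 3, 2, 1, 0); // [4, 3, 2, 1]

// Each of the 4 result lanes picks one of 8 source lanes.
const maskCount = Math.pow(8, 4); // 4096
```

Passing the lanes as ordinary arguments avoids naming every mask, at the cost that an engine can only pick a fast instruction sequence when the four arguments are constants.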

The other interesting question here is implementation burden. Computing shuffle masks on various platforms can be tricky, but I'm hoping we can collect some guidance for how this can be done for various types and for various platforms.

For example, float32x4 shuffle on x86 can always be done in at most two shufps instructions (plus copies as needed), and it's straightforward to recognize cases which can use a single shufps, movhlps, movlhps, unpckhps, unpcklps, or reg-reg movss, some of which are quite common. I'm less familiar with ARM, and unfortunately it looks more complex. That said, the existing shuffleMix already looks fairly complex on ARM.
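As a rough illustration of recognizing the single-shufps case, the sketch below classifies a mask and computes the immediate. It assumes the convention that mask entries 0..3 select from input "a" and 4..7 from input "b"; the helper names `shufps` and `singleShufps` are hypothetical, and the emulation runs on plain arrays rather than real registers. It does not attempt the two-instruction fallback or the other instructions mentioned above.

```javascript
// Emulates x86 SHUFPS on plain arrays: result lanes 0 and 1 are selected
// from `x` (the destination operand), lanes 2 and 3 from `y` (the source).
function shufps(x, y, imm) {
  return [
    x[imm & 3],
    x[(imm >> 2) & 3],
    y[(imm >> 4) & 3],
    y[(imm >> 6) & 3],
  ];
}

// Returns { first, second, imm } if `mask` can be done with one SHUFPS,
// otherwise null. Assumption: entries 0..3 mean input "a", 4..7 mean "b".
function singleShufps(mask) {
  const src = mask.map((s) => (s < 4 ? "a" : "b"));
  // Lanes 0 and 1 must share one input, and lanes 2 and 3 must share one.
  if (src[0] !== src[1] || src[2] !== src[3]) return null;
  const imm =
    (mask[0] & 3) |
    ((mask[1] & 3) << 2) |
    ((mask[2] & 3) << 4) |
    ((mask[3] & 3) << 6);
  return { first: src[0], second: src[2], imm };
}

const a = [1, 2, 3, 4];
const b = [5, 6, 7, 8];

// Low halves of a and b: one SHUFPS with immediate 0x44.
const lowHalves = singleShufps([0, 1, 4, 5]);

// Interleaving (unpcklps-style) cannot be done by a single SHUFPS.
const interleave = singleShufps([0, 4, 1, 5]); // null
```

Because SHUFPS overwrites its first operand, a real code generator would also need a register copy whenever that input is still live, which is the "plus copies as needed" part.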

sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue Oct 6, 2014
sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue Oct 6, 2014
sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue Oct 6, 2014
@sunfishcode
Member Author

I now have a prototype implementation of this for the polyfill: ceb4f9d and 23ffdc2. This includes updates to the benchmark code, so you can see what the proposed API looks like in practice.

@huningxin
Contributor

I like this proposal. One question: what's the right behavior if the shuffle selector is invalid, say, out of range?

@ghost

ghost commented Oct 9, 2014

With a naive VTBL implementation ARM would write zeroes for out-of-range indices. They could also be taken as don't-care (which allows the compiler to find something more efficient, but gives implementation-defined results), or the indices could be defined as being bit-masked to a legal range.

@sunfishcode
Member Author

If we end up with the API where shuffle indices are indices into the concatenation of the two input vectors, I'd support bit-masking the indices to a legal range.
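For comparison, here is a small sketch of two of the out-of-range policies discussed above, again on plain arrays rather than the real SIMD.js types. The function names are hypothetical, and the concatenation indexing (0..3 from the first input, 4..7 from the second) is the assumption stated in the comment above. The zero-fill variant only models the VTBL behavior as described in this thread.

```javascript
// Plain-array model; indices 0..3 select from `a`, 4..7 from `b` (assumed).

// Policy 1: bit-mask each index into the legal range 0..7.
// This works because 8, the concatenated length, is a power of two.
function shuffleMasked(a, b, s0, s1, s2, s3) {
  const both = a.concat(b);
  return [s0, s1, s2, s3].map((s) => both[s & 7]);
}

// Policy 2: out-of-range indices yield zero, modeled on the naive VTBL
// behavior described earlier in the thread.
function shuffleZeroFill(a, b, s0, s1, s2, s3) {
  const both = a.concat(b);
  return [s0, s1, s2, s3].map((s) =>
    Number.isInteger(s) && s >= 0 && s < 8 ? both[s] : 0
  );
}

const a = [1, 2, 3, 4];
const b = [5, 6, 7, 8];

// Indices 9 and -1 are out of range: 9 & 7 is 1, and -1 & 7 is 7.
const masked = shuffleMasked(a, b, 0, 9, -1, 4);
const zeroed = shuffleZeroFill(a, b, 0, 9, -1, 4);
```

Bit-masking gives every index a defined, platform-independent result, whereas zero-fill matches one target's native behavior and would need extra work elsewhere.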

sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue Oct 22, 2014
sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue Oct 28, 2014
sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue Nov 4, 2014
sunfishcode added a commit to sunfishcode/ecmascript_simd that referenced this issue Nov 5, 2014
@sunfishcode
Member Author

Unrestricted two-input "New-style" shuffles were implemented in #83.
