-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add some avx512f intrinsics(mask, rotation, shift) #884
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @gnzlbg (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
Can you add |
You should run |
Double-check your
Use |
2020-08-20T13:15:34.0092099Z failed to verify __m512i _mm512_ror_epi32 (__m512i a, int imm8) |
Use an |
Unfortunately rustc currently doesn't support compile-time bound-checking for intrinsic arguments, so we need to use a runtime assert instead. |
Will it be supported in the future? |
There are plans to support it with In any case we currently do runtime bound checking for all the other intrinsics, so it's fine to use it here as well. |
Now you need to fix the |
In my platform Ubuntu 20.04 with gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2) and clang version 10.0.0-4ubuntu1, |
The failing tests are for i686-unknown-linux-gnu, can you try on that target? |
test result: ok. 1109 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Work in my machine. |
The proper command to run the tests is:
The errors happen both with x86_64-unknown-linux-gnu i686-unknown-linux-gnu. I had a quick look, it should just be a matter of adjusting the instructions in |
Thanks for your help. If I use cargo +nightly rustc -- --emit asm in my test file in my machine, it generate fn main() { In the website, it should be |
Keep in mind that a rotate-right can be turned into a rotate-left by adjusting the immediate operand, which the compiler does when it is a constant. So it's fine for the compiler to turn a |
When I compile it with "clang -S -masm=intel test.c -mavx512f" with the _mm512_ror_epi32(a, 1), Does this two line guide how the compiler to generate the code? |
You need to put |
I modify my code to: #[target_feature(enable = "avx512f")] fn main() { After cargo rustc -- --emit asm, "vpslld %xmm0, %zmm1, %zmm2" is generated. However, running TARGET=x86_64-unknown-linux-gnu ci/run.sh shows: ---- core_arch::x86::avx512f::assert__mm512_slli_epi32_vpslld stdout ---- It seems that it generated wrong code? |
A left shift by 1 is the same as multiplying the number by 2. The compiler therefore optimizes it to an |
Yes, after I change from 1 to 5 #[cfg_attr(test, assert_instr(vpslld, imm8 = 5))], it passed. However, __mm512_ror_epi32 still generates "vprold" whatever I change the imm8 value. Also, "Intel C/C++ Compiler Intrinsic Equivalents |
AND is a bitwise operation so VPANDQ and VPANDD are equivalent if no mask is used. This means that the compiler can choose either one. Same with VPROL/VPROR: since on In both of these cases you can just adjust the |
Finally, "mask and". #[inline] ---- core_arch::x86::avx512f::assert__mm512_kand_kandw stdout ---- |
The masks are just integers. Unless they are used with an AVX512 instruction, the compiler has no need to actually place them in a I think it's fine to just let those use normal |
Thanks! |
add some avx512f intrinsics
[
_mm512_and_epi32
][
_mm512_and_epi64
][
_mm512_and_si512
][
_mm512_kand
][
_mm512_kor
][
_mm512_kxor
][
_kand_mask16
][
_kor_mask16
][
_kxor_mask16
][
_mm512_mask_and_epi32
][
_mm512_mask_and_epi64
][
_mm512_mask_or_epi32
][
_mm512_mask_or_epi64
][
_mm512_mask_rol_epi32
][
_mm512_mask_rol_epi64
][
_mm512_mask_rolv_epi32
][
_mm512_mask_rolv_epi64
][
_mm512_mask_ror_epi32
][
_mm512_mask_ror_epi64
][
_mm512_mask_rorv_epi32
][
_mm512_mask_rorv_epi64
][
_mm512_mask_sll_epi32
][
_mm512_mask_sll_epi64
][
_mm512_mask_slli_epi32
][
_mm512_mask_slli_epi64
][
_mm512_mask_sllv_epi32
][
_mm512_mask_sllv_epi64
][
_mm512_mask_sra_epi32
][
_mm512_mask_sra_epi64
][
_mm512_mask_srai_epi32
][
_mm512_mask_srai_epi64
][
_mm512_mask_srav_epi32
][
_mm512_mask_srav_epi64
][
_mm512_mask_srl_epi32
][
_mm512_mask_srl_epi64
][
_mm512_mask_srli_epi32
][
_mm512_mask_srli_epi64
][
_mm512_mask_srlv_epi32
][
_mm512_mask_srlv_epi64
][
_mm512_mask_xor_epi32
][
_mm512_mask_xor_epi64
][
_mm512_maskz_and_epi32
][
_mm512_maskz_and_epi64
][
_mm512_maskz_or_epi32
][
_mm512_maskz_or_epi64
][
_mm512_maskz_rol_epi32
][
_mm512_maskz_rol_epi64
][
_mm512_maskz_rolv_epi32
][
_mm512_maskz_rolv_epi64
][
_mm512_maskz_ror_epi32
][
_mm512_maskz_ror_epi64
][
_mm512_maskz_rorv_epi32
][
_mm512_maskz_rorv_epi64
][
_mm512_maskz_sll_epi32
][
_mm512_maskz_sll_epi64
][
_mm512_maskz_slli_epi32
][
_mm512_maskz_slli_epi64
][
_mm512_maskz_sllv_epi32
][
_mm512_maskz_sllv_epi64
][
_mm512_maskz_sra_epi32
][
_mm512_maskz_sra_epi64
][
_mm512_maskz_srai_epi32
][
_mm512_maskz_srai_epi64
][
_mm512_maskz_srav_epi32
][
_mm512_maskz_srav_epi64
][
_mm512_maskz_srl_epi32
][
_mm512_maskz_srl_epi64
][
_mm512_maskz_srli_epi32
][
_mm512_maskz_srli_epi64
][
_mm512_maskz_srlv_epi32
][
_mm512_maskz_srlv_epi64
][
_mm512_maskz_xor_epi32
][
_mm512_maskz_xor_epi64
][
_mm512_or_epi32
][
_mm512_or_epi64
][
_mm512_or_si512
][
_mm512_rol_epi32
][
_mm512_rol_epi64
][
_mm512_rolv_epi32
][
_mm512_rolv_epi64
][
_mm512_ror_epi32
][
_mm512_ror_epi64
][
_mm512_rorv_epi32
][
_mm512_rorv_epi64
][
_mm512_sll_epi32
][
_mm512_sll_epi64
][
_mm512_slli_epi32
][
_mm512_slli_epi64
][
_mm512_sllv_epi32
][
_mm512_sllv_epi64
][
_mm512_sra_epi32
][
_mm512_sra_epi64
][
_mm512_srai_epi32
][
_mm512_srai_epi64
][
_mm512_srav_epi32
][
_mm512_srav_epi64
][
_mm512_srl_epi32
][
_mm512_srl_epi64
][
_mm512_srli_epi32
][
_mm512_srli_epi64
][
_mm512_srlv_epi32
][
_mm512_srlv_epi64
][
_mm512_xor_epi32
][
_mm512_xor_epi64
][
_mm512_xor_si512
]