[test] Compile test for vectorized fletcher with #4381 #4427
Detect whether the running CPU supports AVX instructions, evaluate Fletcher-4 computation throughput, and choose the fastest implementation. Signed-off-by: Jinshan Xiong <[email protected]>
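As a rough illustration of the "benchmark and pick the fastest" idea in the commit message above, here is a minimal user-space sketch. It is not the code from this patch: the names `fletcher4_sum_t`, `fletcher4_scalar`, `fletcher4_unrolled`, `time_impl` and `pick_fastest` are hypothetical, the unrolled variant only stands in for a vectorized implementation, and `clock()` is assumed to be a good enough timer for a sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical types for this sketch; not taken from the patch. */
typedef struct { uint64_t a, b, c, d; } fletcher4_sum_t;
typedef void (*fletcher4_fn_t)(const uint32_t *, size_t, fletcher4_sum_t *);

/* Reference Fletcher-4: four running 64-bit sums over 32-bit words. */
void fletcher4_scalar(const uint32_t *buf, size_t nwords, fletcher4_sum_t *out)
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    for (size_t i = 0; i < nwords; i++) {
        a += buf[i];
        b += a;
        c += b;
        d += c;
    }
    out->a = a; out->b = b; out->c = c; out->d = d;
}

/* Two-way unrolled variant, standing in for a vectorized implementation. */
void fletcher4_unrolled(const uint32_t *buf, size_t nwords, fletcher4_sum_t *out)
{
    uint64_t a = 0, b = 0, c = 0, d = 0;
    size_t i = 0;
    for (; i + 1 < nwords; i += 2) {
        a += buf[i];     b += a; c += b; d += c;
        a += buf[i + 1]; b += a; c += b; d += c;
    }
    if (i < nwords) {            /* odd trailing word */
        a += buf[i]; b += a; c += b; d += c;
    }
    out->a = a; out->b = b; out->c = c; out->d = d;
}

/* Time `iters` runs of one implementation; returns CPU seconds. */
double time_impl(fletcher4_fn_t fn, const uint32_t *buf, size_t nwords, int iters)
{
    volatile uint64_t sink = 0;  /* keeps the calls from being optimized away */
    fletcher4_sum_t sum;
    clock_t start = clock();
    for (int i = 0; i < iters; i++) {
        fn(buf, nwords, &sum);
        sink += sum.d;
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return the index of the fastest implementation in impls[0..n-1]. */
size_t pick_fastest(const fletcher4_fn_t *impls, size_t n,
                    const uint32_t *buf, size_t nwords, int iters)
{
    size_t best = 0;
    double best_time = time_impl(impls[0], buf, nwords, iters);
    for (size_t i = 1; i < n; i++) {
        double t = time_impl(impls[i], buf, nwords, iters);
        if (t < best_time) {
            best_time = t;
            best = i;
        }
    }
    return best;
}
```

A caller would build an array of the implementations that passed the compile-time and runtime checks, call `pick_fastest()` once at initialization, and cache the winning function pointer. Every candidate must produce the same checksum as the scalar reference.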
This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms.

For the compilation phase, the configure step checks whether the toolchain supports the relevant instruction sets. Each implementation must ensure that its code is not passed to the compiler if the relevant instruction set is not supported. For this purpose, the following new defines are provided if the instruction set is supported:
- HAVE_SSE,
- HAVE_SSE2,
- HAVE_SSE3,
- HAVE_SSSE3,
- HAVE_SSE4_1,
- HAVE_SSE4_2,
- HAVE_AVX,
- HAVE_AVX2.

For detecting whether an instruction set can be used at runtime, the following functions are provided in (include/linux/simd_x86.h):
- zfs_sse_available()
- zfs_sse2_available()
- zfs_sse3_available()
- zfs_ssse3_available()
- zfs_sse4_1_available()
- zfs_sse4_2_available()
- zfs_avx_available()
- zfs_avx2_available()
- zfs_bmi1_available()
- zfs_bmi2_available()

These functions should be called once, on module load or initialization. They are safe to use from user and kernel space. If an implementation uses more than a single instruction set, both compiler and runtime support for all relevant instruction sets should be checked.

Kernel fpu methods:
- kfpu_begin()
- kfpu_end()

Use __get_cpuid_max and __cpuid_count from <cpuid.h>. Both gcc and clang have support for these. They also handle the ebx register in case it is used for PIC code.
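For readers unfamiliar with the `<cpuid.h>` helpers named above, the following user-space sketch shows one way an AVX/AVX2 runtime check can be written with `__get_cpuid_max` and `__cpuid_count`. It is an assumption-laden illustration, not the contents of simd_x86.h: the `example_*` function names are hypothetical, the architecture guard is added only so the sketch compiles on non-x86 hosts, and the kernel-space path may detect features differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0; only valid after CPUID reports OSXSAVE. */
static uint64_t example_xgetbv0(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical helper: true if AVX is usable by this process. */
bool example_avx_available(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 1)
        return false;
    __cpuid_count(1, 0, eax, ebx, ecx, edx);

    /* CPUID.1:ECX bit 27 = OSXSAVE, bit 28 = AVX. */
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return false;

    /* The OS must have enabled XMM (bit 1) and YMM (bit 2) state. */
    return (example_xgetbv0() & 0x6) == 0x6;
}

/* Hypothetical helper: AVX2 additionally requires the AVX checks. */
bool example_avx2_available(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!example_avx_available())
        return false;
    if (__get_cpuid_max(0, NULL) < 7)
        return false;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);

    /* CPUID.(EAX=7,ECX=0):EBX bit 5 = AVX2. */
    return (ebx & (1u << 5)) != 0;
}
#else
/* Non-x86 builds: report no support. */
bool example_avx_available(void)  { return false; }
bool example_avx2_available(void) { return false; }
#endif
```

In line with the commit message, an AVX2 code path would be compiled only when `HAVE_AVX2` is defined and selected only when a runtime check like the one above also succeeds.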
807d83b
to
347f12d
@ironMann thanks. It looks like it was fairly straightforward to update. We'll want to iterate with @jxiong on the fletcher patch itself to enhance it some. For example, integrate it with ztest, probably put the benchmark results in a kstat rather than the console, and put the unit test somewhere it gets regularly run. But it looks like it integrated nicely with your vectorized algorithms patch. That said, I think we do have a problem with it because when running in the ec2 test instance the Based on the aws documentation we should have been running on an Intel Xeon E5-2670 v2* which according to the Intel documentation supports avx. Logging in to the instance appears to confirm that, yet we didn't enable avx support. Anybody know why this might be the case? @ironMann presumably it works as expected on your test system?
Whoops. Sorry about closing that, I mis-clicked. Reopened. It looks like I posted a little too quickly, it looks like
@behlendorf Exactly. Generally, when testing for AVX2 it's recommended to also check for AVX since they introduced the wider
Here are the benchmark results from my system
It appears the ec2 d2 instance types support both avx2 and instance storage (which we use). I'll see about adding one to the automated testing mix. If we're going to be adding these kinds of optimizations, they're going to need to be covered by the automated testing.
Adaptation of @jxiong's vectorized_fletcher branch to #4381