[test] Compile test for vectorized fletcher with #4381 #4427

ironMann · 2016-03-17T00:58:49Z

Adaptation of @jxiong's vectorized_fletcher branch to #4381.

Detect whether the running CPU supports AVX instructions, evaluate Fletcher-4 computation throughput, and choose the fastest implementation. Signed-off-by: Jinshan Xiong <[email protected]>

This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms.

For the compilation phase, the configure step checks whether the toolchain supports the relevant instruction sets. Each implementation must ensure that its code is not passed to the compiler if the relevant instruction set is not supported. For this purpose, the following new defines are provided if the instruction set is supported:

- HAVE_SSE
- HAVE_SSE2
- HAVE_SSE3
- HAVE_SSSE3
- HAVE_SSE4_1
- HAVE_SSE4_2
- HAVE_AVX
- HAVE_AVX2

For detecting whether an instruction set can be used at runtime, the following functions are provided in include/linux/simd_x86.h:

- zfs_sse_available()
- zfs_sse2_available()
- zfs_sse3_available()
- zfs_ssse3_available()
- zfs_sse4_1_available()
- zfs_sse4_2_available()
- zfs_avx_available()
- zfs_avx2_available()
- zfs_bmi1_available()
- zfs_bmi2_available()

These functions should be called once, on module load or initialization. They are safe to use from user and kernel space. If an implementation uses more than a single instruction set, both compiler and runtime support for all relevant instruction sets should be checked.

Kernel fpu methods:

- kfpu_begin()
- kfpu_end()

Use __get_cpuid_max and __cpuid_count from <cpuid.h>. Both gcc and clang have support for these. They also handle the ebx register in case it is used for PIC code.
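For readers unfamiliar with the <cpuid.h> helpers named above, here is a minimal user-space sketch of how raw CPUID feature bits can be read with __get_cpuid_max and __cpuid_count. The enum and the function name sketch_cpu_reports() are hypothetical and are not taken from the patch; the real entry points are the zfs_*_available() functions, whose bodies are not shown in this thread. The sketch only reports what the CPU advertises and compiles only for x86 targets with gcc or clang.

```c
#include <cpuid.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature selector, for illustration only. */
enum sketch_feature {
	SKETCH_SSE2,
	SKETCH_AVX,
	SKETCH_AVX2,
	SKETCH_BMI1,
	SKETCH_BMI2
};

/* Return true if the CPU advertises the given feature bit via CPUID. */
static bool
sketch_cpu_reports(enum sketch_feature f)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
	/* Highest supported basic CPUID leaf; 0 if CPUID is unavailable. */
	unsigned int max_leaf = __get_cpuid_max(0, NULL);

	if (f == SKETCH_SSE2 || f == SKETCH_AVX) {
		if (max_leaf < 1)
			return (false);
		__cpuid_count(1, 0, eax, ebx, ecx, edx);
		if (f == SKETCH_SSE2)
			return ((edx & (1u << 26)) != 0);	/* CPUID.1:EDX.SSE2 */
		return ((ecx & (1u << 28)) != 0);		/* CPUID.1:ECX.AVX */
	}

	/* The remaining features live in leaf 7, subleaf 0, register EBX. */
	if (max_leaf < 7)
		return (false);
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	switch (f) {
	case SKETCH_BMI1:
		return ((ebx & (1u << 3)) != 0);
	case SKETCH_AVX2:
		return ((ebx & (1u << 5)) != 0);
	case SKETCH_BMI2:
		return ((ebx & (1u << 8)) != 0);
	default:
		return (false);
	}
}
```

A caller would typically evaluate such a check once at initialization and cache the result, in line with the "called once, on module load" guidance above.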

openzfs#4381

behlendorf · 2016-03-17T19:26:54Z

@ironMann thanks. It looks like it was fairly straightforward to update. We'll want to iterate with @jxiong on the fletcher patch itself to enhance it somewhat. For example, integrate it with ztest, probably put the benchmark results in a kstat rather than the console, and put the unit test somewhere it gets run regularly. But it looks like it integrated nicely with your vectorized algorithms patch.

That said, I think we do have a problem with it, because when running in the ec2 test instance the zfs_avx_available() check failed. Notice there are only generic results in the console output.

Based on the aws documentation we should have been running on an Intel Xeon E5-2670 v2*, which according to the Intel documentation supports avx. Logging in to the instance appears to confirm that, yet we didn't enable avx support.

Anybody know why this might be the case? @ironMann presumably it works as expected on your test system?

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping    : 4
microcode   : 0x416
cpu MHz     : 2500.060
cache size  : 25600 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 1
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs        :
bogomips    : 5000.12
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

behlendorf · 2016-03-17T19:32:52Z

Whoops. Sorry about closing that, I mis-clicked. Reopened. It looks like I posted a little too quickly: it was zfs_avx2_available() that failed, which makes sense since avx2 isn't supported on this CPU. So it did work properly. It's a bit unfortunate, though, that the test coverage therefore doesn't cover the avx2 path.

ironMann · 2016-03-17T19:41:32Z

@behlendorf Exactly. Generally, when testing for AVX2 it's recommended to also check for AVX, since AVX is what introduced the wider ymm register set. AVX2 has only been available since the Xeon E5 v3 (Haswell). AFAIK, aws offers them in some instances, but I have not seen a buildbot deployed on such a machine.
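The recommendation to check AVX together with AVX2 can be illustrated with a short user-space sketch. It assumes the commonly documented sequence: confirm the OSXSAVE and AVX bits in CPUID leaf 1, confirm through XGETBV that the OS saves xmm and ymm state, and only then look at the AVX2 bit in leaf 7. All function names are hypothetical. This is not the patch's code, and the kernel side may rely on the kernel's own CPU feature flags instead.

```c
#include <cpuid.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read XCR0. Only call this after CPUID reports OSXSAVE. */
static uint64_t
sketch_xgetbv0(void)
{
	uint32_t lo, hi;

	/* Raw encoding of "xgetbv" so that older assemblers accept it. */
	__asm__ volatile(".byte 0x0f, 0x01, 0xd0"
	    : "=a" (lo), "=d" (hi) : "c" (0));
	return (((uint64_t)hi << 32) | lo);
}

/* AVX is usable only if the CPU has it and the OS saves ymm state. */
static bool
sketch_avx_usable(void)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	if (__get_cpuid_max(0, NULL) < 1)
		return (false);
	__cpuid_count(1, 0, eax, ebx, ecx, edx);
	if ((ecx & (1u << 27)) == 0 ||		/* OSXSAVE */
	    (ecx & (1u << 28)) == 0)		/* AVX */
		return (false);
	/* XCR0 bits 1 and 2: xmm and ymm state enabled by the OS. */
	return ((sketch_xgetbv0() & 0x6) == 0x6);
}

/* AVX2 builds on AVX, so it is checked only after AVX passes. */
static bool
sketch_avx2_usable(void)
{
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	if (!sketch_avx_usable())
		return (false);
	if (__get_cpuid_max(0, NULL) < 7)
		return (false);
	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	return ((ebx & (1u << 5)) != 0);	/* CPUID.7.0:EBX.AVX2 */
}
```

With this ordering, a machine like the E5-2670 v2 discussed above would pass the AVX check and fail only the final AVX2 bit test.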

ironMann · 2016-03-17T20:00:38Z

Here are benchmark results from my system, an Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz:

[  330.268017] NOTICE: fletcher-4: generic  33542 MB/s
[  330.285012] NOTICE: fletcher-4: avx2     102855 MB/s
[  330.361481] ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
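The "benchmark each implementation and keep the fastest" idea behind the output above can be sketched as follows. This is an illustrative user-space example, not the patch's code: the type and function names are hypothetical, buffer sizes are given in 32-bit words for simplicity, and the second candidate is a plain two-words-per-iteration loop standing in for a real AVX2 routine. The scalar routine follows the standard Fletcher-4 recurrence of four running 64-bit sums over 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical checksum container: the four Fletcher-4 running sums. */
typedef struct { uint64_t word[4]; } sketch_cksum_t;

typedef void (*sketch_fletcher4_fn)(const uint32_t *, size_t,
    sketch_cksum_t *);

/* Reference scalar Fletcher-4 over nwords 32-bit words. */
static void
sketch_fletcher4_scalar(const uint32_t *buf, size_t nwords,
    sketch_cksum_t *out)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t i = 0; i < nwords; i++) {
		a += buf[i];
		b += a;
		c += b;
		d += c;
	}
	out->word[0] = a; out->word[1] = b;
	out->word[2] = c; out->word[3] = d;
}

/* Stand-in for a vectorized candidate: same math, two words per pass. */
static void
sketch_fletcher4_unrolled(const uint32_t *buf, size_t nwords,
    sketch_cksum_t *out)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;
	size_t i = 0;

	for (; i + 1 < nwords; i += 2) {
		a += buf[i];     b += a; c += b; d += c;
		a += buf[i + 1]; b += a; c += b; d += c;
	}
	if (i < nwords) {	/* odd trailing word */
		a += buf[i]; b += a; c += b; d += c;
	}
	out->word[0] = a; out->word[1] = b;
	out->word[2] = c; out->word[3] = d;
}

/* Time every candidate on the same buffer and return the quickest. */
static sketch_fletcher4_fn
sketch_pick_fastest(const sketch_fletcher4_fn *cands, size_t ncands,
    const uint32_t *buf, size_t nwords)
{
	sketch_fletcher4_fn best = NULL;
	clock_t best_time = 0;

	for (size_t i = 0; i < ncands; i++) {
		sketch_cksum_t sink;
		clock_t start = clock();

		for (int rep = 0; rep < 16; rep++)
			cands[i](buf, nwords, &sink);
		clock_t elapsed = clock() - start;
		if (best == NULL || elapsed < best_time) {
			best = cands[i];
			best_time = elapsed;
		}
	}
	return (best);
}
```

A real benchmark would use a buffer large enough to give stable timings and would only add a vectorized candidate to the list after its runtime availability check has passed.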

behlendorf · 2016-03-17T20:29:50Z

It appears the ec2 d2 instance types support both avx2 and instance storage (which we use). I'll see about adding one to the automated testing mix. If we're going to be adding this kind of optimization, it's going to need to be covered by the automated testing.

ironMann · 2016-03-24T16:04:57Z

Closing since #4330 was updated to use #4381

Jinshan Xiong and others added 3 commits February 25, 2016 12:21

compute fletcher 4 with avx instructions

151633e

Detect whether the running CPU supports AVX instructions, evaluate Fletcher-4 computation throughput, and choose the fastest implementation. Signed-off-by: Jinshan Xiong <[email protected]>

Adapt jxiong:vectorized_fletcher (openzfs#4330) to openzfs#4381

347f12d

ironMann force-pushed the jxiong_vectorized_fletcher branch 3 times, most recently from 807d83b to 347f12d Compare March 17, 2016 09:05

behlendorf closed this Mar 17, 2016

behlendorf reopened this Mar 17, 2016

behlendorf mentioned this pull request Mar 21, 2016

compute fletcher 4 with avx instructions #4330

Closed

ironMann closed this Mar 24, 2016
