[test] Compile test for vectorized fletcher with #4381 #4427

Closed
wants to merge 3 commits into from

ironMann
Contributor

Adaptation of @jxiong's vectorized_fletcher branch to #4381

Jinshan Xiong and others added 3 commits February 25, 2016 12:21
Detect whether the running CPU supports AVX instructions, evaluate
Fletcher-4 computation throughput, and choose the fastest implementation.

Signed-off-by: Jinshan Xiong <[email protected]>
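
The idea described in the commit message above (time each implementation, keep the fastest) can be sketched in plain C. This is a minimal illustration and not the code from the patch: `fletcher4_sum_t`, `fletcher4_generic`, `time_impl`, and `fletcher4_pick_fastest` are hypothetical names, the scalar loop is the usual four-running-sums form of Fletcher-4, and `clock()` stands in for whatever timer the module really uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* The four running sums of Fletcher-4 (hypothetical type name). */
typedef struct {
	uint64_t a, b, c, d;
} fletcher4_sum_t;

/* Common signature so implementations can be swapped via a pointer. */
typedef void (*fletcher4_fn_t)(const uint32_t *words, size_t nwords,
    fletcher4_sum_t *out);

/* Scalar reference: consume 32-bit words, accumulate in 64 bits. */
static void
fletcher4_generic(const uint32_t *words, size_t nwords, fletcher4_sum_t *out)
{
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t i = 0; i < nwords; i++) {
		a += words[i];
		b += a;
		c += b;
		d += c;
	}
	out->a = a;
	out->b = b;
	out->c = c;
	out->d = d;
}

/* Run one implementation `rounds` times and return elapsed clock ticks. */
static clock_t
time_impl(fletcher4_fn_t fn, const uint32_t *words, size_t nwords, int rounds)
{
	fletcher4_sum_t sum;
	volatile uint64_t sink = 0;	/* keeps the calls from being elided */
	clock_t start = clock();

	for (int r = 0; r < rounds; r++) {
		fn(words, nwords, &sum);
		sink += sum.d;
	}
	(void) sink;
	return (clock() - start);
}

/* Return the index of the fastest candidate; nimpls must be at least 1. */
static size_t
fletcher4_pick_fastest(const fletcher4_fn_t *impls, size_t nimpls,
    const uint32_t *words, size_t nwords, int rounds)
{
	size_t best = 0;
	clock_t best_time = time_impl(impls[0], words, nwords, rounds);

	for (size_t i = 1; i < nimpls; i++) {
		clock_t t = time_impl(impls[i], words, nwords, rounds);
		if (t < best_time) {
			best_time = t;
			best = i;
		}
	}
	return (best);
}
```

For the four input words 1, 2, 3, 4 the scalar routine yields a=10, b=20, c=35, d=56. A vectorized variant would be added to the `impls` array only after the compile-time and runtime checks described in the next commit have passed.
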
This is initial support for x86 vectorized implementations of ZFS parity
and checksum algorithms.

For the compilation phase, the configure step checks whether the toolchain
supports the relevant instruction sets. Each implementation must ensure that its
code is not passed to the compiler if the relevant instruction set is not
supported. For this purpose, the following new defines are provided when an
instruction set is supported:
	- HAVE_SSE,
	- HAVE_SSE2,
	- HAVE_SSE3,
	- HAVE_SSSE3,
	- HAVE_SSE4_1,
	- HAVE_SSE4_2,
	- HAVE_AVX,
	- HAVE_AVX2.

To detect at runtime whether an instruction set can be used, the following
functions are provided in include/linux/simd_x86.h:
	- zfs_sse_available()
	- zfs_sse2_available()
	- zfs_sse3_available()
	- zfs_ssse3_available()
	- zfs_sse4_1_available()
	- zfs_sse4_2_available()
	- zfs_avx_available()
	- zfs_avx2_available()
	- zfs_bmi1_available()
	- zfs_bmi2_available()

These functions should be called once, on module load or initialization.
They are safe to use from user and kernel space.
If an implementation uses more than one instruction set, both compiler
and runtime support for all relevant instruction sets should be checked.
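
The two-level check described above (a compile-time define plus a runtime probe) could look roughly like the following. This is a sketch under assumptions, not code from the patch: `example_avx2_available()` is a stand-in for `zfs_avx2_available()` built on the compiler builtin `__builtin_cpu_supports`, and `sum_generic`, `sum_avx2`, and `select_sum_impl` are hypothetical names for a toy word-summing routine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn_t)(const uint32_t *words, size_t nwords);

/* Stand-in for zfs_avx2_available(); the real one is in simd_x86.h. */
static bool
example_avx2_available(void)
{
#if defined(__x86_64__) || defined(__i386__)
	return (__builtin_cpu_supports("avx2") != 0);
#else
	return (false);
#endif
}

/* Portable fallback, always compiled. */
static uint64_t
sum_generic(const uint32_t *words, size_t nwords)
{
	uint64_t s = 0;

	for (size_t i = 0; i < nwords; i++)
		s += words[i];
	return (s);
}

#if defined(HAVE_AVX2)
/* Only reaches the compiler when configure reported AVX2 support. */
#include <immintrin.h>

__attribute__((target("avx2")))
static uint64_t
sum_avx2(const uint32_t *words, size_t nwords)
{
	__m256i acc = _mm256_setzero_si256();
	uint64_t lanes[4];
	size_t i = 0;

	/* Widen four 32-bit words to 64 bits and add them lane by lane. */
	for (; i + 4 <= nwords; i += 4) {
		__m128i v = _mm_loadu_si128((const __m128i *)(words + i));
		acc = _mm256_add_epi64(acc, _mm256_cvtepu32_epi64(v));
	}
	_mm256_storeu_si256((__m256i *)lanes, acc);

	uint64_t s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
	for (; i < nwords; i++)		/* leftover tail words */
		s += words[i];
	return (s);
}
#endif

/* Call once at initialization and cache the returned pointer. */
static sum_fn_t
select_sum_impl(void)
{
#if defined(HAVE_AVX2)
	if (example_avx2_available())
		return (sum_avx2);
#endif
	return (sum_generic);
}
```

In a standalone build `HAVE_AVX2` is not defined, so `select_sum_impl()` returns `sum_generic` regardless of the CPU. The vectorized body is used only when both the toolchain and the running CPU support AVX2.
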

Kernel fpu methods:
	- kfpu_begin()
	- kfpu_end()

Use __get_cpuid_max and __cpuid_count from <cpuid.h>.
Both gcc and clang support these. They also handle the ebx register
in case it is used for PIC code.
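
As a rough illustration of the two `<cpuid.h>` helpers named above, a feature probe might be written as follows. The helper names (`example_cpuid_bit`, `example_cpu_has_avx`, `example_cpu_has_avx2`) are hypothetical and this is not the code from simd_x86.h. The bit positions are the documented CPUID ones: AVX is leaf 1, ECX bit 28, and AVX2 is leaf 7, subleaf 0, EBX bit 5. The sketch inspects CPU feature bits only.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define	EXAMPLE_X86	1
#endif

enum example_reg { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

/*
 * Test one bit of one output register of a basic CPUID leaf.
 * Returns false when the leaf is not implemented or on non-x86 builds.
 */
static bool
example_cpuid_bit(unsigned int leaf, unsigned int subleaf,
    enum example_reg reg, unsigned int bit)
{
#ifdef EXAMPLE_X86
	unsigned int r[4] = { 0, 0, 0, 0 };

	/* Highest supported basic leaf; avoids querying a missing leaf. */
	if (__get_cpuid_max(0, NULL) < leaf)
		return (false);

	__cpuid_count(leaf, subleaf, r[0], r[1], r[2], r[3]);
	return (((r[reg] >> bit) & 1u) != 0);
#else
	(void) leaf;
	(void) subleaf;
	(void) reg;
	(void) bit;
	return (false);
#endif
}

/* CPUID.1:ECX bit 28 */
static bool
example_cpu_has_avx(void)
{
	return (example_cpuid_bit(1, 0, REG_ECX, 28));
}

/* CPUID.(EAX=7,ECX=0):EBX bit 5; AVX is required as well. */
static bool
example_cpu_has_avx2(void)
{
	return (example_cpu_has_avx() &&
	    example_cpuid_bit(7, 0, REG_EBX, 5));
}
```

Checking the maximum leaf first matters for leaf 7, which older CPUs do not implement.
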
@ironMann ironMann force-pushed the jxiong_vectorized_fletcher branch 3 times, most recently from 807d83b to 347f12d Compare March 17, 2016 09:05
@behlendorf
Contributor

@ironMann thanks. It looks like it was fairly straightforward to update. We'll want to iterate with @jxiong on the fletcher patch itself to enhance it somewhat. For example, integrate it with ztest, probably put the benchmark results in a kstat rather than the console, and put the unit test somewhere it gets run regularly. But it looks like it integrated nicely with your vectorized algorithms patch.

That said, I think we do have a problem with it, because when running in the ec2 test instance the zfs_avx_available() check failed. Notice there are only generic results in the console output.

Based on the aws documentation, we should have been running on an Intel Xeon E5-2670 v2*, which according to the Intel documentation supports avx. Logging in to the instance appears to confirm that, yet we didn't enable avx support.

Anybody know why this might be the case? @ironMann presumably it works as expected on your test system?

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping    : 4
microcode   : 0x416
cpu MHz     : 2500.060
cache size  : 25600 KB
physical id : 0
siblings    : 2
core id     : 0
cpu cores   : 1
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs        :
bogomips    : 5000.12
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

@behlendorf behlendorf closed this Mar 17, 2016
@behlendorf behlendorf reopened this Mar 17, 2016
@behlendorf
Contributor

Whoops. Sorry about closing that, I mis-clicked. Reopened. It looks like I posted a little too quickly: it was zfs_avx2_available() that failed, which makes sense since avx2 isn't supported on this CPU. So it did work properly, although it's a bit unfortunate that the test coverage therefore doesn't exercise the avx2 path.

@ironMann
Contributor Author

@behlendorf Exactly. Generally, when testing for AVX2 it's recommended to also check for AVX, since AVX introduced the wider ymm register set. AVX2 has only been available since the Xeon E5 v3 (Haswell). AFAIK, aws offers them in some instances, but I have not seen a buildbot deployed on such a machine.

@ironMann
Contributor Author

Here are benchmark results from my system, an Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz:

[  330.268017] NOTICE: fletcher-4: generic  33542 MB/s
[  330.285012] NOTICE: fletcher-4: avx2     102855 MB/s
[  330.361481] ZFS: Loaded module v0.6.5-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5

@behlendorf
Contributor

It appears the ec2 d2 instance types support both avx2 and instance storage (which we use). I'll see about adding one to the automated testing mix. If we're going to be adding these kinds of optimizations, they're going to need to be covered by the automated testing.

@ironMann
Contributor Author

Closing since #4330 was updated to use #4381

@ironMann ironMann closed this Mar 24, 2016