Support for vectorized algorithms on x86 #4381
This appears to work great and aside from a few questions and comments it looks like a good basis to me. It would be great if we could resolve the remaining loose ends and get it merged so the other outstanding changes could be rebased to use it.
Actually clang provides cpuid.h also, but their version is missing
I did similar research on the kernel tree and it seems that this is the simplest solution that works for all relevant kernel versions.
Yes, that should be fine. And after giving using the If you can squash these patches and force update the PR these should be ready to merge. |
Done.
@ironMann yes, I'd forgotten about the One lingering concern I have is that we may want to prefix all these macros and functions with a short string. The names are currently very generic and I'm worried that we may end up with a namespace collision at some point in the future.
return (boot_cpu_has(X86_FEATURE_AVX) &&
    boot_cpu_has(X86_FEATURE_OSXSAVE));
#else
return (cpuid_has_avx());
You still need to check osxsave for userspace, and you need to check if xsave saves avx registers using xgetbv.
tuxoko@8baa4a5#diff-0d2e10cd21fcf823e8e9a62a934e5115R48
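The userspace check described in this comment can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the names `example_xgetbv0()` and `example_avx_usable()` are hypothetical, and the sketch assumes a gcc or clang toolchain that provides `<cpuid.h>`. It tests the OSXSAVE and AVX bits from CPUID leaf 1, then reads XCR0 with `xgetbv` to confirm that the OS saves both XMM and YMM state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0. Only valid to call once CPUID reports OSXSAVE. */
static inline uint64_t
example_xgetbv0(void)
{
	uint32_t eax, edx;

	/* xgetbv emitted as raw bytes so older assemblers accept it */
	__asm__ __volatile__(".byte 0x0f, 0x01, 0xd0"
	    : "=a" (eax), "=d" (edx)
	    : "c" (0));
	return (((uint64_t)edx << 32) | eax);
}

/* Hypothetical helper: true if AVX can be used from userspace. */
static inline bool
example_avx_usable(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid_max(0, NULL) < 1)
		return (false);

	__cpuid_count(1, 0, eax, ebx, ecx, edx);

	/* CPUID.1:ECX bit 27 is OSXSAVE, bit 28 is AVX */
	if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
		return (false);

	/* XCR0 bit 1 covers XMM state, bit 2 covers YMM state */
	return ((example_xgetbv0() & 0x6) == 0x6);
}
#else
/* Non-x86 builds: AVX is never available. */
static inline bool
example_avx_usable(void)
{
	return (false);
}
#endif
```

The order matters: `xgetbv` faults if the OS has not enabled XSAVE, so the OSXSAVE bit has to be tested before XCR0 is read.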
true, I'll add a userspace check for that and more inst. sets (bmi, bmi2...)
The kernel_fpu_begin can be easily worked around
This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms.

For the compilation phase, the configure step checks whether the toolchain supports the relevant instruction sets. Each implementation must ensure that its code is not passed to the compiler if the relevant instruction set is not supported. For this purpose, the following new defines are provided if the instruction set is supported:
- HAVE_SSE,
- HAVE_SSE2,
- HAVE_SSE3,
- HAVE_SSSE3,
- HAVE_SSE4_1,
- HAVE_SSE4_2,
- HAVE_AVX,
- HAVE_AVX2.

To detect whether an instruction set can be used at runtime, the following functions are provided in (include/linux/simd_x86.h):
- zfs_sse_available()
- zfs_sse2_available()
- zfs_sse3_available()
- zfs_ssse3_available()
- zfs_sse4_1_available()
- zfs_sse4_2_available()
- zfs_avx_available()
- zfs_avx2_available()
- zfs_bmi1_available()
- zfs_bmi2_available()

These functions should be called once, on module load or initialization. They are safe to use from user and kernel space. If an implementation uses more than a single instruction set, both compiler and runtime support for all relevant instruction sets should be checked.

Kernel fpu methods:
- kfpu_begin()
- kfpu_end()

Use __get_cpuid_max and __cpuid_count from <cpuid.h>. Both gcc and clang have support for these. They also handle the ebx register in case it is used for PIC code.
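As a rough illustration of the last two points, the sketch below checks two instruction sets together and brackets the vector region with kfpu_begin()/kfpu_end(). Everything defined here is a stand-in so that the example builds in userspace: the `zfs_*_available()` bodies, the counter-based `kfpu_begin()`/`kfpu_end()` macros, and the `example_*` names are assumptions for illustration, not the definitions from simd_x86.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions): the real macros save/restore kernel FPU state. */
static int example_fpu_depth = 0;
#define	kfpu_begin()	((void)example_fpu_depth++)
#define	kfpu_end()	((void)example_fpu_depth--)

/* Stand-ins (assumptions): the real functions query the CPU. */
static inline bool zfs_ssse3_available(void) { return (false); }
static inline bool zfs_sse4_1_available(void) { return (false); }

/*
 * An implementation that needs both SSSE3 and SSE4.1 checks compiler
 * support (HAVE_*) and runtime support for every set it uses.
 */
static inline bool
example_impl_supported(void)
{
#if defined(HAVE_SSSE3) && defined(HAVE_SSE4_1)
	return (zfs_ssse3_available() && zfs_sse4_1_available());
#else
	return (false);
#endif
}

/* Vector code runs only between kfpu_begin() and kfpu_end(). */
static inline void
example_xor(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i;

	kfpu_begin();
	/* placeholder loop; a real version would use SIMD instructions */
	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
	kfpu_end();
}
```

Every kfpu_begin() needs a matching kfpu_end() on all exit paths; the depth counter in this stand-in exists only to make that pairing visible.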
41b7279 to 6c36357
@behlendorf, @tuxoko can you take another look? Added:
@ironMann the updated patch LGTM. However, before it can be merged I think we should update one or more of the proposed patches which would depend on this infrastructure to make sure it provides everything we need (at least initially). The fletcher patch in #4330 is probably the simplest to update.
@behlendorf sure, I'll take a look at #4330.
LGTM
@behlendorf #4328 rebased and pushed
This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms.
Signed-off-by: Gvozden Neskovic
Signed-off-by: Brian Behlendorf
Signed-off-by: Chunwei Chen
Closes openzfs#4381
Conflicts: config/kernel.m4
This is initial support for x86 vectorized implementations of ZFS parity
and checksum algorithms.
For the compilation phase, the configure step checks whether the toolchain supports
the relevant instruction sets. Each implementation must ensure that its code is not
passed to the compiler if the relevant instruction set is not supported. For this
purpose, the following new defines are provided if the instruction set is supported:
- HAVE_SSE,
- HAVE_SSE2,
- HAVE_SSE3,
- HAVE_SSSE3,
- HAVE_SSE4_1,
- HAVE_SSE4_2,
- HAVE_AVX,
- HAVE_AVX2.
To detect whether an instruction set can be used at runtime, the following functions
are provided in simd_x86.h:
- sse_available()
- sse2_available()
- sse3_available()
- ssse3_available()
- sse4_1_available()
- sse4_2_available()
- avx_available()
- avx2_available()
These functions should be called once, on module load or initialization.
They are safe to use from user and kernel space.
If an implementation uses more than a single instruction set, both compiler
and runtime support for all relevant instruction sets should be checked.
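One way to combine the compile-time defines with the runtime checks is to pick an implementation once at initialization and call it through a function pointer afterwards. The sketch below assumes that pattern; it is not taken from the patch. The `example_*` names are hypothetical, the body of `avx2_available()` is a stand-in built on the gcc/clang builtin `__builtin_cpu_supports()`, and the "AVX2" routine is a placeholder that simply reuses the scalar loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HAVE_AVX2 normally comes from configure; this sketch does not define it. */

/* Stand-in (assumption) for the runtime check provided by simd_x86.h. */
static inline bool
avx2_available(void)
{
#if defined(HAVE_AVX2) && (defined(__x86_64__) || defined(__i386__))
	return (__builtin_cpu_supports("avx2") != 0);
#else
	return (false);
#endif
}

typedef uint64_t (*example_sum_fn)(const uint8_t *, size_t);

/* Portable fallback, always compiled. */
static uint64_t
example_sum_scalar(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return (sum);
}

#if defined(HAVE_AVX2)
/* Only handed to the compiler when configure found AVX2 support. */
static uint64_t
example_sum_avx2(const uint8_t *buf, size_t len)
{
	/* placeholder body; a real version would use AVX2 instructions */
	return (example_sum_scalar(buf, len));
}
#endif

static example_sum_fn example_sum_impl = example_sum_scalar;

/* Call once, on module load or initialization. */
static inline void
example_sum_init(void)
{
	example_sum_impl = example_sum_scalar;
#if defined(HAVE_AVX2)
	if (avx2_available())
		example_sum_impl = example_sum_avx2;
#endif
}
```

After `example_sum_init()` has run, callers use `example_sum_impl(buf, len)` and never repeat the CPU check on the hot path.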
This is relevant for:
#2351 - sha256 avx optimization
#3374 - raidz avx/avx2/sse optimization
#4328 - raidz avx/avx2/sse optimization
#4330 - fletcher avx optimization