Skip to content

Commit

Permalink
Support for vectorized algorithms on x86
Browse files Browse the repository at this point in the history
This is initial support for x86 vectorized implementations of ZFS parity
and checksum algorithms.

For the compilation phase, configure step checks if toolchain supports relevant
instruction sets. Each implementation must ensure that the code is not passed
to compiler if relevant instruction set is not supported. For this purpose,
following new defines are provided if instruction set is supported:
	- HAVE_SSE,
	- HAVE_SSE2,
	- HAVE_SSE3,
	- HAVE_SSSE3,
	- HAVE_SSE4_1,
	- HAVE_SSE4_2,
	- HAVE_AVX,
	- HAVE_AVX2.

For detecting if an instruction set can be used in runtime, following functions
are provided in (include/linux/simd_x86.h):
	- zfs_sse_available()
	- zfs_sse2_available()
	- zfs_sse3_available()
	- zfs_ssse3_available()
	- zfs_sse4_1_available()
	- zfs_sse4_2_available()
	- zfs_avx_available()
	- zfs_avx2_available()
	- zfs_bmi1_available()
	- zfs_bmi2_available()

These function should be called once, on module load, or initialization.
They are safe to use from user and kernel space.
If an implementation is using more than single instruction set, both compiler
and runtime support for all relevant instruction sets should be checked.

Kernel fpu methods:
	- kfpu_begin()
	- kfpu_end()

Use __get_cpuid_max and __cpuid_count from <cpuid.h>
Both gcc and clang have support for these. They also handle ebx register
in case it is used for PIC code.

Signed-off-by: Gvozden Neskovic <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #4381
  • Loading branch information
ironMann authored and behlendorf committed Mar 21, 2016
1 parent e853ba3 commit fc0c72b
Show file tree
Hide file tree
Showing 6 changed files with 583 additions and 2 deletions.
18 changes: 18 additions & 0 deletions config/kernel-fpu.m4
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
dnl #
dnl # 4.2 API change
dnl # asm/i387.h is replaced by asm/fpu/api.h
dnl #
AC_DEFUN([ZFS_AC_KERNEL_FPU], [
AC_MSG_CHECKING([whether asm/fpu/api.h exists])
ZFS_LINUX_TRY_COMPILE([
#include <linux/kernel.h>
#include <asm/fpu/api.h>
],[
__kernel_fpu_begin();
],[
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_FPU_API_H, 1, [kernel has <asm/fpu/api.h> interface])
],[
AC_MSG_RESULT(no)
])
])
3 changes: 2 additions & 1 deletion config/kernel.m4
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
dnl #
dnl # Default ZFS kernel configuration
dnl # Default ZFS kernel configuration
dnl #
AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL
Expand Down Expand Up @@ -91,6 +91,7 @@ AC_DEFUN([ZFS_AC_CONFIG_KERNEL], [
ZFS_AC_KERNEL_FOLLOW_DOWN_ONE
ZFS_AC_KERNEL_MAKE_REQUEST_FN
ZFS_AC_KERNEL_GENERIC_IO_ACCT
ZFS_AC_KERNEL_FPU
AS_IF([test "$LINUX_OBJ" != "$LINUX"], [
KERNELMAKE_PARAMS="$KERNELMAKE_PARAMS O=$LINUX_OBJ"
Expand Down
172 changes: 172 additions & 0 deletions config/toolchain-simd.m4
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
dnl #
dnl # Checks if host toolchain supports SIMD instructions
dnl #
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_TOOLCHAIN_SIMD], [
case "$host_cpu" in
x86_64 | x86 | i686)
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE2
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE3
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSSE3
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_1
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_2
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX2
;;
esac
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE], [
AC_MSG_CHECKING([whether host toolchain supports SSE])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("xorps %xmm0, %xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE], 1, [Define if host toolchain supports SSE])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE2
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE2], [
AC_MSG_CHECKING([whether host toolchain supports SSE2])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pxor %xmm0, %xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE2], 1, [Define if host toolchain supports SSE2])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE3
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE3], [
AC_MSG_CHECKING([whether host toolchain supports SSE3])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
char v[16];
__asm__ __volatile__("lddqu %0,%%xmm0" :: "m"(v[0]));
}
]])], [
AC_DEFINE([HAVE_SSE3], 1, [Define if host toolchain supports SSE3])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSSE3
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSSE3], [
AC_MSG_CHECKING([whether host toolchain supports SSSE3])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pshufb %xmm0,%xmm1");
}
]])], [
AC_DEFINE([HAVE_SSSE3], 1, [Define if host toolchain supports SSSE3])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_1
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_1], [
AC_MSG_CHECKING([whether host toolchain supports SSE4.1])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pmaxsb %xmm0,%xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE4_1], 1, [Define if host toolchain supports SSE4.1])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_2
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_2], [
AC_MSG_CHECKING([whether host toolchain supports SSE4.2])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pcmpgtq %xmm0, %xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE4_2], 1, [Define if host toolchain supports SSE4.2])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX], [
AC_MSG_CHECKING([whether host toolchain supports AVX])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
char v[32];
__asm__ __volatile__("vmovdqa %0,%%ymm0" :: "m"(v[0]));
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX], 1, [Define if host toolchain supports AVX])
], [
AC_MSG_RESULT([no])
])
])

dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX2
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX2], [
AC_MSG_CHECKING([whether host toolchain supports AVX2])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vpshufb %ymm0,%ymm1,%ymm2");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX2], 1, [Define if host toolchain supports AVX2])
], [
AC_MSG_RESULT([no])
])
])
1 change: 1 addition & 0 deletions config/zfs-build.m4
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ AC_DEFUN([ZFS_AC_DEBUG_DMU_TX], [
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS], [
ZFS_AC_CONFIG_ALWAYS_NO_UNUSED_BUT_SET_VARIABLE
ZFS_AC_CONFIG_ALWAYS_NO_BOOL_COMPARE
ZFS_AC_CONFIG_ALWAYS_TOOLCHAIN_SIMD
])

AC_DEFUN([ZFS_AC_CONFIG], [
Expand Down
3 changes: 2 additions & 1 deletion include/linux/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@ KERNEL_H = \
$(top_srcdir)/include/linux/vfs_compat.h \
$(top_srcdir)/include/linux/blkdev_compat.h \
$(top_srcdir)/include/linux/utsname_compat.h \
$(top_srcdir)/include/linux/kmap_compat.h
$(top_srcdir)/include/linux/kmap_compat.h \
$(top_srcdir)/include/linux/simd_x86.h

USER_H =

Expand Down
Loading

3 comments on commit fc0c72b

@prometheanfire
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that compiling the modules on one system to ship them to other systems needs to manage the flags manually? The commit message is confusing on that. It makes it seem like features are checked both at compile time and at load time (to me).

@behlendorf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prometheanfire as long you build the modules with a tool chain which supports the instructions and at run time the functionality is available these optimization will be enabled. Custom builds aren't needed.

@ryao
Copy link
Contributor

@ryao ryao commented on fc0c72b Mar 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prometheanfire There are 2 requirements for building modules that support AVX2:

  1. The kernel source tree must support AVX2.
  2. The toolchain used to build the modules must support AVX2.

AVX2 support does not depend on whether the toolchain used to build the kernel itself supports AVX2. torvalds/linux@a30469e makes that apparent given that AVX2 uses the same registers as AVX.

We could eliminate the requirement that the kernel support AVX2, but then we would need to implement our own code for AVX2 detect/save/restore, worry about whether the CPU needs the registers initialized like it did with SSE registers and worry about errata fixes. The latter two might turn out to be non-issues, but getting AVX2 working on processors that support AVX2 that use kernels that do not support AVX2 probably will not buy us enough to merit it.

If someone wants to implement that anyway, here are links to documentation on how to detect it and the instructions necessary to properly save/restore it.

https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
http://x86.renejeschke.de/html/file_module_x86_id_128.html
http://x86.renejeschke.de/html/file_module_x86_id_128.html

Please sign in to comment.