Revert "Implementation of SSE optimized Fletcher-4" #4861
Conversation
This reverts commit 35a76a0 to resolve the following panic which was introduced:

```
RIP  [<ffffffffa0241d11>] fletcher_4_sse2_fini+0x11/0xb0 [zcommon]
Pid: 813, comm: modprobe Tainted: P D -- ------------ 2.6.32-642.3.1.el6.x86_64 #1
Call Trace:
 [<ffffffff81546c21>] ? panic+0xa7/0x179
 [<ffffffff8154baa4>] ? oops_end+0xe4/0x100
 [<ffffffff8101102b>] ? die+0x5b/0x90
 [<ffffffff8154b572>] ? do_general_protection+0x152/0x160
 [<ffffffff8100c4ae>] ? xen_hvm_callback_vector+0xe/0x20
 [<ffffffff8154ace5>] ? general_protection+0x25/0x30
 [<ffffffffa0241d00>] ? fletcher_4_sse2_fini+0x0/0xb0 [zcommon]
 [<ffffffffa0241d11>] ? fletcher_4_sse2_fini+0x11/0xb0 [zcommon]
 [<ffffffff81007bc1>] ? xen_clocksource_read+0x21/0x30
 [<ffffffff81007ca9>] ? xen_clocksource_get_cycles+0x9/0x10
 [<ffffffff810b1b65>] ? getrawmonotonic+0x35/0xc0
 [<ffffffffa0241077>] ? fletcher_4_init+0x227/0x260 [zcommon]
 [<ffffffff812a8490>] ? kvasprintf+0x70/0x90
 [<ffffffffa024d000>] ? zcommon_init+0x0/0xd [zcommon]
 [<ffffffffa024d009>] ? zcommon_init+0x9/0xd [zcommon]
 [<ffffffff810020d0>] ? do_one_initcall+0xc0/0x280
 [<ffffffff810c8371>] ? sys_init_module+0xe1/0x250
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
```

Disable zimport testing against master where this flaw exists:

```
TEST_ZIMPORT_VERSIONS="installed"
```

Signed-off-by: Brian Behlendorf <[email protected]>
@tj90241 @ironMann I may need to revert this change from master because it has introduced a panic under CentOS 6. It appears to be caused by the zfs_fletcher_sse_array alignment. Increasing the alignment from 16 to 32, like in the avx2 code, resolves the issue, but I'm not sure that's the best fix. What do you guys recommend?

```diff
diff --git a/module/zcommon/zfs_fletcher_sse.c b/module/zcommon/zfs_fletcher_sse.c
index 734ae08..a6a78c5 100644
--- a/module/zcommon/zfs_fletcher_sse.c
+++ b/module/zcommon/zfs_fletcher_sse.c
@@ -48,7 +48,7 @@
 #include <zfs_fletcher.h>
 
 struct zfs_fletcher_sse_array {
-	uint64_t v[2] __attribute__((aligned(16)));
+	uint64_t v[2] __attribute__((aligned(32)));
 };
 
 static void
```
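For context on why a missed alignment panics, here is a minimal standalone sketch (not from this PR) that checks whether a stack variable actually receives its requested 16-byte alignment; a misaligned address here is exactly the condition under which a movdqa store to that variable faults with a general protection error.

```c
/* Minimal sketch, assuming a GCC-style toolchain: prints whether a stack
 * variable carrying aligned(16) is actually 16-byte aligned at runtime. */
#include <stdio.h>
#include <stdint.h>

struct sse_array {
	uint64_t v[2] __attribute__((aligned(16)));
};

static void __attribute__((noinline))
check_alignment(void)
{
	struct sse_array a;

	/* If the compiler fails to realign the frame, this can print "NO". */
	printf("&a.v = %p, 16-byte aligned: %s\n", (void *)a.v,
	    ((uintptr_t)a.v & 0xf) == 0 ? "yes" : "NO");
}

int
main(void)
{
	check_alignment();
	return (0);
}
```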
@behlendorf This might be a compiler issue, failing to properly align stack variables when the incoming stack is not 16B aligned. Do you maybe have any object files around, or an objdump of the original code on that machine? The safest bet, and the right thing to do, is replacing all memory stores with unaligned variants. So in fini(): s/movdqa/movdqu/

EDIT: Leaving alignment at 16 to conserve stack space.
Apologies for being AWOL... What @ironMann suggested works. Looks like this was a bug in GCC that was fixed quasi-recently (in 4.6):
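As an aside, one hypothetical compiler-side workaround (not the fix this thread settled on) would be GCC's force_align_arg_pointer attribute, which makes a function realign its own frame; whether this papers over the pre-4.6 bug on the affected toolchain is an assumption, sketched below.

```c
/* Hypothetical alternative, not adopted in this PR: force the function to
 * realign its own stack frame so 16-byte-aligned locals are safe for
 * movdqa even when the incoming stack is only 8-byte aligned. The
 * attribute is a real GCC x86 feature; the body is illustrative. */
#include <stdint.h>

struct zfs_fletcher_sse_array {
	uint64_t v[2] __attribute__((aligned(16)));
};

static void __attribute__((force_align_arg_pointer, used))
fini_realigned(void)
{
	struct zfs_fletcher_sse_array a;

	/* With a realigned frame the aligned store should not fault. */
	asm volatile("movdqa %%xmm0, %0" : "=m" (a.v));
}
```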
@ironMann the compiler would be a very plausible explanation. I'm able to easily reproduce this with a stock CentOS 6 install which comes with the following gcc version.

Switching to the unaligned variants in fini appears to have resolved the issue as well. It looks like:

```diff
diff --git a/module/zcommon/zfs_fletcher_sse.c b/module/zcommon/zfs_fletcher_sse.c
index 734ae08..5db46fe 100644
--- a/module/zcommon/zfs_fletcher_sse.c
+++ b/module/zcommon/zfs_fletcher_sse.c
@@ -69,12 +69,12 @@ fletcher_4_sse2_fini(zio_cksum_t *zcp)
 	struct zfs_fletcher_sse_array a, b, c, d;
 	uint64_t A, B, C, D;
 
-	asm volatile("movdqa %%xmm0, %0":"=m" (a.v));
-	asm volatile("movdqa %%xmm1, %0":"=m" (b.v));
+	asm volatile("movdqu %%xmm0, %0":"=m" (a.v));
+	asm volatile("movdqu %%xmm1, %0":"=m" (b.v));
 	asm volatile("psllq $0x2, %xmm2");
-	asm volatile("movdqa %%xmm2, %0":"=m" (c.v));
+	asm volatile("movdqu %%xmm2, %0":"=m" (c.v));
 	asm volatile("psllq $0x3, %xmm3");
-	asm volatile("movdqa %%xmm3, %0":"=m" (d.v));
+	asm volatile("movdqu %%xmm3, %0":"=m" (d.v));
 
 	kfpu_end();
@@ -106,7 +106,7 @@ fletcher_4_sse2(const void *buf, uint64_t size, zio_cksum_t *zcp)
 	for (; ip < ipend; ip += 2) {
 		asm volatile("movdqu %0, %%xmm5" :: "m"(*ip));
-		asm volatile("movdqa %xmm5, %xmm6");
+		asm volatile("movdqu %xmm5, %xmm6");
 		asm volatile("punpckldq %xmm4, %xmm5");
 		asm volatile("punpckhdq %xmm4, %xmm6");
 		asm volatile("paddq %xmm5, %xmm0");
@@ -168,13 +168,13 @@ fletcher_4_ssse3_byteswap(const void *buf, uint64_t size, zio_cksum_t *zcp)
 	const uint64_t *ip = buf;
 	const uint64_t *ipend = (uint64_t *)((uint8_t *)ip + size);
 
-	asm volatile("movdqa %0, %%xmm7"::"m" (mask));
+	asm volatile("movdqu %0, %%xmm7"::"m" (mask));
 	asm volatile("pxor %xmm4, %xmm4");
 
 	for (; ip < ipend; ip += 2) {
 		asm volatile("movdqu %0, %%xmm5"::"m" (*ip));
 		asm volatile("pshufb %xmm7, %xmm5");
-		asm volatile("movdqa %xmm5, %xmm6");
+		asm volatile("movdqu %xmm5, %xmm6");
 		asm volatile("punpckldq %xmm4, %xmm5");
 		asm volatile("punpckhdq %xmm4, %xmm6");
 		asm volatile("paddq %xmm5, %xmm0");
```

@tj90241 perhaps you could propose a PR to address this, so we can avoid reverting the patch and instead just fix it.
Sure. Though: do we know if it's necessary to use movdqu in the core loop? While using unaligned loads for the fini isn't a problem (since it's only done once), unaligned memory accesses in the core loop could result in a performance hit.
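As a side note, if someone wanted to quantify that concern, a throwaway userspace microbenchmark along these lines could compare movdqa and movdqu loads over the same 16-byte-aligned buffer (names and sizes here are arbitrary; on most post-Nehalem CPUs movdqu on already-aligned data costs essentially nothing, but measuring is cheap):

```c
/* Hypothetical microbenchmark, not part of this PR. Build: gcc -O2 bench.c */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint8_t buf[1 << 20] __attribute__((aligned(16)));

static inline uint64_t
rdtsc(void)
{
	uint32_t lo, hi;
	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return (((uint64_t)hi << 32) | lo);
}

int
main(void)
{
	size_t i;
	uint64_t t0, t1, t2;

	memset(buf, 0xab, sizeof (buf));

	/* Aligned loads over an aligned buffer. */
	t0 = rdtsc();
	for (i = 0; i < sizeof (buf); i += 16)
		asm volatile("movdqa %0, %%xmm0" :: "m"(buf[i]) : "xmm0");
	t1 = rdtsc();

	/* Unaligned-capable loads over the same (aligned) buffer. */
	for (i = 0; i < sizeof (buf); i += 16)
		asm volatile("movdqu %0, %%xmm0" :: "m"(buf[i]) : "xmm0");
	t2 = rdtsc();

	printf("movdqa: %llu cycles, movdqu: %llu cycles\n",
	    (unsigned long long)(t1 - t0), (unsigned long long)(t2 - t1));
	return (0);
}
```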
@behlendorf Only the memory-referencing movs have to be "movdqu". The other ones just move content from one register to another (they are probably optimized out by the CPU via register renaming and not even executed). The one that should be changed is the following:

    asm volatile("movdqa %0, %%xmm7"::"m" (mask));

(it's probably fine, since the static mask is not on the stack, but just to be on the safe side)
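To spell out the distinction, here is an illustrative sketch (the mask values are placeholders, not the real byteswap mask) of the three movdqa/movdqu forms in play; only the memory-operand forms can fault on alignment:

```c
#include <stdint.h>

struct sse_array {
	uint64_t v[2] __attribute__((aligned(16)));
};

/* Placeholder values, not the PR's actual pshufb mask. */
static const struct sse_array mask = { { 0, 0 } };

void
movdqa_forms(const uint64_t *ip)
{
	/* reg -> reg: no memory access, so alignment is irrelevant. */
	asm volatile("movdqa %xmm5, %xmm6");

	/* mem -> reg, unaligned-tolerant: safe for any address. */
	asm volatile("movdqu %0, %%xmm5" :: "m"(*ip));

	/* mem -> reg, aligned-only: faults (#GP) if the effective address
	 * is not 16-byte aligned; safe here only because mask carries an
	 * explicit aligned(16) attribute. */
	asm volatile("movdqa %0, %%xmm7" :: "m"(mask));
}
```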
This reverts commit 35a76a0, which introduced a regression on CentOS 6.7 in automated testing. The issue appears not to have impacted newer kernels, even though the testing is performed on instances of the same type.