
Revert "Implementation of SSE optimized Fletcher-4" #4861

Closed
wants to merge 1 commit

Conversation

behlendorf
Contributor

This reverts commit 35a76a0, which introduced a regression on CentOS 6.7 in automated testing. The issue does not appear to impact newer kernels, even though the testing is performed on instances of the same type.

RIP  [<ffffffffa0241d11>] fletcher_4_sse2_fini+0x11/0xb0 [zcommon]
Pid: 813, comm: modprobe Tainted: P      D    -- ------------
2.6.32-642.3.1.el6.x86_64 #1
Call Trace:
 [<ffffffff81546c21>] ? panic+0xa7/0x179
 [<ffffffff8154baa4>] ? oops_end+0xe4/0x100
 [<ffffffff8101102b>] ? die+0x5b/0x90
 [<ffffffff8154b572>] ? do_general_protection+0x152/0x160
 [<ffffffff8100c4ae>] ? xen_hvm_callback_vector+0xe/0x20
 [<ffffffff8154ace5>] ? general_protection+0x25/0x30
 [<ffffffffa0241d00>] ? fletcher_4_sse2_fini+0x0/0xb0 [zcommon]
 [<ffffffffa0241d11>] ? fletcher_4_sse2_fini+0x11/0xb0 [zcommon]
 [<ffffffff81007bc1>] ? xen_clocksource_read+0x21/0x30
 [<ffffffff81007ca9>] ? xen_clocksource_get_cycles+0x9/0x10
 [<ffffffff810b1b65>] ? getrawmonotonic+0x35/0xc0
 [<ffffffffa0241077>] ? fletcher_4_init+0x227/0x260 [zcommon]
 [<ffffffff812a8490>] ? kvasprintf+0x70/0x90
 [<ffffffffa024d000>] ? zcommon_init+0x0/0xd [zcommon]
 [<ffffffffa024d009>] ? zcommon_init+0x9/0xd [zcommon]
 [<ffffffff810020d0>] ? do_one_initcall+0xc0/0x280
 [<ffffffff810c8371>] ? sys_init_module+0xe1/0x250
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

This reverts commit 35a76a0 to resolve the panic shown above.

Disable zimport testing against master where this flaw exists:

TEST_ZIMPORT_VERSIONS="installed"

Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf behlendorf force-pushed the fletcher-sse-revert branch from 0648db0 to 7b94a8d on July 18, 2016 19:53
@behlendorf
Contributor Author

behlendorf commented Jul 18, 2016

@tj90241 @ironMann I may need to revert this change from master because it has introduced a panic under CentOS 6. It appears to be caused by the zfs_fletcher_sse_array alignment. Increasing the alignment from 16 to 32, like in the avx2 code, resolves the issue but I'm not sure that's the best fix. What do you guys recommend?

diff --git a/module/zcommon/zfs_fletcher_sse.c b/module/zcommon/zfs_fletcher_sse
index 734ae08..a6a78c5 100644
--- a/module/zcommon/zfs_fletcher_sse.c
+++ b/module/zcommon/zfs_fletcher_sse.c
@@ -48,7 +48,7 @@
 #include <zfs_fletcher.h>

 struct zfs_fletcher_sse_array {
-       uint64_t v[2] __attribute__((aligned(16)));
+       uint64_t v[2] __attribute__((aligned(32)));
 };

 static void

@ironMann
Contributor

ironMann commented Jul 18, 2016

@behlendorf This might be a compiler issue: failing to properly align stack variables when the incoming stack is not 16B aligned. Do you maybe have any object files around, or an objdump of the original code on that machine?

The safest bet, and the right thing to do, is to replace all memory stores with their unaligned variants. So in fini(): s/movdqa/movdqu/
Then alignment will not matter.

EDIT: Leaving alignment at 16 to conserve stack space.

@tj90241
Contributor

tj90241 commented Jul 18, 2016

Apologies for being AWOL...

What @ironMann suggested works.

Looks like this was a bug in GCC that was fixed quasi-recently (in 4.6):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838

@behlendorf
Contributor Author

@ironMann the compiler would be a very plausible explanation. I'm able to easily reproduce this with a stock CentOS 6 install which comes with the following gcc version.

gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)

Switching to the unaligned variants in fini appears to have resolved the issue as well. It looks like fletcher_4_ssse3_byteswap and fletcher_4_sse2 could suffer from the same issue. Should they also be changed to the unaligned variant? Something like this?

diff --git a/module/zcommon/zfs_fletcher_sse.c b/module/zcommon/zfs_fletcher_sse
index 734ae08..5db46fe 100644
--- a/module/zcommon/zfs_fletcher_sse.c
+++ b/module/zcommon/zfs_fletcher_sse.c
@@ -69,12 +69,12 @@ fletcher_4_sse2_fini(zio_cksum_t *zcp)
        struct zfs_fletcher_sse_array a, b, c, d;
        uint64_t A, B, C, D;

-       asm volatile("movdqa %%xmm0, %0":"=m" (a.v));
-       asm volatile("movdqa %%xmm1, %0":"=m" (b.v));
+       asm volatile("movdqu %%xmm0, %0":"=m" (a.v));
+       asm volatile("movdqu %%xmm1, %0":"=m" (b.v));
        asm volatile("psllq $0x2, %xmm2");
-       asm volatile("movdqa %%xmm2, %0":"=m" (c.v));
+       asm volatile("movdqu %%xmm2, %0":"=m" (c.v));
        asm volatile("psllq $0x3, %xmm3");
-       asm volatile("movdqa %%xmm3, %0":"=m" (d.v));
+       asm volatile("movdqu %%xmm3, %0":"=m" (d.v));

        kfpu_end();

@@ -106,7 +106,7 @@ fletcher_4_sse2(const void *buf, uint64_t size, zio_cksum_t 

        for (; ip < ipend; ip += 2) {
                asm volatile("movdqu %0, %%xmm5" :: "m"(*ip));
-               asm volatile("movdqa %xmm5, %xmm6");
+               asm volatile("movdqu %xmm5, %xmm6");
                asm volatile("punpckldq %xmm4, %xmm5");
                asm volatile("punpckhdq %xmm4, %xmm6");
                asm volatile("paddq %xmm5, %xmm0");
@@ -168,13 +168,13 @@ fletcher_4_ssse3_byteswap(const void *buf, uint64_t size, 
        const uint64_t *ip = buf;
        const uint64_t *ipend = (uint64_t *)((uint8_t *)ip + size);

-       asm volatile("movdqa %0, %%xmm7"::"m" (mask));
+       asm volatile("movdqu %0, %%xmm7"::"m" (mask));
        asm volatile("pxor %xmm4, %xmm4");

        for (; ip < ipend; ip += 2) {
                asm volatile("movdqu %0, %%xmm5"::"m" (*ip));
                asm volatile("pshufb %xmm7, %xmm5");
-               asm volatile("movdqa %xmm5, %xmm6");
+               asm volatile("movdqu %xmm5, %xmm6");
                asm volatile("punpckldq %xmm4, %xmm5");
                asm volatile("punpckhdq %xmm4, %xmm6");
                asm volatile("paddq %xmm5, %xmm0");

@tj90241 perhaps you could propose a PR to address this and we can avoid reverting the patch and instead just fix it.

@tj90241
Contributor

tj90241 commented Jul 18, 2016

Sure. Though: do we know if it's necessary to use movdqus in the core loop? While using unaligned loads for the fini isn't a problem (since it's only done once), unaligned memory accesses in the core loop could result in a performance hit.

@ironMann
Contributor

@behlendorf Only the memory-referencing movs have to be "movdqu". The others just move content from one register to another (they are probably optimized away by the CPU via register renaming and not even executed). The one that should be changed is the following:

@@ -168,13 +168,13 @@ fletcher_4_ssse3_byteswap(const void *buf, uint64_t size, 
        const uint64_t *ip = buf;
        const uint64_t *ipend = (uint64_t *)((uint8_t *)ip + size);

-       asm volatile("movdqa %0, %%xmm7"::"m" (mask));
+       asm volatile("movdqu %0, %%xmm7"::"m" (mask));

(It's probably fine, since the static mask is not on the stack, but just to be on the safe side.)

@behlendorf
Contributor Author

@tj90241 what @ironMann said. The CentOS 6 builder should uncover any problems if they're there, but I wasn't able to reproduce any other issues after making just the change to fini.

@behlendorf
Contributor Author

Closing in favor of #4862. @ironMann @tj90241 thanks for the quick patch and review!

@behlendorf behlendorf closed this Jul 18, 2016
@behlendorf behlendorf deleted the fletcher-sse-revert branch April 19, 2021 20:52