
Fletcher4: Incremental updates and ctx calculation #5164

Merged (3 commits into openzfs:master, Oct 7, 2016)

Conversation

ironMann (Contributor)

This patch fixes ABI issues in the fletcher4 code, adds support for incremental updates, and adds a ztest method for testing.
Partly discussed in #5093.

@ironMann force-pushed the simd_incr_fletcher branch 3 times, most recently from f41a9d3 to 9e5b0a6, on September 28, 2016 at 21:53
behlendorf (Contributor)

@tuxoko could you review and comment on this alternate approach to the one you proposed in #5093?

```diff
@@ -404,7 +386,7 @@ fletcher_4_native_impl(const fletcher_4_ops_t *ops, const void *buf,
 	ops->fini_native(zcp);
 }
 
-void
+inline void
 fletcher_4_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
 {
```
tuxoko (Contributor), Sep 30, 2016

Please change line 407 to scalar; as written, this causes unnecessary recursion. The same goes for the byteswap version.

ironMann (Contributor, Author)

Updated. There's no ctx handling overhead whenever we explicitly want the scalar version.
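A minimal sketch of the resulting dispatch shape (names and scaffolding simplified for illustration; `fletcher_4_simd_native` is a hypothetical stand-in for the runtime-selected SIMD path): whole small buffers and the sub-64-byte tail go straight to the scalar routine, so there is no re-entry into `fletcher_4_native()` and no context setup for the scalar case.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the ZFS checksum type (illustration only). */
typedef struct { uint64_t zc_word[4]; } zio_cksum_t;

/* Continue the running sums A, B, C, D over native 32-bit words. */
static void
fletcher_4_scalar_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
{
	const uint32_t *ip = buf;
	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
	uint64_t a = zcp->zc_word[0], b = zcp->zc_word[1];
	uint64_t c = zcp->zc_word[2], d = zcp->zc_word[3];

	for (; ip < ipend; ip++) {
		a += *ip;
		b += a;
		c += b;
		d += c;
	}
	zcp->zc_word[0] = a;
	zcp->zc_word[1] = b;
	zcp->zc_word[2] = c;
	zcp->zc_word[3] = d;
}

/*
 * Hypothetical stand-in for the runtime-selected SIMD bulk path; after
 * its fini step, zcp holds the equivalent scalar running sums.
 */
void fletcher_4_simd_native(const void *buf, uint64_t size, zio_cksum_t *zcp);

void
fletcher_4_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
{
	uint64_t p2size = size & ~(uint64_t)63;	/* 64-byte-aligned bulk */

	memset(zcp, 0, sizeof (*zcp));

	if (p2size > 0)
		fletcher_4_simd_native(buf, p2size, zcp);
	/*
	 * Hand the remainder directly to the scalar routine: re-entering
	 * fletcher_4_native() here would recurse, and the scalar path
	 * needs no SIMD context handling.
	 */
	if (p2size < size)
		fletcher_4_scalar_native((const uint8_t *)buf + p2size,
		    size - p2size, zcp);
}
```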

```c
fletcher_4_incremental_impl(boolean_t native, const void *buf, uint64_t size,
    zio_cksum_t *zcp)
{
	static const uint64_t FLETCHER_4_INC_MAX = 8ULL << 20;
```
Contributor

Please add a comment about the reason.

ironMann (Contributor, Author)

done

behlendorf (Contributor), Sep 30, 2016

It's misleading for FLETCHER_4_INC_MAX to be a variable. Why not make this a #define with a clear comment... which I see you've done.
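The resolution, roughly (a sketch rather than the verbatim patch, assuming the surrounding ZFS definitions of `boolean_t`, `MIN`, `zio_cksum_t`, and `fletcher_4_native()/_byteswap()` are in scope; the combine step is described in the merge commit message further down):

```c
/*
 * Incremental updates are split into chunks of at most 8 MiB: the
 * combine step's cubic coefficient s(s+1)(s+2)/6 overflows 64-bit
 * intermediates for chunk sizes approaching 16 MiB.
 */
#define	FLETCHER_4_INC_MAX	(8ULL << 20)

static inline void
fletcher_4_incremental_impl(boolean_t native, const void *buf, uint64_t size,
    zio_cksum_t *zcp)
{
	while (size > 0) {
		zio_cksum_t nzc;
		uint64_t len = MIN(size, FLETCHER_4_INC_MAX);

		if (native)
			fletcher_4_native(buf, len, &nzc);
		else
			fletcher_4_byteswap(buf, len, &nzc);

		/* Fold the chunk's checksum into the running checksum. */
		fletcher_4_incremental_combine(zcp, len, &nzc);

		size -= len;
		buf = (const uint8_t *)buf + len;
	}
}
```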

```c
	VERIFY0(fletcher_4_impl_set("cycle"));

	while (run_count-- > 0) {

```
Contributor

Extra whitespace

ironMann (Contributor, Author)

fixed


behlendorf (Contributor) left a comment

I like how you were able to abstract away the context so none of the consumers need to be aware of this implementation detail.

@tuxoko any objection to adopting this patch stack in favor of #5093?

```c
#if defined(__x86_64) && defined(HAVE_AVX512F)
	zfs_fletcher_avx512_t avx512[4];
#endif
} fletcher_4_ctx_t;
```
behlendorf (Contributor), Sep 30, 2016

When all implementations are available this context ends up being fairly large, 480 bytes by my quick math. This isn't huge, but I noticed it is stored on the stack, which might lead to problems. We may want to make this a union to avoid wasting the space.

ironMann (Contributor, Author)

@behlendorf ctx is already a union, so on x86 it's going to be 256 B (avx512).
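For reference, the shape of the union (a sketch matching the hunk above; the per-architecture member list is abbreviated and the non-AVX-512 type names are assumptions): the context occupies only its largest member, 4 × 64 B = 256 B for the AVX-512 variant, rather than the sum of all variants.

```c
typedef union fletcher_4_ctx {
	zio_cksum_t scalar;			/* 32 B: A, B, C, D */
#if defined(HAVE_SSE2)
	zfs_fletcher_sse_t sse[4];		/* 4 x 16 B lanes */
#endif
#if defined(HAVE_AVX) && defined(HAVE_AVX2)
	zfs_fletcher_avx_t avx[4];		/* 4 x 32 B lanes */
#endif
#if defined(__x86_64) && defined(HAVE_AVX512F)
	zfs_fletcher_avx512_t avx512[4];	/* 4 x 64 B, the largest */
#endif
} fletcher_4_ctx_t;
```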

tuxoko (Contributor) commented Oct 1, 2016

@behlendorf
Please go ahead.

All users of fletcher4 methods must call `fletcher_4_init()/_fini()`.
There's no benchmarking overhead when called from user-space.

Signed-off-by: Gvozden Neskovic <[email protected]>
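A hedged user-space sketch of the contract this commit establishes (the test scaffolding is invented for illustration; `ZIO_SET_CHECKSUM`, `ZIO_CHECKSUM_EQUAL`, and `VERIFY` are existing ZFS macros, and the headers a real consumer pulls in may differ):

```c
#include <zfs_fletcher.h>	/* fletcher_4_* declarations */
#include <string.h>

int
main(void)
{
	static uint32_t data[1024];
	zio_cksum_t one_shot, incr;

	memset(data, 0xa5, sizeof (data));

	fletcher_4_init();	/* required before any fletcher4 call */

	/* One-shot checksum of the whole buffer. */
	fletcher_4_native(data, sizeof (data), &one_shot);

	/* The same data fed in two pieces must give the same result. */
	ZIO_SET_CHECKSUM(&incr, 0, 0, 0, 0);
	fletcher_4_incremental_native(data, 1024, &incr);
	fletcher_4_incremental_native((uint8_t *)data + 1024,
	    sizeof (data) - 1024, &incr);

	VERIFY(ZIO_CHECKSUM_EQUAL(one_shot, incr));

	fletcher_4_fini();	/* matching teardown */
	return (0);
}
```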
behlendorf (Contributor)

@ironMann are you happy with this change, and do you consider it ready to merge? If so, can you rebase it on master to resolve the new zfs_fletcher.c conflict?

@tuxoko if you're happy with these changes, can you approve it?

behlendorf (Contributor) left a comment

Aside from the stack usage comment, which I don't think should be changed until/if it becomes an issue, this LGTM.

Combine incrementally computed fletcher4 checksums. Checksums are combined a posteriori, which allows parallel computation over chunks to be implemented later if required. The algorithm is general and requires no changes to the individual SIMD implementations.
A new test in ztest verifies incremental fletcher computations.

The checksum combining matrix for two adjacent buffers `a` and `b`, where `Ca` and `Cb` are the respective fletcher4 checksums, `Cab` is the combined checksum, and `s` is the size of buffer `b` in 32-bit words (bytes divided by sizeof(uint32_t)), is:

Cab[A] = Cb[A] + Ca[A]
Cab[B] = Cb[B] + Ca[B] + s * Ca[A]
Cab[C] = Cb[C] + Ca[C] + s * Ca[B] + s(s+1)/2 * Ca[A]
Cab[D] = Cb[D] + Ca[D] + s * Ca[C] + s(s+1)/2 * Ca[B] + s(s+1)(s+2)/6 * Ca[A]

NOTE: this calculation overflows 64-bit arithmetic for larger buffers. Thus, internally, the calculation is performed on chunks of at most 8 MiB.

Signed-off-by: Gvozden Neskovic <[email protected]>
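Expressed in code, the matrix collapses into one combine step per chunk. A sketch, assuming the usual `zio_cksum_t` layout where `zc_word[0..3]` hold A..D, with `zcp` holding the running checksum `Ca` and `nzcp` holding `Cb` for the `size`-byte chunk just checksummed:

```c
static inline void
fletcher_4_incremental_combine(zio_cksum_t *zcp, const uint64_t size,
    const zio_cksum_t *nzcp)
{
	const uint64_t c1 = size / sizeof (uint32_t);	/* s */
	const uint64_t c2 = c1 * (c1 + 1) / 2;		/* s(s+1)/2 */
	const uint64_t c3 = c2 * (c1 + 2) / 3;		/* s(s+1)(s+2)/6 */

	/*
	 * Update D down to A so every row still reads the pre-update
	 * Ca values it depends on.
	 */
	zcp->zc_word[3] += nzcp->zc_word[3] + c1 * zcp->zc_word[2] +
	    c2 * zcp->zc_word[1] + c3 * zcp->zc_word[0];
	zcp->zc_word[2] += nzcp->zc_word[2] + c1 * zcp->zc_word[1] +
	    c2 * zcp->zc_word[0];
	zcp->zc_word[1] += nzcp->zc_word[1] + c1 * zcp->zc_word[0];
	zcp->zc_word[0] += nzcp->zc_word[0];
}
```

The divisions are exact: c1(c1+1) is always even, and c1(c1+1)(c1+2) always carries a factor of 3, so no precision is lost before the intermediate product c2*(c1+2) starts overflowing near 16 MiB; hence the 8 MiB chunk cap.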
Init, compute, and fini methods are changed to work on an internal context object. This is necessary because the ABI does not guarantee that SIMD registers are preserved across function calls. They technically are preserved in the Linux kernel between `kfpu_begin()/kfpu_end()`, but relying on that breaks user-space tests and kernels that don't require disabling preemption to use SIMD (e.g. OS X).

Use scalar compute methods in-place for small buffers, and when the buffer size does not meet the SIMD size alignment.

Signed-off-by: Gvozden Neskovic <[email protected]>
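Concretely, the per-implementation entry points now operate on a caller-owned context. A sketch of the ops-table shape this implies (typedef and field names are assumptions based on the hunks above):

```c
typedef void fletcher_4_init_f(fletcher_4_ctx_t *);
typedef void fletcher_4_fini_f(fletcher_4_ctx_t *, zio_cksum_t *);
typedef void fletcher_4_compute_f(fletcher_4_ctx_t *, const void *, uint64_t);

typedef struct fletcher_4_ops {
	fletcher_4_init_f *init_native;		/* zero/seed the context */
	fletcher_4_fini_f *fini_native;		/* fold lanes into zio_cksum_t */
	fletcher_4_compute_f *compute_native;
	fletcher_4_init_f *init_byteswap;
	fletcher_4_fini_f *fini_byteswap;
	fletcher_4_compute_f *compute_byteswap;
	boolean_t (*valid)(void);		/* is this impl usable here? */
	const char *name;
} fletcher_4_ops_t;
```

Because all SIMD state lives in the context between compute calls, nothing has to survive in vector registers across function boundaries, which is exactly what the ABI refuses to promise.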
@ironMann force-pushed the simd_incr_fletcher branch from dbb1ef8 to 5bf703b on October 5, 2016 at 14:53
ironMann (Contributor, Author) commented Oct 5, 2016

@behlendorf Rebased. I think this is the best I can do for now. 5cc78dc changes the cksum interface to add context initialization, but that is not helpful for fletcher (we need a zeroed ctx). Also, that patch has various cksum contexts on the stack, like Skein_256_Ctxt_t, SHA1_CTX, SHA2_CTX... If that becomes an issue, it should be solved across the board.

behlendorf merged commit 482cd9e into openzfs:master on Oct 7, 2016