Add RAID-Z routines for SSE2 instruction set, in x86_64 mode. #4815

Closed

@ironMann ironMann commented Jun 28, 2016

Add RAID-Z routines for SSE2 instruction set, in x86_64 mode.

The patch covers low-end and older x86 CPUs.
Parity generation performs on par with the SSSE3 implementation, but
reconstruction is somewhat slower.
The previous 'sse' implementation is renamed to 'ssse3' to indicate the
highest instruction set it uses.
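
For readers wondering why generation can keep pace on SSE2 while reconstruction falls behind, here is a minimal sketch, not code from this patch. It assumes the GF(2^8) field with reduction polynomial 0x11d, and shows that doubling a vector of bytes, the building block from which Q and R parity generation can be composed, needs only SSE2 compare, add, and xor. Reconstruction multiplies by arbitrary constants, which SSSE3 code commonly does with `pshufb` table lookups that SSE2 lacks; that this accounts for the gap measured here is an assumption. The names `gf_mul2_scalar` and `gf_mul2_16` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>	/* SSE2 intrinsics */
#endif

/* Scalar reference: multiply one byte by 2 in GF(2^8), polynomial 0x11d. */
static uint8_t gf_mul2_scalar(uint8_t a)
{
	return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0x00));
}

/* Multiply 16 bytes by 2 in GF(2^8) (hypothetical helper). */
static void gf_mul2_16(uint8_t dst[16], const uint8_t src[16])
{
#if defined(__SSE2__)
	__m128i v = _mm_loadu_si128((const __m128i *)src);
	/* 0xff in every byte whose top bit is set (signed compare 0 > v). */
	__m128i mask = _mm_cmpgt_epi8(_mm_setzero_si128(), v);
	/* v + v shifts each byte left by one, dropping the top bit. */
	v = _mm_add_epi8(v, v);
	/* Fold the dropped bit back in with the low byte of the polynomial. */
	v = _mm_xor_si128(v, _mm_and_si128(mask, _mm_set1_epi8(0x1d)));
	_mm_storeu_si128((__m128i *)dst, v);
#else
	/* Portable fallback when SSE2 is not available at compile time. */
	for (size_t i = 0; i < 16; i++)
		dst[i] = gf_mul2_scalar(src[i]);
#endif
}
```

Calling `gf_mul2_16(out, in)` on any 16-byte buffer should give the same bytes as applying `gf_mul2_scalar` to each element.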

Benchmark results:

scalar_rec_p                    4    720476442
scalar_rec_q                    4    187462804
scalar_rec_r                    4    138996096
scalar_rec_pq                   4    140834951
scalar_rec_pr                   4    129332035
scalar_rec_qr                   4    81619194
scalar_rec_pqr                  4    53376668

sse2_rec_p                      4    2427757064
sse2_rec_q                      4    747120861
sse2_rec_r                      4    499871637
sse2_rec_pq                     4    522403710
sse2_rec_pr                     4    464632780
sse2_rec_qr                     4    319124434
sse2_rec_pqr                    4    205794190

ssse3_rec_p                     4    2519939444
ssse3_rec_q                     4    1003019289
ssse3_rec_r                     4    616428767
ssse3_rec_pq                    4    706326396
ssse3_rec_pr                    4    570493618
ssse3_rec_qr                    4    400185250
ssse3_rec_pqr                   4    377541245

original_rec_p                  4    691658568
original_rec_q                  4    195510948
original_rec_r                  4    26075538
original_rec_pq                 4    103087368
original_rec_pr                 4    15767058
original_rec_qr                 4    15513175
original_rec_pqr                4    10746357

Issue #4783
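
To put a number on "somewhat slower", the sketch below divides each sse2 reconstruction figure by the matching ssse3 figure from the table above. It assumes the last column is a throughput where higher is better, which the PR does not state explicitly; the struct and function names are hypothetical. With these figures the ratio runs from about 0.96 for rec_p down to about 0.55 for rec_pqr.

```c
#include <stddef.h>
#include <stdio.h>

/* One reconstruction method with the two figures copied from the table. */
struct bench_row {
	const char *method;
	double sse2;
	double ssse3;
};

static const struct bench_row rows[] = {
	{ "rec_p",   2427757064.0, 2519939444.0 },
	{ "rec_q",    747120861.0, 1003019289.0 },
	{ "rec_r",    499871637.0,  616428767.0 },
	{ "rec_pq",   522403710.0,  706326396.0 },
	{ "rec_pr",   464632780.0,  570493618.0 },
	{ "rec_qr",   319124434.0,  400185250.0 },
	{ "rec_pqr",  205794190.0,  377541245.0 },
};

/* Relative figure: 1.0 means sse2 matches ssse3 (assumes higher is better). */
static double sse2_vs_ssse3(const struct bench_row *r)
{
	return r->sse2 / r->ssse3;
}

/* Print one ratio per reconstruction method. */
static void print_ratios(void)
{
	for (size_t i = 0; i < sizeof (rows) / sizeof (rows[0]); i++)
		printf("%-8s %.2f\n", rows[i].method, sse2_vs_ssse3(&rows[i]));
}
```

Calling `print_ratios()` from `main` lists the seven ratios.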

@behlendorf

Nice, this LGTM and it passed the local overnight testing I threw at it.

@behlendorf behlendorf added this to the 0.7.0 milestone Jun 29, 2016
@behlendorf behlendorf added the Type: Performance label Jun 29, 2016
@ironMann

I pushed some code cleanup. I'm happy with the state of the patch, provided tests come back negative.

@angstymeat

I'm running the version from before your cleanup an hour ago, and all of the older systems are showing "sse2" as available and say they're using the fastest algorithm.

I'm running some disk scrubs right now, but everything looks good.

@ironMann

> but all of the older systems are showing "sse2" as available and say they're using the fastest algorithm.

@angstymeat This is by design. We run a quick benchmark on module load to find the fastest supported methods. You can switch explicitly to 'sse2' but there's no need. If sse2 is listed, it's used.
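
The selection described above can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: every name (`impl_sketch`, `measure_throughput`, `pick_fastest`, `xor_bytes`) is hypothetical, the timed kernel is a plain XOR loop standing in for a real parity routine, and the `supported` flag stands in for a CPU feature check done at load time.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define	BENCH_LEN	(64 * 1024)	/* bytes processed per call */
#define	BENCH_ITERS	256		/* calls per measurement */

typedef void (*xor_fn)(uint8_t *dst, const uint8_t *src, size_t len);

/* Hypothetical descriptor for one implementation. */
struct impl_sketch {
	const char *name;
	int supported;		/* result of a CPU feature check at init */
	xor_fn xor_into;	/* kernel to be timed */
};

/* Stand-in kernel: XOR src into dst one byte at a time. */
static void xor_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/* Time one implementation; returns bytes per second, or 0 if the run
 * was too short for clock() to resolve. */
static double measure_throughput(const struct impl_sketch *im)
{
	static uint8_t data[BENCH_LEN], parity[BENCH_LEN];
	clock_t start = clock();

	for (int i = 0; i < BENCH_ITERS; i++)
		im->xor_into(parity, data, BENCH_LEN);

	double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
	return (secs > 0.0 ? (double)BENCH_LEN * BENCH_ITERS / secs : 0.0);
}

/* Return the supported implementation with the highest score, or NULL
 * if none is supported. scores[i] belongs to impls[i]. */
static const struct impl_sketch *pick_fastest(const struct impl_sketch *impls,
    const double *scores, size_t n)
{
	const struct impl_sketch *best = NULL;
	double best_score = 0.0;

	for (size_t i = 0; i < n; i++) {
		if (!impls[i].supported)
			continue;	/* never select what the CPU cannot run */
		if (best == NULL || scores[i] > best_score) {
			best = &impls[i];
			best_score = scores[i];
		}
	}
	return (best);
}
```

A caller would fill `scores[i]` with `measure_throughput(&impls[i])` once at startup and cache the pointer returned by `pick_fastest`. Keeping the measurement separate from the choice makes the selection logic easy to test with fixed scores.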

@angstymeat

I understand that, I was just trying to get across that it looks like sse2 is available and that they are using it.

@behlendorf

behlendorf commented Jul 12, 2016

LGTM. @ironMann why don't you rebase this against master and force-update the PR for one last test run. Feel free to add another small patch to address the other issues noted in your 4328 comment.


Signed-off-by: Gvozden Neskovic <[email protected]>
@ironMann

@behlendorf Rebased, some issues with testers though...

@behlendorf

The testing failures were unrelated. Merged as:

ae25d22 Add RAID-Z routines for SSE2 instruction set, in x86_64 mode.

@behlendorf behlendorf closed this Jul 13, 2016