-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Adds missing extract implementations for AVX512. #2926
Conversation
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/seqan/seqan3/35cna2ZNEWZn9HFawYVQjC8ckpJn |
Codecov Report
@@ Coverage Diff @@
## master #2926 +/- ##
=======================================
Coverage 98.28% 98.28%
=======================================
Files 267 267
Lines 11462 11462
=======================================
Hits 11265 11265
Misses 197 197
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about tests for the new functions?
#if defined(__AVX512DQ__) | ||
template <uint8_t index, simd::simd_concept simd_t> | ||
constexpr simd_t extract_quarter_avx512(simd_t const & src); | ||
constexpr simd_t extract_quarter_avx512(simd_t const & src) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are extract_quarter/eighth_avx512
wrapped in #if defined(__AVX512DQ__)
but extract_half_avx512
isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests existed already (for SSE and AVX2). I ran the tests on icebear with AVX512 support to check the results.
The additional check is because the extract quarter requires an intrinsic that is only available by the AVX-512 CD subset. Not all AVX512 platforms might have this additional intrinsics subset, but still you can use other AVX512 intrinsics. The extract half doesn't need it because it only works with AVX512-F which is the foundational intrinsics set used by all that support AVX512.
This PR adds the missing implementations for the family of extract functions specific to the AVX512 instruction set, as they are much faster then the default implementation.
A bit on the background:
These functions allow to extract only a part (half, i.e. 2x256 bit, quarter, i.e. 4x128 bit, or eighth, i.e. 8x64 bit) of the given simd vector and are used in combination with the upcast function in order to sign/zero extend a simd vector to a set of simd vectors with larger operand types, e.g. one vint_8x64_t vector to 2 vint_16x32_t vectors, with 8 (16) being the bit-size of the operands and 64 (32) being the number of operands that fit into a single __m512i vector type.
The extracted bytes will be placed in the lower bits of the target simd vector. There is no dedicated intrinsic to extract only 64 bits, so I had to emulate it with the intrinsic to extract 128 bit and using an additional shuffle operation for the uneven indices. This shuffle operation exchanges the higher 64 bits with the lower 64 bits of each of the four 128 bit lanes of the source simd vector. When upcasting this vector, only the first 64 bits are considered (Note this is only needed when going from 8-bit operands to 64-bit operands)