-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AVX512 support for CRC32c implementation (on Intel platforms) #68
Conversation
Hi @pbadari thanks for this! I'm having a look and will provide comments and review as soon as possible. |
@@ -66,9 +66,11 @@ if (USE_CPU_EXTENSIONS) | |||
source_group("Source Files\\intel\\visualc" FILES ${AWS_ARCH_SRC}) | |||
|
|||
elseif(AWS_ARCH_INTEL AND AWS_HAVE_GCC_INLINE_ASM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the inline check correct now? this pulled the inline asm and we really want to check that the intrinsics are available. IIRC there's instances where the assembler installed in the path is newer than the gcc version, would that drift slip through here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Work as expected. We tested on CentOS 8 using gcc 13.1 as 2.36 and Ubuntu 23.04 using gcc 12.2 as 2.4
-- Performing Test AWS_ARCH_INTEL - Success
-- Performing Test AWS_ARCH_ARM64
-- Performing Test AWS_ARCH_ARM64 - Failed
-- Performing Test AWS_ARCH_ARM32
-- Performing Test AWS_ARCH_ARM32 - Failed
-- Performing Test AWS_HAVE_GCC_INLINE_ASM
-- Performing Test AWS_HAVE_GCC_INLINE_ASM - Success
Regarding you question the answer yes, but we can add some checks for intrinsics.
check_c_source_runs(“
#include <nmmintrin.h>
int main() {
__m128i a = _mm_setzero_si128();
return 0;
}
“ SSE42_SUPPORTED)
check_c_source_runs(“
#include <immintrin.h>
int main() {
__m512 a = _mm512_setzero_ps();
return 0;
}
“ AVX512_SUPPORTED)
If this doesn't answer your question, please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about this one, would it help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not at all. We will need support for SSE42 and AVX512 intrinsics since AVX2 only supports 256-bit wide vector registers. Should we add sse42 and avx512 intrinsic support to that cmake file or directly to aws checksum CMakeLists.txt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add it to aws-c-common so it can be used more broadly, and so we don't wind up with these scatter shot everywhere. Unfortunately that means we need to coordinate merges now. I'll go ahead and file the PR for aws-c-common and post it here. In the meantime we have some build-chain wiring to sort out anyways and we can get going on that.
For future readers of this, I suspect this might cause some Rust build issues as I doubt AVX512 is turned on by default in cargo and those build flags are usually global to the build process. So if your linker is complaining about it, try making sure AVX512 is turned on in cargo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re rust build questions: AVX512 should only be a codegen-time thing, I wouldn't expect it to depend on anything configured on the linker.
@@ -6,6 +6,12 @@ | |||
#include <aws/checksums/private/crc_priv.h> | |||
|
|||
#include <aws/common/cpuid.h> | |||
#include <emmintrin.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mv source/intel/asm source/intel/intrin
then we need to see if the visualc code will actually work because i think that's just using the other intrinsics. Also, I think we can just delete the visualc code if we're not using any asm. Can we delete visualc and have cmake use this file for both and see what happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Results moving the asm folder to intrin
mv source/intel/asm source/intel/intrin
[root@aws-spr4 build]# make
[ 16%] Building C object CMakeFiles/aws-checksums.dir/source/crc.c.o
[ 33%] Building C object CMakeFiles/aws-checksums.dir/source/crc_sw.c.o
[ 50%] Linking C static library libaws-checksums.a
[ 50%] Built target aws-checksums
[ 66%] Building C object tests/CMakeFiles/aws-checksums-tests.dir/test_runner.c.o
[ 83%] Building C object tests/CMakeFiles/aws-checksums-tests.dir/crc_test.c.o
[100%] Linking C executable aws-checksums-tests
/usr/bin/ld: ../libaws-checksums.a(crc.c.o): in function aws_checksums_crc32': crc.c:(.text+0x41): undefined reference to
aws_checksums_crc32_hw'
/usr/bin/ld: ../libaws-checksums.a(crc.c.o): in function aws_checksums_crc32c': crc.c:(.text+0xa1): undefined reference to
aws_checksums_crc32c_hw'
collect2: error: ld returned 1 exit status
make[2]: *** [tests/CMakeFiles/aws-checksums-tests.dir/build.make:115: tests/aws-checksums-tests] Error 1
make[1]: *** [CMakeFiles/Makefile2:882: tests/CMakeFiles/aws-checksums-tests.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
Removing source/intel/visualc folder
[root@aws-spr4 build]# make
[ 14%] Building C object CMakeFiles/aws-checksums.dir/source/crc.c.o
[ 28%] Building C object CMakeFiles/aws-checksums.dir/source/crc_sw.c.o
[ 42%] Building C object CMakeFiles/aws-checksums.dir/source/intel/asm/crc32c_sse42_asm.c.o
[ 57%] Linking C static library libaws-checksums.a
[ 57%] Built target aws-checksums
[ 71%] Building C object tests/CMakeFiles/aws-checksums-tests.dir/test_runner.c.o
[ 85%] Building C object tests/CMakeFiles/aws-checksums-tests.dir/crc_test.c.o
[100%] Linking C executable aws-checksums-tests
[100%] Built target aws-checksums-tests
[root@aws-spr4 build]# make install
Consolidate compiler generated dependencies of target aws-checksums
[ 57%] Built target aws-checksums
Consolidate compiler generated dependencies of target aws-checksums-tests
[100%] Built target aws-checksums-tests
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib64/libaws-checksums.a
-- Up-to-date: /usr/local/include/aws/checksums/crc.h
-- Up-to-date: /usr/local/include/aws/checksums/exports.h
-- Installing: /usr/local/lib64/aws-checksums/cmake/static/aws-checksums-targets.cmake
-- Installing: /usr/local/lib64/aws-checksums/cmake/static/aws-checksums-targets-release.cmake
-- Installing: /usr/local/lib64/aws-checksums/cmake/aws-checksums-config.cmake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my suspicion on the build failure by moving asm to intrin and updating cmake to find it and it still not building is the cmake cache. I usually just wipe out the build and re-run cmake to make sure it's not that.
The visualc stuff would have to be run on an msvc compiler to check. I have my windows machine up and i'll give it a look and report back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding move asm to intrin. Clean it again and rebuild it with no issues.
@AGSaidi any chance you could have a look? |
@JonathanHenson I don't know i'm going to be a lot of help with the x86 intrinsics. |
Add sse42 avx512_intrinsics support
added the aws-c-common functionality here: |
I'm working on this one now. I need to do some build surgery to handle that the SSE4.2 implementation on windows can't use asm, but the intrinsics version of AVX512 can be used. I'm going to split the files up and do some extra indirection. |
I just noticed we don't have any test cases with inputs large enough to trigger the avx512 path, i'll add some tests for it. |
Closing in favor of new PR: #72 |
Issue #, if available:
Description of changes:
Add AVX512 optimization for CRC32c implementation when available.
AVX512 optimized CRC32c implementation provides 15-70% performance boost over the SSE42 implementation.
Performance Data:
Following are the performance run results current and new crc32c implementation of various buffer sizes.
(Each implementation is called 200 times on each buffer)
Buffer size(K): 0.50 sse42-time: 5 (13786) avx512-time: 4 (8380) (39.21%)
Buffer size(K): 1.00 sse42-time: 8 (26364) avx512-time: 5 (11790) (55.28%)
Buffer size(K): 4.00 sse42-time: 29 (103973) avx512-time: 10 (33036) (68.23%)
Buffer size(K): 16.00 sse42-time: 111 (415911) avx512-time: 34 (124540) (70.06%)
Buffer size(K): 64.00 sse42-time: 448 (1674641) avx512-time: 143 (505714) (69.80%)
Buffer size(K): 128.00 sse42-time: 891 (3364668) avx512-time: 250 (913173) (72.86%)
Buffer size(K): 256.00 sse42-time: 1781 (6743055) avx512-time: 498 (1826869) (72.91%)
Buffer size(K): 512.00 sse42-time: 3566 (13507649) avx512-time: 992 (3656864) (72.93%)
Buffer size(K): 1024.00 sse42-time: 7134 (27031239) avx512-time: 2031 (7319223) (72.92%)
Buffer size(K): 4096.00 sse42-time: 38736 (146790024) avx512-time: 32525 (123236655) (16.05%)
Buffer size(K): 8192.00 sse42-time: 108094 (409707156) avx512-time: 73586 (278907186) (31.93%)
Buffer size(K): 16384.00 sse42-time: 146284 (554442499) avx512-time: 97824 (370768645) (33.13%)
Buffer size(K): 32768.00 sse42-time: 251401 (952912617) avx512-time: 183591 (695846812) (26.98%)
Buffer size(K): 65536.00 sse42-time: 522123 (1979099475) avx512-time: 400916 (1519591618) (23.22%)
Buffer size(K): 131072.00 sse42-time: 1586697 (6013914985) avx512-time: 1193713 (4524356415) (24.77%)
Buffer size(K): 262144.00 sse42-time: 3971185 (15052170297) avx512-time: 3205870 (12151303651) (19.27%)
Buffer size(K): 524288.00 sse42-time: 8858393 (33576763915) avx512-time: 7368324 (27929180421) (16.82%)
Buffer size(K): 1048576.00 sse42-time: 18236703 (69127153546) avx512-time: 15343947 (58161536028) (15.86%)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.