Add support for core.simd, intel-intrinsics, or inlined assembly #1

Open
dd86k opened this issue Nov 1, 2021 · 5 comments
dd86k commented Nov 1, 2021

This module currently lacks any form of hardware acceleration, and adding some would benefit everyone.

Options:

  1. core.simd -- Supported everywhere, I think.
  2. intel-intrinsics DUB package -- Supports all compilers to some extent.
  3. Inlined assembly -- If all else fails, at least x86 users would benefit. But at best limited to AVX/AVX2, not SSE* (because DMD).

The plan is to try options 1 and 2 and see which yields the best results (through benchmarks and Godbolt).

NOTE: This wasn't implemented at first because this module was once a candidate for inclusion in Phobos.

dd86k added the enhancement (New feature or request) label Nov 1, 2021
dd86k self-assigned this Nov 1, 2021
dd86k commented Dec 13, 2021

Or I could do what the std.digest.sha package does and at least support SSSE3 (it sets version USE_SSSE3 when D_InlineAsm_X86 or D_InlineAsm_X86_64 is defined). At least I already have experience using the inline assembler under DMD, GDC, and LDC.
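A minimal sketch of that pattern, assuming a compile-time gate on the inline-assembler versions plus a runtime CPUID check. The names Sha3dUseAsm and canUseSsse3 are hypothetical and only serve this illustration; the actual std.digest.sha code differs in detail.

```d
import std.stdio : writeln;

// Enable the assembly path only where the compiler provides an x86
// inline assembler. `Sha3dUseAsm` is a hypothetical name for this sketch.
version (D_InlineAsm_X86)
{
    version = Sha3dUseAsm;
}
else version (D_InlineAsm_X86_64)
{
    version = Sha3dUseAsm;
}

/// Returns true when an SSSE3 code path may be used on this CPU.
bool canUseSsse3()
{
    version (Sha3dUseAsm)
    {
        import core.cpuid : ssse3;
        return ssse3; // runtime CPUID check from druntime
    }
    else
    {
        return false; // no inline assembler: always use the portable code
    }
}

void main()
{
    writeln("SSSE3 path usable: ", canUseSsse3());
}
```

The portable implementation stays the fallback in both cases, so a build without the inline assembler or a CPU without SSSE3 behaves exactly as before.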

dd86k commented Dec 26, 2021

I don't really think this needs SIMD that badly, because when compiled with LDC or GDC, I get performance similar to OpenSSL.

Test env:

  • sha3-d 1.2.1 with a buffer of 64 KiB (in ddh 1.3).
    • All compiled with release-nobounds.
  • openssl 1.1.1f with default settings.
  • Processor: Intel Core i7-3700
  • Machine: VirtualBox 6.1.30 with KVM and guest additions
  • OS: Ubuntu 20.04 LTS (5.11.0-43-generic)

Results (input: pv -r /dev/urandom piped):

  • openssl dgst -sha3-256: 111-127 MiB/s
  • ddh sha3-256 with dmd 2.098.1: 32-34 MiB/s (worse than 2.090.1 which is more around the 42 MiB/s mark)
  • ddh sha3-256 with gdc 10.3: 90-91 MiB/s (worse since I upgraded dmd?!)
  • ddh sha3-256 with ldc 1.20.1: 119-124 MiB/s

dd86k commented Dec 27, 2021

A new test under Windows (pv from Cygwin, which supports /dev/urandom) measures sha3-d at 140 MiB/s and OpenSSL-Win64 3.0.1 at 232 MiB/s, so yes, I do see the difference now.

dd86k commented Apr 9, 2022

In any case, a version Sha3dUseSIMD or Sha3dUseIntrinsics should be provided. Selecting it should be manual; that feels better to me, at least.
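As a rough sketch of what manual selection could look like, assuming the two version identifiers named above: the Backend enum and selectedBackend constant are hypothetical and exist only for this illustration.

```d
import std.stdio : writeln;

// Hypothetical helper type for this sketch, not part of sha3-d.
enum Backend { portable, simd, intrinsics }

// The user opts in explicitly; nothing is auto-detected.
version (Sha3dUseSIMD)
    enum Backend selectedBackend = Backend.simd;
else version (Sha3dUseIntrinsics)
    enum Backend selectedBackend = Backend.intrinsics;
else
    enum Backend selectedBackend = Backend.portable;

void main()
{
    writeln("backend: ", selectedBackend);
}
```

The version would be passed at build time, for example with -version=Sha3dUseSIMD (DMD), --d-version=Sha3dUseSIMD (LDC), -fversion=Sha3dUseSIMD (GDC), or a versions entry in the DUB recipe. Without any of these, the sketch falls back to the portable backend.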

dd86k commented Dec 21, 2023

Plan:

  • Use XKCP's SSE, AVX, and AVX2 implementation variants.
  • version (D_AVX): Use AVX impl.
  • version (D_AVX2): Use AVX2 impl.

Notes:

  • Only DMD defines version D_SIMD.
    • Will have to try detecting support with __traits(compiles, ...) instead
    • At best, core.simd will adapt to using SSE or AVX somehow
  • No compilers define versions D_AVX and D_AVX2 by default.
    • DMD defines both with -mcpu=avx2
    • LDC defines both with -mattr=+avx2
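The plan and notes above could be sketched roughly as follows. This assumes the predefined D_AVX and D_AVX2 versions pick the variant and that the compiles trait probes for core.simd vector types; keccakVariant, hasVec128, and hasVec256 are hypothetical names for this illustration only.

```d
import core.simd;
import std.stdio : writeln;

// D_AVX2 is tested first because a build targeting AVX2 defines both.
version (D_AVX2)
    enum string keccakVariant = "avx2";
else version (D_AVX)
    enum string keccakVariant = "avx";
else
    enum string keccakVariant = "portable";

// Probe for vector types instead of relying on version D_SIMD.
// The probe is false when core.simd does not provide the alias
// for the current target.
enum bool hasVec128 = __traits(compiles, { ulong2 v; });
enum bool hasVec256 = __traits(compiles, { ulong4 v; });

void main()
{
    writeln("variant: ", keccakVariant);
    writeln("128-bit vectors: ", hasVec128);
    writeln("256-bit vectors: ", hasVec256);
}
```

Building with dmd -mcpu=avx2 or ldc2 -mattr=+avx2, as noted above, should select the "avx2" variant. A default build is expected to report "portable".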
