Optimize System.Numerics.BitOperations using arm64 intrinsics #33495
I was trying to convert
However, since “cnt” only operates on byte/sbyte, I need to use
Are there any other suggestions I could try to optimize PopCount() using arm64 intrinsics, or should we stick with the software fallback for now? @TamarChristinaArm @tannergooding @echesakovMSFT
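Because “cnt” counts bits per byte, a 64-bit PopCount() built on it has to count each of the eight bytes and then add the eight partial counts together. The portable C sketch below models that data flow in scalar code so it can run anywhere; `cnt_8b_model`, `addv_8b_model`, and `popcount64_model` are hypothetical names, and this is a model of the semantics, not the JIT's or the intrinsic's actual implementation.

```c
#include <stdint.h>

/* Hypothetical scalar model of a per-byte popcount on an 8-byte vector:
   each output lane holds the number of set bits in the matching input byte. */
void cnt_8b_model(const uint8_t in[8], uint8_t out[8]) {
    for (int lane = 0; lane < 8; lane++) {
        uint8_t b = in[lane];
        uint8_t bits = 0;
        while (b != 0) {
            b &= (uint8_t)(b - 1);  /* clear the lowest set bit */
            bits++;
        }
        out[lane] = bits;
    }
}

/* Hypothetical scalar model of a horizontal add across the eight byte lanes.
   The total is at most 64, so it fits in a single byte. */
uint8_t addv_8b_model(const uint8_t v[8]) {
    unsigned sum = 0;
    for (int lane = 0; lane < 8; lane++) {
        sum += v[lane];
    }
    return (uint8_t)sum;
}

/* PopCount of a 64-bit value: split into byte lanes, count bits per byte,
   then sum the eight byte counts. */
int popcount64_model(uint64_t x) {
    uint8_t bytes[8];
    uint8_t counts[8];
    for (int lane = 0; lane < 8; lane++) {
        bytes[lane] = (uint8_t)(x >> (8 * lane));  /* lane 0 = least significant byte */
    }
    cnt_8b_model(bytes, counts);
    return addv_8b_model(counts);
}
```

For example, `popcount64_model(0x8000000000000001ull)` counts one bit in lane 0 and one in lane 7, so the sum is 2.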
Hm... here is what clang emits for
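To reproduce this kind of comparison on godbolt, a minimal C input such as the sketch below can be compiled with clang for an arm64 target. It assumes a GCC/Clang-compatible compiler (for `__builtin_popcountll`) and, on AArch64, the ACLE NEON intrinsics `vcreate_u8`, `vcnt_u8`, and `vaddv_u8`; the function names are hypothetical. On arm64, clang typically lowers the builtin to a short sequence that moves the value into a SIMD register, runs `cnt`, adds across the lanes, and moves the result back, but the exact output should be checked on godbolt for the compiler version in question.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Plain builtin: lets the compiler pick its own popcount lowering. */
int popcount_builtin(uint64_t x) {
    return __builtin_popcountll(x);
}

/* The same idea spelled out with NEON intrinsics on AArch64. */
int popcount_neon_or_builtin(uint64_t x) {
#if defined(__aarch64__)
    uint8x8_t bytes = vcreate_u8(x);    /* place x in an 8 x u8 vector */
    uint8x8_t counts = vcnt_u8(bytes);  /* per-byte popcount */
    return vaddv_u8(counts);            /* sum of the eight byte counts (max 64) */
#else
    return __builtin_popcountll(x);     /* non-arm64 hosts: use the builtin */
#endif
}
```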
@kunalspathak Yeah, unfortunately we only have popcount on vector registers, but your sequence is correct and should be much faster than the software fallback. The reason it isn't seems to be the surrounding code that creates the vector and extracts the element. Your second example
should have been converted into a simple where currently you get
That alone looks more expensive than the software fallback. It seems the JIT doesn't know how to move values between register files except through memory.
This should have been an
Perhaps instead of
this works better for the final bits
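As an illustration only, one way to fold the eight per-byte counts into a single total is a chain of pairwise widening adds, which is the shape of AArch64's `UADDLP` (unsigned add long pairwise): 8 x u8 becomes 4 x u16, then 2 x u32, then one u64. The scalar C model below shows that reduction; `reduce_pairwise_model` is a hypothetical name and this is not necessarily the sequence suggested above.

```c
#include <stdint.h>

/* Hypothetical scalar model of reducing eight per-byte counts with
   pairwise widening adds: 8 x u8 -> 4 x u16 -> 2 x u32 -> 1 x u64. */
uint64_t reduce_pairwise_model(const uint8_t counts[8]) {
    uint16_t h[4];
    uint32_t w[2];
    for (int i = 0; i < 4; i++) {
        h[i] = (uint16_t)(counts[2 * i] + counts[2 * i + 1]);  /* adjacent bytes -> halfword */
    }
    for (int i = 0; i < 2; i++) {
        w[i] = (uint32_t)h[2 * i] + h[2 * i + 1];               /* adjacent halfwords -> word */
    }
    return (uint64_t)w[0] + w[1];                              /* final pair -> doubleword */
}
```

Because a per-byte count is at most 8 and the total is at most 64, every intermediate fits comfortably in its widened lane; the widening only matters if the same reduction is reused for larger values.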
This is tracked by #33496 and should probably be resolved before moving forward on the other items, so we don't accidentally skew benchmark numbers with inefficient create methods.
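For benchmarking, the baseline is a software fallback that stays entirely in general-purpose registers. The sketch below is the classic SWAR ("SIMD within a register") bit count popularized by Hacker's Delight; whether it matches the exact fallback in System.Numerics.BitOperations is an assumption, and `popcount64_swar` is a hypothetical name. It costs only a handful of ALU operations plus one multiply, which helps explain why a “cnt” sequence that round-trips through memory can end up slower.

```c
#include <stdint.h>

/* Classic SWAR bit count on a 64-bit value using only integer operations. */
int popcount64_swar(uint64_t x) {
    const uint64_t m1  = 0x5555555555555555ull;  /* 0101... */
    const uint64_t m2  = 0x3333333333333333ull;  /* 0011... */
    const uint64_t m4  = 0x0F0F0F0F0F0F0F0Full;  /* 00001111... */
    const uint64_t h01 = 0x0101010101010101ull;  /* 1 in every byte */

    x -= (x >> 1) & m1;              /* 2-bit fields: bits set in each pair */
    x = (x & m2) + ((x >> 2) & m2);  /* 4-bit fields: sums of adjacent pairs */
    x = (x + (x >> 4)) & m4;         /* 8-bit fields: per-byte counts (0..8) */
    return (int)((x * h01) >> 56);   /* top byte = sum of all eight byte counts */
}
```

The last line does the same job as the horizontal add in the vector version: multiplying by `0x0101010101010101` accumulates every byte count into the most significant byte.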
Thanks @TamarChristinaArm for the insights. Yes,
Thanks for reminding me of godbolt, @EgorBo. I keep forgetting that I can check the code generated by other compilers. :)
This item tracks the conversion of the System.Numerics.BitOperations class to use arm64 intrinsics.
Related: #33308