ARM64 GC: Use SVE when sorting the mark list #108473
Comments
Tagging subscribers to this area: @dotnet/gc
@a74nh I spent some time earlier looking into the possibility of speeding up sorting on ARM64, and I've got mixed feedback on this one. On one hand, looking here, it appears to me that all we need is to implement these methods for a machine, and then we can get vxsort working for it. Optimistically, this is probably the easiest path forward. On the other hand, @Maoni0 told me that she had talked with some other ARM64 experts and that the work was blocked on some factors (which I can't remember), which is why we didn't do it earlier. @kunalspathak told me that we can't have the 512-bit parallelization as in here, but some vectorization should still be good. I didn't go any further with this, but generally I think this is a good thing to do. In any case, we can't use C# inside the GC. Let us know if you are interested in contributing.
Agreed, that seems a sensible place to start. This assumes that the AVX and SVE algorithms are directly compatible.
This could be the availability of SVE? It's a fairly new technology, so we would have to fall back to the other version on older hardware, which means we need a check at runtime.
Yes, most machines with SVE are 128 bits. But as hardware improves and vector lengths get longer, we'll get a speedup for free.
Agreed. I only provided it in C# because I already had it available (I'm using it in a blog I'm writing about SVE in C#).
It is possible someone in my team would be able to do this. We'll have to prioritise it against other work.
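The runtime check mentioned above could look roughly like the following. This is a minimal sketch, not code from the runtime: `cpu_supports_sve`, `SortPath`, `choose_sort_path` and `sort_mark_list` are hypothetical names, the detection shown only covers Linux on ARM64 (via `getauxval(AT_HWCAP)`), and both paths fall back to `std::sort` because no SVE sort exists here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)  // SVE bit in AT_HWCAP on Linux/ARM64
#endif
#endif

// Hypothetical helper: true only when the OS reports SVE support.
// On any other platform this sketch conservatively answers "no".
inline bool cpu_supports_sve() {
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

// Hypothetical dispatch: pick the vectorized path only when it is safe.
enum class SortPath { Scalar, Sve };

inline SortPath choose_sort_path() {
    return cpu_supports_sve() ? SortPath::Sve : SortPath::Scalar;
}

// Placeholder sort entry point. A real SVE implementation would be called
// in the Sve branch; here both branches use std::sort so the sketch runs.
inline void sort_mark_list(uint64_t* begin, uint64_t* end) {
    switch (choose_sort_path()) {
    case SortPath::Sve:
        std::sort(begin, end);  // assumption: stand-in for an SVE sort
        break;
    case SortPath::Scalar:
        std::sort(begin, end);  // existing non-vectorized fallback
        break;
    }
}
```

In practice the check would be done once at startup and cached, so the sort itself pays no detection cost.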
Related: #64164 @cshung - Do we know the speedup we get on x64 when using vxsort vs. introsort? Is there a benchmark that we can use to see if porting vxsort to NEON will benefit?
It will be good to know that.
Background
Suggestion
Alternative Suggestion
Reference
Here is an example partition routine implemented in C# SVE. All elements less than the first element get written to `left`, all other elements get written to `right`.
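For reference, the behaviour described above can be written as a scalar routine in C++. This is only a sketch of the semantics (pivot is the first element, smaller values go to `left`, everything else to `right`), not the SVE code from the issue; `PartitionResult` and `partition_by_first` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: the two output buffers of one partition pass.
struct PartitionResult {
    std::vector<uint64_t> left;   // elements strictly less than the pivot
    std::vector<uint64_t> right;  // all other elements, including the pivot
};

// Scalar reference for the partition step: the pivot is input[0].
// A vectorized version would compare a whole vector against the pivot
// at once; this loop does the same comparison one element at a time.
inline PartitionResult partition_by_first(const std::vector<uint64_t>& input) {
    PartitionResult result;
    if (input.empty()) {
        return result;
    }
    const uint64_t pivot = input[0];
    for (uint64_t value : input) {
        if (value < pivot) {
            result.left.push_back(value);
        } else {
            result.right.push_back(value);
        }
    }
    // Every input element lands in exactly one of the two outputs.
    assert(result.left.size() + result.right.size() == input.size());
    return result;
}
```

A scalar version like this is also useful as a correctness oracle when testing a vectorized partition against the same inputs.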