Improve SpanHelpers.IndexOfAny throughput / vectorization #25023
Comments
At an initial glance, we never hit the vectorized path for inputs of less than 64 bytes on most modern machines (32 bytes on older machines without AVX/AVX2 support, or on ARM machines). I imagine the biggest benefit would come from handling the leading/trailing elements with unaligned vector operations (at most 2 unaligned iterations, with the remaining iterations aligned), which would avoid the small 1-4 iteration loops with many branches. The unaligned vector operation should be faster even if the load happens to cross a cache line boundary, since it cuts out the 8 comparisons/branches and the loop. Other than that, we'd likely get some benefit from switching to use Hardware Intrinsics (rather than |
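To make the unaligned leading/trailing idea concrete, here is a minimal sketch. It is not the actual `SpanHelpers` code: the class and method names are hypothetical, it uses the portable `System.Numerics.Vector<T>` API instead of hardware intrinsics, and for simplicity it does every load unaligned rather than aligning the middle iterations. The point it illustrates is that the last vector load can overlap elements that were already scanned, so no scalar tail loop is needed once the input is at least one vector long.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical illustration only; names are not from the runtime repository.
static class IndexOfAnySketch
{
    public static int IndexOfAny(ReadOnlySpan<char> span, char value0, char value1)
    {
        // Reinterpret chars as ushorts so they can be used with Vector<ushort>.
        ReadOnlySpan<ushort> data = MemoryMarshal.Cast<char, ushort>(span);
        int count = Vector<ushort>.Count;

        // Inputs shorter than one vector still take the scalar path.
        if (!Vector.IsHardwareAccelerated || data.Length < count)
        {
            return ScalarIndexOfAny(data, 0, data.Length, value0, value1);
        }

        var v0 = new Vector<ushort>((ushort)value0);
        var v1 = new Vector<ushort>((ushort)value1);
        int lastStart = data.Length - count; // start of the final (possibly overlapping) load
        int i = 0;

        while (true)
        {
            // Unaligned load of 'count' elements starting at i.
            var chunk = new Vector<ushort>(data.Slice(i));
            var matches = Vector.BitwiseOr(Vector.Equals(chunk, v0), Vector.Equals(chunk, v1));

            if (matches != Vector<ushort>.Zero)
            {
                // Some lane matched; locate the first one with a short scalar scan.
                // Any overlapped elements were already known not to match.
                return ScalarIndexOfAny(data, i, i + count, value0, value1);
            }

            if (i == lastStart)
            {
                return -1;
            }

            // Advance one vector, clamping so the final load ends exactly at the
            // end of the input instead of falling back to a scalar tail loop.
            i = Math.Min(i + count, lastStart);
        }
    }

    private static int ScalarIndexOfAny(ReadOnlySpan<ushort> data, int start, int end, char value0, char value1)
    {
        for (int i = start; i < end; i++)
        {
            if (data[i] == value0 || data[i] == value1)
            {
                return i;
            }
        }
        return -1;
    }
}
```

A real implementation would presumably extract the first matching lane from the comparison mask directly rather than rescanning the chunk, and would extend the same overlapping trick to inputs shorter than one vector by using narrower loads.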
Also CC. @GrabYourPitchforks |
@stephentoub @tannergooding I would like to give this a shot. Seems like a good opportunity to sharpen my intrinsics skills. The focus seems to be on improving perf for inputs of less than 64 bytes, so the plan could be:
|
I have 4 PRs for these that I've not moved over, as they had been open for about a year.
So I was waiting for my other intrinsics changes to be merged before bothering, e.g. #32371 |
@benaadams ha of course you do! 👍 I'll let you finish then. Let me know if you need any help or something :) |
@GrabYourPitchforks seems there's a PR dependency chain that includes #32371 -- could you help shuffle it along? |
@benaadams were you still going to give this a try? |
Our regex engine can now spend a decent amount of time inside of span helpers like:
runtime/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Char.cs
Line 467 in 2355a10
and in particular, we seem to hit this path:
runtime/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Char.cs
Lines 492 to 499 in 2355a10
fairly frequently. It'd be great to investigate whether we can do anything to improve the performance of these IndexOfAny helpers, whether it's by improving how we do the vectorization, or utilizing intrinsics directly if that would help, etc. I believe @tannergooding had some ideas.
For example, we spend ~30% of the time in the regex redux benchmark in this helper:
cc: @danmosemsft