-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix RegexOptions.Compiled|IgnoreCase perf when dynamic code isn't supported #107874
Conversation
…ported If a regex is created with RegexOptions.Compiled and RegexOptions.IgnoreCase, and it begins with a pattern that's a reasonably small number of alternating strings, it'll now end up using `SearchValues<string>` to find the next possible match location. However, the `SearchValues<string>` instance doesn't end up getting created if the interpreter is being used. If the implementation falls back to the interpreter because compilation isn't supported because dynamic code isn't supported, then it won't use any optimizations to find the next starting location. That's a regression from when it would previously at least use a vectorized search to find one character class from the set of starting strings. This fixes it to just always create the `SearchValues<string>`. This adds some overhead when using RegexOptions.Compiled, but it's typically just a few percentage points, and only applies in the cases where this `SearchValues<string>` optimization kicks in. At the moment, changing it to have perfect knowledge about whether it can avoid that creation is too invasive. This overhead also doesn't apply to the source generator.
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you for the clear description.
@BrzVlad, this makes the regression go away for the cited case of IsDynamicCodeSupported being false with coreclr (rather than regressing, .NET 8 to .NET 9 improves by almost ~40% for me on the offending benchmark). Do you similarly see the same for mono? I want to make sure there aren't two issues lurking. |
/backport to release/9.0 |
Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10888162147 |
@stephentoub Thanks for the quick fix. The huge regression I was observing indeed goes away. There seems to be a regression of only 1.75x compared to .NET8 on mono interpreter. 85% of the time is now spent running the method |
I'd defer to @MihaZupan for that, but I suspect it's unlikely. |
@BrzVlad would you be able to test the performance with this patch applied as well? If that doesn't help much, I think the next best thing would be reviving #92680, but that seems like a .NET 10 change to me. |
@MihaZupan That patch reduced the regression from 1.75x to 1.6x compared to .NET8. I think it is fine to not attempt to fix this for .NET9 |
/ba-g the stuck leg is hitting dotnet/dnceng#3879 all failures are known |
…ported (dotnet#107874) If a regex is created with RegexOptions.Compiled and RegexOptions.IgnoreCase, and it begins with a pattern that's a reasonably small number of alternating strings, it'll now end up using `SearchValues<string>` to find the next possible match location. However, the `SearchValues<string>` instance doesn't end up getting created if the interpreter is being used. If the implementation falls back to the interpreter because compilation isn't supported because dynamic code isn't supported, then it won't use any optimizations to find the next starting location. That's a regression from when it would previously at least use a vectorized search to find one character class from the set of starting strings. This fixes it to just always create the `SearchValues<string>`. This adds some overhead when using RegexOptions.Compiled, but it's typically just a few percentage points, and only applies in the cases where this `SearchValues<string>` optimization kicks in. At the moment, changing it to have perfect knowledge about whether it can avoid that creation is too invasive. This overhead also doesn't apply to the source generator.
…ported (dotnet#107874) If a regex is created with RegexOptions.Compiled and RegexOptions.IgnoreCase, and it begins with a pattern that's a reasonably small number of alternating strings, it'll now end up using `SearchValues<string>` to find the next possible match location. However, the `SearchValues<string>` instance doesn't end up getting created if the interpreter is being used. If the implementation falls back to the interpreter because compilation isn't supported because dynamic code isn't supported, then it won't use any optimizations to find the next starting location. That's a regression from when it would previously at least use a vectorized search to find one character class from the set of starting strings. This fixes it to just always create the `SearchValues<string>`. This adds some overhead when using RegexOptions.Compiled, but it's typically just a few percentage points, and only applies in the cases where this `SearchValues<string>` optimization kicks in. At the moment, changing it to have perfect knowledge about whether it can avoid that creation is too invasive. This overhead also doesn't apply to the source generator.
If a regex is created with RegexOptions.Compiled and RegexOptions.IgnoreCase, and it begins with a pattern that's a reasonably small number of alternating strings, it'll now end up using
SearchValues<string>
to find the next possible match location. However, theSearchValues<string>
instance doesn't end up getting created if the interpreter is being used. If the implementation falls back to the interpreter because compilation isn't supported because dynamic code isn't supported, then it won't use any optimizations to find the next starting location. That's a regression from when it would previously at least use a vectorized search to find one character class from the set of starting strings.This fixes it to just always create the
SearchValues<string>
. This adds some overhead when using RegexOptions.Compiled, but it's typically just a few percentage points, and only applies in the cases where thisSearchValues<string>
optimization kicks in. At the moment, changing it to have perfect knowledge about whether it can avoid that creation is too invasive. This overhead also doesn't apply to the source generator.Contributes to #99553 (this should be backported to release/9.0)