-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regex UpdateBumpalong logic is flawed and likely costing us perf #62676
Comments
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions Issue DetailsWe have an optimization that inserts an "UpdateBumpalong" node into the graph after a starting char loop: runtime/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs Lines 373 to 384 in 12a8819
The code handling this node currently sets base.runtextpos to the current position, the idea being that the normal bumpalong of +1 would otherwise end up just processing the same position we've already tried. However, it looks like our current logic for handling this is flawed, in particular for non-atomic greedy loops. The current logic (in all engines) does the equivalent of: base.runtextpos = pos; but when backtracking kicks in for a greedy loop, we'll end up setting base.runtextpos to the last match position, and then again to that position -1, and then again to that position -2, all the way back to the beginning, which means it's not actually buying us anything in this case (in contrast, for atomic loops where there isn't backtracking, it will successfully track the last matching position). It seems like the logic should actually be more like: if (base.runtextpos < pos)
{
base.runtextpos = pos;
}
|
We have an optimization that inserts an "UpdateBumpalong" node into the graph after a starting char loop:
runtime/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
Lines 373 to 384 in 12a8819
The code handling this node currently sets base.runtextpos to the current position, the idea being that the normal bumpalong of +1 would otherwise end up just processing the same position we've already tried. However, it looks like our current logic for handling this is flawed, in particular for non-atomic greedy loops. The current logic (in all engines) does the equivalent of:
but when backtracking kicks in for a greedy loop, we'll end up setting base.runtextpos to the last match position, and then again to that position -1, and then again to that position -2, all the way back to the beginning, which means it's not actually buying us anything in this case (in contrast, for atomic loops where there isn't backtracking, it will successfully track the last matching position). It seems like the logic should actually be more like:
The text was updated successfully, but these errors were encountered: