-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand RegexNode atomicity test to loops at the end of alternation branches #1769
Conversation
…anches Given an expression like `(abcd*|efgh*)ijk`, we will now see that if the `i` after the alternation doesn't match and we backtrack into the alternation, we won't need to backtrack into the `d*`, as it can't give back anything that would allow the `i` to match.
} | ||
|
||
return false; | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside, I noticed on line 832, this
// If this node is a one/notone/setloop, see if it overlaps with its successor in the concatenation.
// If it doesn't, then we can upgrade it to being a one/notone/setloopatomic.
maybe should be clarified to say
// If this node is a oneloop/notoneloop/setloop, see if it overlaps with its successor in the concatenation.
// If it doesn't, then we can upgrade it to being a oneloopatomic/notoneloopatomic/setloopatomic.
since one and notone are node types that are distinct to oneloop/notoneloop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, it's everywhere eg
/// Simple optimization. If an atomic subexpression contains only a one/notone/set loop,
/// change it to be an atomic one/notone/set loop and remove the atomic node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I can see how it might not be clear; in my head I see the grouping, e.g. as if it were {one/notone/set}loop, but I can clean it up. I'll do so in my next PR.
// skip Groups, as they should have already been reduced away. | ||
// If there's a concatenation, we can jump to the last element of it. | ||
while (node.Type == Capture || node.Type == Concatenate) | ||
static void ProcessNode(RegexNode node, RegexNode subsequent, int maxDepth = 20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, might be better to pass in 20 above rather than default it here, so any future calls don't forget to decide what to pass in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for CanBeMadeAtomic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll address that subsequently.
Given an expression like
(abcd*|efgh*)ijk
, we will now see that if thei
after the alternation doesn't match and we backtrack into the alternation, we won't need to backtrack into thed*
orh*
, as it can't give back anything that would allow thei
to match.Best reviewed with whitespace disabled: https://github.com/dotnet/runtime/pull/1769/files?w=1
Contributes to #1349
cc: @danmosemsft, @pgovind, @eerhardt, @ViktorHofer, @lpereira