-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Or, And, Not + related fixes #69839
Conversation
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions Issue DetailsThis PR includes cleanup of unused nodes and related fixes. The
|
src/libraries/System.Text.RegularExpressions/src/System.Text.RegularExpressions.csproj
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.Debug.cs
Show resolved
Hide resolved
@@ -29,23 +29,11 @@ internal DfaMatchingState(SymbolicRegexNode<TSet> node, uint prevCharKind) | |||
internal bool IsDeadend => Node.IsNothing; | |||
|
|||
/// <summary>The node must be nullable here</summary> | |||
internal int FixedLength | |||
internal int FixedLength(uint nextCharKind) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great that you were able to fix the fixed-length markers. Do we know if this was contributing to some of the perf slowdowns that had been measured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely, but I don't for sure know yet. I'll measure against current main after I get this merged (to unblock Margus).
....RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.Dgml.cs
Outdated
Show resolved
Hide resolved
....RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.Dgml.cs
Outdated
Show resolved
Hide resolved
....RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.Dgml.cs
Outdated
Show resolved
Hide resolved
....RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.Dgml.cs
Outdated
Show resolved
Hide resolved
...gularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.Explore.cs
Show resolved
Hide resolved
...gularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.Explore.cs
Show resolved
Hide resolved
...gularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.Explore.cs
Outdated
Show resolved
Hide resolved
....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs
Outdated
Show resolved
Hide resolved
...tem.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs
Outdated
Show resolved
Hide resolved
...tem.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs
Outdated
Show resolved
Hide resolved
...tem.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs
Outdated
Show resolved
Hide resolved
...tem.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Reworked DGML support and sampling to work without the old Or. The exploration step that was in the DGML code has been pulled out now. With Not gone the sampler has some direct support for negative sampling, but it's still buggy and complicated to do properly. Will likely remove. Removed TransitionRegex, which needed And and Not.
Now the fixed length logic supports conditional nullability too.
@olsaarik - we found the following regressions that seemed to line up with this PR and as per the issues referenced by @AndyAyersMS. We detected this from our analysis while creating the perf report for August. Would you consider these regressions as "by design"? We did notice an improvement, however, we are still not at the same level as before this change: CC: @dakersnar EDIT: Removed the benchmarks involving the Compiled parameter such as
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple.IsMatch(Id: 11, Options: NonBacktracking)
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple.IsMatch(Id: 0, Options: NonBacktracking)
|
(While this could have affected the two NonBacktracking tests, this would not have affected the Compiled test.) |
This PR includes cleanup of unused nodes and related fixes. The
SymbolicRegexNodeKind
elementsOr
,And
andNot
as well as any related classes/functions have been removed. This includesSymbolicRegexSet
,TransitionRegex
andSymbolicNFA
.OrderedOr
has been renamed toAlternate
, to match the name of the parse tree node. Their semantics are the same.SaveDGML
, which depended onOr
, has been reworked by pulling out the exploration logic intoSymbolicRegexMatcher.Explore
, which does the exploration in the main transition structures of the matcher. This means that the exploration can also be used to pre-calculate all the derivatives instead of doing it on the fly, which might be useful for future optimizations (e.g. combiningCompiled
andNonBacktracking
).GenerateRandomMembers
is nowSampleMatches
and has also been rewritten to use the transition logic fromSymbolicRegexMatcher
. Negative sampling has been removed due to not having support forNot
anymore.SymbolicRegexMatcher
is now a partial class andSaveDGML
,Explore
, andSampleMatches
are implemented in separate files as additions to the matcher.FixedLengthMarker
support had several problems that this PR fixes:SymbolicRegexNode.ResolveFixedLength
. The new function supports finding the fixed length marker on the path that the backtracking matcher would accept a match and also supports conditional nullability. For example, patterns likeabc|$(4)|(3)
would correctly resolve to either 4 or 3 depending on the context.RegexNodeConverter
didn't support the structures actually present now that we have capture start and end markers. The logic is now in a separate transformation functionSymbolicRegexNode.AddFixedLengthMarkers
, which made it much easier to handle the various cases. It does introduce some additional allocations, as parts of the tree may be rebuilt as markers are added.