Commit

Fix RegexOptions.Compiled|IgnoreCase perf when dynamic code isn't supported (dotnet#107874)

If a regex is created with RegexOptions.Compiled and RegexOptions.IgnoreCase, and it begins with a reasonably small set of alternating strings, it now uses `SearchValues<string>` to find the next possible match location. However, the `SearchValues<string>` instance isn't created up front when RegexOptions.Compiled is specified, so it's missing if the interpreter ends up being used instead. If the implementation falls back to the interpreter because dynamic code (and therefore compilation) isn't supported, it won't use any optimization to find the next starting location. That's a regression from the previous behavior, which would at least use a vectorized search for one character class from the set of starting strings.
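As a rough illustration of the affected scenario, the sketch below builds a regex with both options and prints whether the current runtime supports dynamic code. The pattern and input are hypothetical, and the sketch does not show which engine the regex actually ends up using; it only marks out the conditions described above.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text.RegularExpressions;

// Hypothetical pattern: it begins with a small set of alternative strings,
// which is the shape the commit message describes.
var regex = new Regex(@"(?:error|warning|fatal): \w+",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);

// When this is false (for example, under Native AOT), RegexOptions.Compiled
// cannot emit code at run time and the interpreter is used instead.
Console.WriteLine($"Dynamic code supported: {RuntimeFeature.IsDynamicCodeSupported}");

// Matching behaves the same either way; only the speed of finding the next
// candidate start position differs.
Match m = regex.Match("12:00:01 Warning: disk");
Console.WriteLine(m.Success ? $"Match at index {m.Index}" : "No match");
```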

This fixes it to just always create the `SearchValues<string>`. This adds some overhead when using RegexOptions.Compiled, but it's typically just a few percentage points, and only applies in the cases where this `SearchValues<string>` optimization kicks in. At the moment, changing it to have perfect knowledge about whether it can avoid that creation is too invasive. This overhead also doesn't apply to the source generator.
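The search that this change makes unconditionally available can be sketched outside the regex engine as follows. This is a minimal illustration assuming .NET 9 or later, where `SearchValues.Create` accepts strings with a `StringComparison`; the prefixes and the `FindNextStart` helper are hypothetical and not part of the commit.

```csharp
using System;
using System.Buffers;

// Hypothetical prefixes, standing in for the leading strings the regex
// analysis would extract from a pattern such as (?i)(?:error|warning|fatal)...
string[] leadingPrefixes = ["error", "warning", "fatal"];

// Built once up front, as the constructor now always does for LeadingStrings.
SearchValues<string> leadingStrings =
    SearchValues.Create(leadingPrefixes, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(FindNextStart("12:00:01 Warning: disk usage at 91%", 0, leadingStrings));

// Hypothetical helper mirroring the shape of the search in the diff:
// returns the index of the next candidate start at or after pos, or -1.
static int FindNextStart(ReadOnlySpan<char> text, int pos, SearchValues<string> values)
{
    int i = text.Slice(pos).IndexOfAny(values);
    return i >= 0 ? pos + i : -1;
}
```

Creating the `SearchValues<string>` once and reusing it for every search is the point of the design: the construction cost is paid at regex creation time, which is the overhead the message mentions for RegexOptions.Compiled.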
stephentoub authored and sirntar committed Sep 30, 2024
1 parent e1dd94e commit 9c08382
Showing 1 changed file with 4 additions and 16 deletions.
@@ -84,7 +84,6 @@ public RegexFindOptimizations(RegexNode root, RegexOptions options)
 bool dfa = (options & RegexOptions.NonBacktracking) != 0;
 bool compiled = (options & RegexOptions.Compiled) != 0 && !dfa; // for now, we never generate code for NonBacktracking, so treat it as non-compiled
 bool interpreter = !compiled && !dfa;
-bool usesRfoTryFind = !compiled;
 
 // For interpreter, we want to employ optimizations, but we don't want to make construction significantly
 // more expensive; someone who wants to pay to do more work can specify Compiled. So for the interpreter
@@ -149,10 +148,7 @@ public RegexFindOptimizations(RegexNode root, RegexOptions options)
     LeadingPrefixes = caseInsensitivePrefixes;
     FindMode = FindNextStartingPositionMode.LeadingStrings_OrdinalIgnoreCase_LeftToRight;
 #if SYSTEM_TEXT_REGULAREXPRESSIONS
-    if (usesRfoTryFind)
-    {
-        LeadingStrings = SearchValues.Create(LeadingPrefixes, StringComparison.OrdinalIgnoreCase);
-    }
+    LeadingStrings = SearchValues.Create(LeadingPrefixes, StringComparison.OrdinalIgnoreCase);
 #endif
     return;
 }
@@ -165,10 +161,7 @@ public RegexFindOptimizations(RegexNode root, RegexOptions options)
 // LeadingPrefixes = caseSensitivePrefixes;
 // FindMode = FindNextStartingPositionMode.LeadingStrings_LeftToRight;
 #if SYSTEM_TEXT_REGULAREXPRESSIONS
-// if (usesRfoTryFind)
-// {
-// LeadingStrings = SearchValues.Create(LeadingPrefixes, StringComparison.Ordinal);
-// }
+// LeadingStrings = SearchValues.Create(LeadingPrefixes, StringComparison.Ordinal);
 #endif
 // return;
 //}
@@ -699,14 +692,9 @@ public bool TryFindNextStartingPositionLeftToRight(ReadOnlySpan<char> textSpan,
 case FindNextStartingPositionMode.LeadingStrings_LeftToRight:
 case FindNextStartingPositionMode.LeadingStrings_OrdinalIgnoreCase_LeftToRight:
     {
-        if (LeadingStrings is not SearchValues<string> searchValues)
-        {
-            // This should be exceedingly rare and only happen if a Compiled regex selected this
-            // option but then failed to compile (e.g. due to too deep stacks) and fell back to the interpreter.
-            return true;
-        }
+        Debug.Assert(LeadingStrings is not null);
 
-        int i = textSpan.Slice(pos).IndexOfAny(searchValues);
+        int i = textSpan.Slice(pos).IndexOfAny(LeadingStrings);
         if (i >= 0)
         {
             pos += i;