[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) #61786

performanceautofiler · 2021-11-18T16:34:30Z

Run Information

Architecture	arm64
OS	Windows 10.0.19041
Baseline	10e107debae28de4bd1e710cfe448be3c293e841
Compare	9962c10813de695782baa20f1bf0bb2e9810971d
Diff	Diff

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Uri_IsNotMatch - Duration of single invocation	315.14 ns	557.11 ns	1.77	0.20	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'

Payloads

Baseline
Compare

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

The text was updated successfully, but these errors were encountered:

dotnet-issue-labeler · 2021-11-18T16:45:23Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

DrewScoggins · 2021-11-18T16:46:47Z

Seems related to #61490. This is the only regression we detected, but it is fairly large (80%)

DrewScoggins · 2021-11-18T16:47:37Z

@stephentoub

stephentoub · 2021-11-18T16:51:48Z

This is the only regression we detected, but it is fairly large (80%)

I'm actually pleased it's the only one detected.

I expected and accepted this one in exchange for gains elsewhere. Though looking at it again there's probably somethings we can do to improve it.

stephentoub · 2021-11-18T18:57:26Z

The issue here actually isn't what I thought it was.

The core of the issue is:

runtime/src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs

Lines 1828 to 1832 in 6ff57f1

    
           // If the two nodes don't agree on options in any way, don't try to optimize them. 
        
           if (node.Options != subsequent.Options) 
        
           { 
        
               return false; 
        
           }

This test has this sequence:

[\w]+://

Previously we'd use this CanBeMadeAtomic to check whether : can match \w, find it can't, and thus turn the \w+ loop into an atomic one. However, the use of IgnoreCase in this test is causing an issue. The cited PR optimizes the "://" to be case-sensitive, even in the presence of IgnoreCase, because none of the characters there participate in case conversion. And as a result, CanBeMadeAtomic bails early when it sees that the case-sensitivity of the two nodes differs. So the regression is because the \w+ stays non-atomic and thus causes us to backtrack more than we did previously.

The good news is the problem will just go away with #61048. There are workarounds we could put in place in the interim, but I'm tempted to say we should just prioritize fully addressing that issue asap rather than trying to patch things like this.

@joperezr, thoughts?

joperezr · 2021-11-18T19:35:21Z

Let me make sure I understand correctly. The reason why fixing issue #61048 would fix this is because now [\w]+ would now also become case-sensitive even in IgnoreCase because at construction time we would have already translated [\w] into it's upper-case lower-case equivalent (like transforming [A] => [Aa]) so in theory CanBeMadeAtomic should now return true and reduce again the number of backtracking we do, is that right? If so, I agree it is better to just wait for that to be fixed than trying to undo, or patch the optimizations made with your PR.

stephentoub · 2021-11-18T20:59:29Z

@joperezr, exactly. (For this one specific case, it's further interesting because we recognize \w as being case-sensitive even under IgnoreCase, because the categories it maps to already includes both upper and lower. But because of how the parser works, even though [\w] and \w are 100% identical sets, the case-sensitive \w gets added to the set and [\w] ends up as case-insensitive.)

ghost · 2021-11-21T20:32:08Z

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Architecture	arm64
OS	Windows 10.0.19041
Baseline	10e107debae28de4bd1e710cfe448be3c293e841
Compare	9962c10813de695782baa20f1bf0bb2e9810971d
Diff	Diff

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
Uri_IsNotMatch - Duration of single invocation	315.14 ns	557.11 ns	1.77	0.20	False

Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Text.RegularExpressions.Tests.Perf_Regex_Common*'

Payloads

Baseline
Compare

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`arch-arm64`, `area-System.Text.RegularExpressions`, `os-windows`, `tenet-performance`, `tenet-performance-benchmarks`, `untriaged`
Milestone:	-

DrewScoggins transferred this issue from dotnet/perf-autofiling-issues Nov 18, 2021

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Nov 18, 2021

DrewScoggins added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark arch-arm64 os-windows labels Nov 18, 2021

DrewScoggins changed the title ~~[Perf] Changes at 11/17/2021 8:43:26 PM~~ [Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) Nov 18, 2021

jeffschwMSFT added the area-System.Text.RegularExpressions label Nov 21, 2021

stephentoub mentioned this issue Nov 23, 2021

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common #61972

Closed

joperezr added this to the 7.0.0 milestone Feb 2, 2022

stephentoub mentioned this issue Mar 26, 2022

Changing the logic for how we deal with RegexOptions.IgnoreCase matching. #67184

Merged

jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 28, 2022

ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 29, 2022

stephentoub closed this as completed in #67184 Apr 6, 2022

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 6, 2022

ghost locked as resolved and limited conversation to collaborators May 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) #61786

[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) #61786

performanceautofiler bot commented Nov 18, 2021

Payloads

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)

Docs

dotnet-issue-labeler bot commented Nov 18, 2021

DrewScoggins commented Nov 18, 2021

DrewScoggins commented Nov 18, 2021

stephentoub commented Nov 18, 2021

stephentoub commented Nov 18, 2021

joperezr commented Nov 18, 2021 •

edited

Loading

stephentoub commented Nov 18, 2021

ghost commented Nov 21, 2021

Run Information

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Repro

Payloads

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)

Docs

[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) #61786

[Perf] Regression in Perf_Regex.Uri_IsNotMatch(IgnoreCase, Compiled) #61786

Comments

performanceautofiler bot commented Nov 18, 2021

Run Information

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Repro

Payloads

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)

Docs

dotnet-issue-labeler bot commented Nov 18, 2021

DrewScoggins commented Nov 18, 2021

DrewScoggins commented Nov 18, 2021

stephentoub commented Nov 18, 2021

stephentoub commented Nov 18, 2021

joperezr commented Nov 18, 2021 • edited Loading

stephentoub commented Nov 18, 2021

ghost commented Nov 21, 2021

Run Information

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Common

Repro

Payloads

Histogram

System.Text.RegularExpressions.Tests.Perf_Regex_Common.Uri_IsNotMatch(Options: IgnoreCase, Compiled)

Docs

joperezr commented Nov 18, 2021 •

edited

Loading