Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regressions in System.Globalization.Tests.StringSearch #91938

Open
performanceautofiler bot opened this issue Sep 5, 2023 · 10 comments
Open

Regressions in System.Globalization.Tests.StringSearch #91938

performanceautofiler bot opened this issue Sep 5, 2023 · 10 comments
Assignees
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows PGO Priority:2 Work that is important, but not critical for the release runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Milestone

Comments

@performanceautofiler
Copy link

performanceautofiler bot commented Sep 5, 2023

Run Information

Name Value
Architecture x64
OS Windows 10.0.19042
Queue OwlWindows
Baseline 538cd03b39afb57eb17ea77a86b3c94c039b8822
Compare c7d36b48a493789c9267069645e2690e804a2067
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Globalization.Tests.StringSearch

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
397.12 ns 516.40 ns 1.30 0.03 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'

Payloads

Baseline
Compare

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False))

ETL Files

Histogram

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 516.3989242627822 > 416.48485541506614.
IsChangePoint: Marked as a change because one of 6/22/2023 7:55:14 AM, 8/27/2023 2:42:29 PM, 9/4/2023 9:28:57 AM falls between 8/26/2023 7:53:22 PM and 9/4/2023 9:28:57 AM.
IsRegressionStdDev: Marked as regression because -523.7204883102902 (T) = (0 -516.5964530091254) / Math.Sqrt((0.39932907677050516 / (9)) + (0.19792037708236598 / (24))) is less than -2.0395134463927462 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (9) + (24) - 2, .025) and -0.30301021705518705 = (396.4638544252073 - 516.5964530091254) / 396.4638544252073 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@performanceautofiler performanceautofiler bot added arch-x64 os-windows PGO runtime-coreclr specific to the CoreCLR runtime untriaged New issue has not been triaged by the area owner labels Sep 5, 2023
@cincuranet cincuranet removed the untriaged New issue has not been triaged by the area owner label Sep 5, 2023
@cincuranet
Copy link
Contributor

Windows x64 Intel and AMD both are stable-regressed. Transfering.

@cincuranet cincuranet transferred this issue from dotnet/perf-autofiling-issues Sep 12, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Sep 12, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 12, 2023
@cincuranet cincuranet added tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark labels Sep 12, 2023
@cincuranet cincuranet changed the title [Perf] Windows/x64: 1 Regression on 8/28/2023 8:13:19 AM Regressions in System.Globalization.Tests.StringSearch Sep 12, 2023
@jeffschwMSFT jeffschwMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 13, 2023
@ghost
Copy link

ghost commented Sep 13, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Run Information

Name Value
Architecture x64
OS Windows 10.0.19042
Queue OwlWindows
Baseline 538cd03b39afb57eb17ea77a86b3c94c039b8822
Compare c7d36b48a493789c9267069645e2690e804a2067
Diff Diff
Configs CompilationMode:tiered, RunKind:micro

Regressions in System.Globalization.Tests.StringSearch

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio
397.12 ns 516.40 ns 1.30 0.03 False

graph
Test Report

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'

Payloads

Baseline
Compare

System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, OrdinalIgnoreCase, False))

ETL Files

Histogram

Description of detection logic

IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small.
IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline.
IsRegressionWindowed: Marked as regression because 516.3989242627822 > 416.48485541506614.
IsChangePoint: Marked as a change because one of 6/22/2023 7:55:14 AM, 8/27/2023 2:42:29 PM, 9/4/2023 9:28:57 AM falls between 8/26/2023 7:53:22 PM and 9/4/2023 9:28:57 AM.
IsRegressionStdDev: Marked as regression because -523.7204883102902 (T) = (0 -516.5964530091254) / Math.Sqrt((0.39932907677050516 / (9)) + (0.19792037708236598 / (24))) is less than -2.0395134463927462 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (9) + (24) - 2, .025) and -0.30301021705518705 = (396.4638544252073 - 516.5964530091254) / 396.4638544252073 is less than -0.05.
IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small.
IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so.

JIT Disasms

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

Author: performanceautofiler[bot]
Assignees: -
Labels:

os-windows, tenet-performance, tenet-performance-benchmarks, arch-x64, area-CodeGen-coreclr, untriaged, runtime-coreclr, PGO, needs-area-label

Milestone: -

@BruceForstall BruceForstall added this to the 9.0.0 milestone Sep 18, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 18, 2023
@BruceForstall
Copy link
Member

@AndyAyersMS Can you investigate?

@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Sep 18, 2023
@AndyAyersMS AndyAyersMS self-assigned this Sep 18, 2023
@AndyAyersMS
Copy link
Member

Looks like a fairly broad regression (Aug-28)

image

@AndyAyersMS AndyAyersMS added the Priority:2 Work that is important, but not critical for the release label May 8, 2024
@AndyAyersMS
Copy link
Member

Still seems regressed overall... lots of fluctuations but higher than it was last year

image

@AndyAyersMS
Copy link
Member

Commit range is f5881e5...a7d4b4b

Seems like a static PGO update lead to this one.

@AndyAyersMS
Copy link
Member

At least locally this improved and then re-regressed

Method Runtime Options Mean Error StdDev Median Min Max Ratio RatioSD Allocated Alloc Ratio
LastIndexOf_Word_NotFound .NET 7.0 (en-US, OrdinalIgnoreCase, False) 624.0 ns 20.22 ns 23.28 ns 616.5 ns 599.5 ns 691.3 ns 1.00 0.05 - NA
LastIndexOf_Word_NotFound .NET 8.0 (en-US, OrdinalIgnoreCase, False) 781.5 ns 33.94 ns 39.08 ns 774.1 ns 732.8 ns 894.0 ns 1.25 0.08 - NA
LastIndexOf_Word_NotFound .NET 9.0p5 (en-US, OrdinalIgnoreCase, False) 616.5 ns 37.83 ns 43.56 ns 596.0 ns 581.7 ns 720.6 ns 0.99 0.08 - NA
LastIndexOf_Word_NotFound .NET 9.0p6 (en-US, OrdinalIgnoreCase, False) 725.0 ns 5.01 ns 4.69 ns 725.0 ns 717.9 ns 735.7 ns 1.23 0.02 - NA
7.0
Benchmark: found 20 intervals; mean interval 220.592ms

00.02%   1E+04       ?        Unknown
95.48%   4.25E+07    Tier-1   [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
01.03%   4.6E+05     Tier-1   [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)
00.85%   3.8E+05     Tier-1   [System.Private.CoreLib]Ordinal.LastIndexOfOrdinalIgnoreCase(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.65%   2.9E+05     Tier-1   [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.61%   2.7E+05     Tier-1   [5c35a557-7919-4c35-a009-c8e8c4d39c9c]Runnable_0.WorkloadActionUnroll(int64)

9.0 preview 6
Benchmark: found 20 intervals; mean interval 296.440ms

96.75%   5.811E+07   Tier-1   [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>)
00.80%   4.8E+05     Tier-1   [MicroBenchmarks]StringSearch.LastIndexOf_Word_NotFound()
00.73%   4.4E+05     native   clrjit.dll
00.70%   4.2E+05     native   coreclr.dll
00.35%   2.1E+05     Tier-1   [System.Private.CoreLib]CompareInfo.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>,value class System.Globalization.CompareOptions)

@AndyAyersMS
Copy link
Member

Need to do more digging but certainly looks like a layout issue. Raw sample attribution

NET 7
Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8ACAFF460 -- 0x00007FF8ACAFF783 (length 0x0323)
0x0000 : 1
0x0008 : 3
0x000C : 1
0x0017 : 2
0x0025 : 3
0x002F : 1
0x0037 : 2
0x003C : 1
0x0043 : 1
0x004C : 417
0x005B : 11
0x005F : 387
0x0062 : 1
0x0068 : 2
0x0079 : 8
0x007D : 432
0x0086 : 4
0x0089 : 1
0x008C : 5
0x008E : 395
0x0098 : 7
0x009B : 12
0x00A5 : 761
0x00A8 : 42
0x00AE : 1
0x0107 : 1
0x010B : 367
0x010E : 47
0x0111 : 3
0x0113 : 2
0x011D : 390
0x0120 : 50
0x012A : 352
0x0133 : 4
0x0188 : 43
0x0191 : 23
0x0195 : 17
0x02BC : 3
0x02C5 : 407
0x02D7 : 7
0x02DC : 15
0x02E1 : 15
0x02E6 : 2
0x02E8 : 1


NET 9

Raw samples for [System.Private.CoreLib]OrdinalCasing.LastIndexOf(value class System.ReadOnlySpan`1<wchar>,value class System.ReadOnlySpan`1<wchar>) at 0x00007FF8CB941640 -- 0x00007FF8CB941BA1 (length 0x0561)
0x0009 : 9
0x000A : 1
0x000C : 3
0x0016 : 3
0x001E : 1
0x0022 : 1
0x004A : 1
0x005E : 1
0x0061 : 1
0x006C : 1
0x008B : 399
0x0097 : 5
0x009C : 8
0x00A2 : 409
0x00AE : 3
0x00B3 : 1
0x00BB : 2
0x00BE : 368
0x00C1 : 1
0x00C9 : 4
0x01DC : 2
0x01E1 : 14
0x01E5 : 22
0x01E8 : 390
0x01EE : 10
0x0305 : 3
0x030E : 418
0x0329 : 10
0x0331 : 14
0x0333 : 1
0x0335 : 2
0x033B : 5
0x034C : 419
0x0359 : 12
0x035C : 1
0x0367 : 387
0x036F : 36
0x0372 : 6
0x0375 : 446
0x0380 : 2
0x038A : 397
0x038D : 1
0x0391 : 31
0x039E : 385
0x03A1 : 29
0x03A7 : 1529
0x03AC : 7
0x03B0 : 10

Note in particular the heavy sample count at the end of the .NET 9 code -- seems like a hot loop has gotten widely spread out.

This is an R2R'd method so went through Tier-1 instr and some some of its callees either will lack PGO or will have static PGO.

If we disable R2R we see 9.0 do pretty well..

OutlierMode=DontRemove EnvironmentVariables=DOTNET_ReadyToRun=0 PowerPlanMode=00000000-0000-0000-0000-000000000000
IterationTime=250ms MaxIterationCount=20 MemoryRandomization=True
MinIterationCount=15 WarmupCount=1

Method Job Toolchain Options Mean Error StdDev Median Min Max Ratio RatioSD Allocated Alloc Ratio
LastIndexOf_Word_NotFound Job-GPYIFC .NET 7 (en-US, OrdinalIgnoreCase, False) 539.9 ns 9.25 ns 8.65 ns 539.2 ns 527.4 ns 555.1 ns 1.00 0.02 - NA
LastIndexOf_Word_NotFound Job-VPXAWQ .NET 9 (main) (en-US, OrdinalIgnoreCase, False) 481.5 ns 11.01 ns 12.68 ns 477.1 ns 467.9 ns 527.8 ns 0.89 0.03 - NA

So part of the issue here is lost profile data from Tier-1 instr (in particular we lose dynamic pgo for inlinees). The "fix" for this is to instrument the inlinees as well.

@AndyAyersMS
Copy link
Member

Seems like the key missing info is from System.Globalization.OrdinalCasing:ToUpper(ushort):ushort and when we synthesize we add weight to paths that aren't reached in the benchmark:
image

@AndyAyersMS
Copy link
Member

I don't think there is an easy fix here. Because of our strategy for instrumenting R2R'd code we will miss some important details about behavior of inlinees.

#44372 tracks gathering profile data for inlinees when we are profiling and optimizing.

@AndyAyersMS AndyAyersMS removed this from the 9.0.0 milestone Jul 29, 2024
@AndyAyersMS AndyAyersMS added this to the 10.0.0 milestone Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-windows PGO Priority:2 Work that is important, but not critical for the release runtime-coreclr specific to the CoreCLR runtime tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark
Projects
None yet
Development

No branches or pull requests

5 participants