Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pattern matching ReadOnlySpan<char> is worse than String #66529

Closed
eerhardt opened this issue Mar 11, 2022 · 11 comments
Closed

Pattern matching ReadOnlySpan<char> is worse than String #66529

eerhardt opened this issue Mar 11, 2022 · 11 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@eerhardt
Copy link
Member

eerhardt commented Mar 11, 2022

Description

Trying to use dotnet/csharplang#1881 is resulting in worse performance than calling Span.ToString() and matching on that.

I've created a repro: https://github.com/eerhardt/SwitchSpanPerf. Just pull this code and dotnet run -c Release to run the benchmarks. I've tried both .NET 6 and the latest .NET 7 main branch.

The only difference between the two GetNamedColorXXX methods is:

    Span<char> loweredValue = value.Length <= 128 ? stackalloc char[value.Length] : new char[value.Length];

    int charsWritten = value.ToLowerInvariant(loweredValue);

-   return loweredValue switch
+   return loweredValue.ToString() switch
    {

Regression?

No

Windows .NET 6

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1586 (21H1/May2021Update)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.300-preview.22154.4
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  DefaultJob : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
Method Mean Error StdDev Code Size
SpanBlack 59.00 ns 0.332 ns 0.294 ns 17,114 B
StringBlack 27.55 ns 0.183 ns 0.172 ns 14,626 B
SpanLightGoldenrodYellowk 79.84 ns 0.104 ns 0.081 ns 17,114 B
StringLightGoldenrodYellow 42.88 ns 0.398 ns 0.372 ns 14,626 B

Windows .NET 7

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1586 (21H1/May2021Update)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.300-preview.22154.4
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  Job-UZCFFF : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

Toolchain=CoreRun
Method Mean Error StdDev Code Size
SpanBlack 54.22 ns 0.299 ns 0.280 ns 16,192 B
StringBlack 30.88 ns 0.266 ns 0.236 ns 13,033 B
SpanLightGoldenrodYellowk 72.58 ns 0.147 ns 0.122 ns 16,192 B
StringLightGoldenrodYellow 43.30 ns 0.579 ns 0.513 ns 13,033 B

MacOS .NET 6

Surprisingly on my MacOS x64 machine on .NET 6 doesn't seem to have this problem.

BenchmarkDotNet=v0.13.1, OS=macOS Monterey 12.2.1 (21D62) [Darwin 21.3.0]
Intel Core i7-4870HQ CPU 2.50GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
  DefaultJob : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
Method Mean Error StdDev
SpanBlack 34.53 ns 0.131 ns 0.109 ns
StringBlack 37.65 ns 0.488 ns 0.501 ns
SpanLightGoldenrodYellowk 59.55 ns 0.473 ns 0.443 ns
StringLightGoldenrodYellow 55.63 ns 0.268 ns 0.224 ns

Analysis

The only real differences in IL that I see are (left String, right Span):

  1. At the top where we call ToString and the ComputeHash methods:
    image

  2. Once we've found the correct hash and calling the string == method vs. AsSpan and SequenceEqual methods:
    image

Comparing the assembly code, this part looks a lot difference:

image

cc @EgorBo @stephentoub

@eerhardt eerhardt added the tenet-performance Performance related issue label Mar 11, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 11, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@eerhardt eerhardt added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 11, 2022
@ghost
Copy link

ghost commented Mar 11, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Trying to use dotnet/csharplang#1881 is resulting in worse performance than calling Span.ToString() and matching on that.

I've created a repro: https://github.com/eerhardt/SwitchSpanPerf. Just pull this code and dotnet run -c Release to run the benchmarks. I've tried both .NET 6 and the latest .NET 7 main branch.

The only difference between the two GetNamedColorXXX methods is:

    Span<char> loweredValue = value.Length <= 128 ? stackalloc char[value.Length] : new char[value.Length];

    int charsWritten = value.ToLowerInvariant(loweredValue);

-   return loweredValue switch
+   return loweredValue.ToString() switch
    {

Regression?

No

Windows .NET 6

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1586 (21H1/May2021Update)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.300-preview.22154.4
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  DefaultJob : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
Method Mean Error StdDev Code Size
SpanBlack 59.00 ns 0.332 ns 0.294 ns 17,114 B
StringBlack 27.55 ns 0.183 ns 0.172 ns 14,626 B
SpanLightGoldenrodYellowk 79.84 ns 0.104 ns 0.081 ns 17,114 B
StringLightGoldenrodYellow 42.88 ns 0.398 ns 0.372 ns 14,626 B

Windows .NET 7

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1586 (21H1/May2021Update)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.300-preview.22154.4
  [Host]     : .NET 6.0.2 (6.0.222.6406), X64 RyuJIT
  Job-UZCFFF : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

Toolchain=CoreRun
Method Mean Error StdDev Code Size
SpanBlack 54.22 ns 0.299 ns 0.280 ns 16,192 B
StringBlack 30.88 ns 0.266 ns 0.236 ns 13,033 B
SpanLightGoldenrodYellowk 72.58 ns 0.147 ns 0.122 ns 16,192 B
StringLightGoldenrodYellow 43.30 ns 0.579 ns 0.513 ns 13,033 B

MacOS .NET 6

Surprisingly on my MacOS x64 machine on .NET 6 doesn't seem to have this problem.

BenchmarkDotNet=v0.13.1, OS=macOS Monterey 12.2.1 (21D62) [Darwin 21.3.0]
Intel Core i7-4870HQ CPU 2.50GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.200
  [Host]     : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
  DefaultJob : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
Method Mean Error StdDev
SpanBlack 34.53 ns 0.131 ns 0.109 ns
StringBlack 37.65 ns 0.488 ns 0.501 ns
SpanLightGoldenrodYellowk 59.55 ns 0.473 ns 0.443 ns
StringLightGoldenrodYellow 55.63 ns 0.268 ns 0.224 ns

Analysis

The only real differences in IL that I see are (left String, right Span):

  1. At the top where we call ToString and the ComputeHash methods:
    image

  2. Once we've found the correct hash and calling the string == method vs. AsSpan and SequenceEqual methods:
    image

Comparing the assembly code, this part looks a lot difference:

image

cc @EgorBo @stephentoub

Author: eerhardt
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2022

This could greatly be improved by #65288 but it easily hits the "<20 basic blocks" heuristics - I'll see what will happen if I remove that.

@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2022

So I've looked at the codegen and it seems to be the main issue here that the IL is just too big - JIT gives up on inlining and register allocation (tracking locals) so we end up with very inefficient codegen for loweredValue.SequenceEqual("<literal>".AsSpan()) like the one you noticed.

I feel like enabling #65288 unconditionally for at least spans is a good idea

@eerhardt
Copy link
Member Author

@EgorBo - do you know why I was seeing this difference on my windows machine but not my mac?

@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2022

@EgorBo - do you know why I was seeing this difference on my windows machine but not my mac?

I suspect it's ABI difference. ReadOnlySpan<char> can be returned/passed via registers on SysV ABI (macos-x64/linux-x64) and via stack on windows-x64

@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2022

Example:
image

Left - Windows-x64, Right - Linux-x64

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 13, 2022

A huge switch like that feels like there should be a better way to match it. A Trie or something like that.

@EgorBo
Copy link
Member

EgorBo commented Mar 13, 2022

A huge switch like that feels like there should be a better way to match it. A Trie or something like that.

Feel free to implement dotnet/roslyn#56374 on the Roslyn side ;-)

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 13, 2022

Hasn't been approved, plus I'm not sure I need to be in yet another codebase...
If it gets approved I could give it a try.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 14, 2022
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Mar 14, 2022
@EgorBo
Copy link
Member

EgorBo commented Mar 14, 2022

Closed via #66534

@EgorBo EgorBo closed this as completed Mar 14, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

4 participants