Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Optimize "constant_string".Length #1378

Merged
merged 23 commits into from
Feb 14, 2020

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jan 7, 2020

Resurrects dotnet/coreclr#26000
Example (showcases why it's better to have this opt in JIT rather than in Roslyn)

bool Validate(string str)
{
    return str.Length > 0 && str.Length <= 100;
}

bool Test()
{
    return Validate("Hello");  // Validate() will be inlined
}

Current codegen for Test method:

G_M17178_IG01:
G_M17178_IG02:
       mov      rax, 0xD1FFAB1E
       mov      rax, gword ptr [rax]
       cmp      dword ptr [rax+8], 0
       jle      SHORT G_M17178_IG04
G_M17178_IG03:
       cmp      dword ptr [rax+8], 100
       setle    al
       movzx    rax, al
       jmp      SHORT G_M17178_IG05
G_M17178_IG04:
       xor      eax, eax
G_M17178_IG05:
       ret

New codegen for Test method:

G_M17178_IG01:
G_M17178_IG02:
       mov      eax, 1     ; return true
G_M17178_IG03:
       ret  

The optimization is now implemented in importer.cpp instead of morph.cpp (according to feedback). Also, didn't forget about crossgen2.

Jit-diff (see report with --count 1000 here):

Total bytes of diff: -30053 (-0.07% of base)
    diff is an improvement.

Top file regressions by size (bytes):
           6 : System.Private.Xml.Linq.dasm (0.00% of base)
           4 : System.Diagnostics.TextWriterTraceListener.dasm (0.03% of base)

Top file improvements by size (bytes):
       -5917 : System.Private.CoreLib.dasm (-0.11% of base)
       -4697 : System.Private.Xml.dasm (-0.13% of base)
       -4370 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.14% of base)
       -3465 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.06% of base)
       -1987 : System.Net.HttpListener.dasm (-0.84% of base)

36 total files with size differences (34 improved, 2 regressed), 73 unchanged.

Top method regressions by size (bytes):
          70 ( 3.82% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.ActivityComputer:ResolveWellKnownSymbols():this
          60 ( 4.63% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Binder:BindIntegralMinValConstants(Microsoft.CodeAnalysis.CSharp.Syntax.PrefixUnaryExpressionSyntax,Microsoft.CodeAnalysis.CSharp.BoundExpression,Microsoft.CodeAnalysis.DiagnosticBag):Microsoft.CodeAnalysis.CSharp.BoundLiteral:this
          20 ( 2.40% of base) : System.Private.Xml.dasm - System.Xml.HtmlEncodedRawTextWriter:WriteDocType(System.String,System.String,System.String,System.String):this
          16 ( 2.02% of base) : System.Private.Xml.dasm - System.Xml.HtmlUtf8RawTextWriter:WriteDocType(System.String,System.String,System.String,System.String):this
          16 ( 2.41% of base) : System.Private.Xml.dasm - System.Xml.XmlEncodedRawTextWriter:WriteDocType(System.String,System.String,System.String,System.String):this

Top method improvements by size (bytes):
        -829 (-29.31% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.Kernel.MemoryProcessMemInfoTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
        -812 (-32.15% of base) : System.Net.Http.dasm - System.Net.Http.MultiProxy:TryParseProxyConfigPart(System.ReadOnlySpan`1[Char],bool,byref,byref):bool
        -791 (-28.77% of base) : System.Private.Xml.dasm - System.Xml.XmlConvert:EncodeName(System.String,bool,bool):System.String
        -784 (-17.38% of base) : System.Diagnostics.Process.dasm - System.Diagnostics.NtProcessManager:GetProcessInfos(System.Diagnostics.PerformanceCounterLib,int,int,System.ReadOnlySpan`1[Byte]):System.Diagnostics.ProcessInfo[]
        -780 (-39.14% of base) : System.Private.CoreLib.dasm - System.Enum:ValueToHexString(System.Object):System.String

Top method regressions by size (percentage):
          60 ( 4.63% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Binder:BindIntegralMinValConstants(Microsoft.CodeAnalysis.CSharp.Syntax.PrefixUnaryExpressionSyntax,Microsoft.CodeAnalysis.CSharp.BoundExpression,Microsoft.CodeAnalysis.DiagnosticBag):Microsoft.CodeAnalysis.CSharp.BoundLiteral:this
          70 ( 3.82% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.ActivityComputer:ResolveWellKnownSymbols():this
          16 ( 2.41% of base) : System.Private.Xml.dasm - System.Xml.XmlEncodedRawTextWriter:WriteDocType(System.String,System.String,System.String,System.String):this
          20 ( 2.40% of base) : System.Private.Xml.dasm - System.Xml.HtmlEncodedRawTextWriter:WriteDocType(System.String,System.String,System.String,System.String):this
          16 ( 2.02% of base) : System.Private.Xml.dasm - System.Xml.HtmlUtf8RawTextWriter:WriteDocType(System.String,System.String,System.String,System.String):this

Top method improvements by size (percentage):
         -14 (-70.00% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.Cci.PeWriter:get_SizeOfNameTable():int
         -14 (-70.00% of base) : System.Reflection.Metadata.dasm - System.Reflection.PortableExecutable.ManagedTextSection:get_SizeOfNameTable():int
         -85 (-49.71% of base) : System.Private.CoreLib.dasm - System.Char8:ToString():System.String:this
         -86 (-47.51% of base) : System.Diagnostics.FileVersionInfo.dasm - System.Diagnostics.FileVersionInfo:ConvertTo8DigitHex(int):System.String
         -92 (-45.10% of base) : System.Private.Xml.dasm - System.Xml.XmlTextEncoder:WriteCharEntityImpl(ushort):this

In some rare cases this opt can brake bound check elimination over string literals but it doesn't work well even today (see #1343).

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 7, 2020
@EgorBo
Copy link
Member Author

EgorBo commented Jan 7, 2020

hm.. "regression" example:

static StringBuilder Tess()
{
    return new StringBuilder("hello");
}

Diff (new codegen is on the right): https://www.diffchecker.com/QzLuLcFj

@am11
Copy link
Member

am11 commented Jan 8, 2020

Diff (new codegen is on the right): https://www.diffchecker.com/QzLuLcFj

-	       mov      r9d, dword ptr [rcx+8]	
-							;; bbWeight=0.25 PerfScore 0.50	
+	       mov      r9d, 5
+							;; bbWeight=0.25 PerfScore 0.06
...
-	; Total bytes of code: 81
+	; Total bytes of code: 83

this is because immediate value for mov is always 32 bit (and there is no 8 bit counterpart, imm8), so the instruction takes 6 bytes. otoh, dword ptr [rcx+N] uses byte for all N 0:127 (well, nasm take it to 128 and uses 4 bytes starting from 129, but masm's limit is 127).

if it instead emits:

push	5
pop	r9 

the diff will go away (for same range 0:127 with both masm and nasm).

@AndyAyersMS
Copy link
Member

@am11 On x64 (windows, at least) we generally can't change RSP in the middle of a method, so using push is problematic.

@am11
Copy link
Member

am11 commented Jan 8, 2020

Other option was to use r9b (lower 8-bits) that also consumes 1 byte (instead of 4), but does not clear upper 56 bits (as r9d clears upper 32 bits), which could be problematic as well (depends on how r9 gets consumed?)

@AndyAyersMS
Copy link
Member

379 total methods with size differences (352 improved, 27 regressed), 261676 unchanged.

I wonder how many of these come from inlining? The jit inliner currently doesn't consider a constant string passed as an argument to be noteworthy. Perhaps it should (or should, if that arg ends up in a bounds check...).

"regression" example:

Is this representative of what causes the actual regressions? I realize there aren't that many but wonder if anything else is going on that we should investigate.

@EgorBo
Copy link
Member Author

EgorBo commented Jan 10, 2020

Is this representative of what causes the actual regressions? I realize there aren't that many but wonder if anything else is going on that we should investigate.

looks so, here is the biggest "regression" (Microsoft.CodeAnalysis.CSharp.Binder:BindIntegralMinValConstants): https://www.diffchecker.com/5ttTCfO4
perhaps cmp reg, 0 instead of test?

I wonder how many of these come from inlining? The jit inliner currently doesn't consider a constant string passed as an argument to be noteworthy. Perhaps it should (or should, if that arg ends up in a bounds check...).

If I read the diff correctly - none of them were inlined but maybe I am wrong. I understand what you mean but I guess tweaking the jit heuristics (better multiplier for cns strings) is too complicated for this PR 🙂

@AndyAyersMS
Copy link
Member

here is the biggest "regression"

Looks like we are now missing some CSEs and so re-fetching strings from their handles, which is what leads to the size increase. Can you investigate?

If I read the diff correctly - none of them were inlined

You could run diffs with inlining disabled (say via COMPlus_JitNoInline=1) to get at what I'm after. I'm curious how often code fetches the length of a literal string constant locally versus cases that the jit brings together via inlining. If disabling inlining drastically cuts down on diffs then perhaps an inline heuristic tweak is worth looking into.

@EgorBo
Copy link
Member Author

EgorBo commented Jan 13, 2020

@AndyAyersMS jit-diff results with COMPlus_JitNoInline=1:

Total bytes of diff: -804 (-0.00% of base)
38 total methods with size differences (37 improved, 1 regressed), 262228 unchanged.

see https://gist.github.com/EgorBo/4110b359ed109ea4ef198250aee3d3da
The only regression is this method: https://www.diffchecker.com/WyHZkCx8

so it's -30053 (default) diff vs -804 (noinline).

Should I try to check for IsCSECandidate in the optimization? (upd: ah I guess I can't do it in the importer)

monojenkins pushed a commit to monojenkins/runtime that referenced this pull request Jan 23, 2020
```csharp
static int Test() => "Hello".Length; // replace with just `5`
```
Before:
```asm
movabs rax,0x7fd6c1758160
mov    eax,DWORD PTR [rax]
ret
```
Now:
```asm
mov    eax,0x5
ret
```

Also, handles cases after inlining, e.g.:
```csharp
bool Validate(string str)
{
    return str.Length > 0 && str.Length <= 100;
}

bool Test()
{
    return Validate("Hello");  // Validate() will be inlined
}
```
New codegen for `Test`:
```asm
mov    eax, 1     ; return true
ret
```

I have a similar PR for CoreCLR: dotnet#1378
jit-diff report found >30k bytes of improvements.
@jkotas
Copy link
Member

jkotas commented Jan 27, 2020

Did you drop all changes in ThunkInput.txt by accident?

@EgorBo EgorBo force-pushed the opt-string-length branch from 44336c1 to becc851 Compare January 28, 2020 08:05
@EgorBo
Copy link
Member Author

EgorBo commented Jan 28, 2020

@jkotas oops, yes, reverted.

@jkotas
Copy link
Member

jkotas commented Jan 31, 2020

@jkotas can you look at the changes in dynamicmethod.cpp ?

The VM side of the changes LGTM (modulo one nit).

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JIT/SPMI side of changes looks fine.

@BruceForstall here's another jit GUID bump. Do you want us to hold off merging this?

@BruceForstall
Copy link
Member

@BruceForstall here's another jit GUID bump. Do you want us to hold off merging this?

Could we hold off a couple weeks? And merge it approximately the same time as #2249? We're just today starting to get some long-awaited (internal only) SuperPMI collections that I'd like to be able to use before requiring a recollection.

@stephentoub
Copy link
Member

@BruceForstall, can you please post back here when we can take this? Not sure how precise a "couple" was in your comments, and it's been two weeks 😄 Thanks!

@BruceForstall
Copy link
Member

We're ready to take JIT-EE interface changes now (some internal processes dependent on the JIT-EE interface GUID are now stable). I'll go ahead and merge this.

@BruceForstall BruceForstall merged commit fa71105 into dotnet:master Feb 14, 2020
@stephentoub
Copy link
Member

Thanks, Bruce. And Egor :)

@EgorBo EgorBo deleted the opt-string-length branch May 25, 2020 11:57
monojenkins pushed a commit to monojenkins/mono that referenced this pull request Oct 29, 2020
A similar optimization for RyuJIT was quite profitable judging by the jit-diff: dotnet/runtime#1378

Example:
```csharp
static int Test()
{
    return "hello".Length;
}
```
Current `MONO_VERBOSE_METHOD=Test` output:
```
IR_0000: ldstr      0
IR_0002: strlen
IR_0003: ret
```
New output:
```
IR_0000: ldc.i4.5
IR_0001: ret
```
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants