Iterating with ForEach over ImmutableArray is slower than over Array #780

Closed
hnrqbaggio opened this issue Dec 11, 2019 · 16 comments · Fixed by #1183

@hnrqbaggio
Contributor

This is a spin off issue from https://github.com/dotnet/corefx/issues/36416, to scope down to just the ImmutableArray case.

As mentioned in other issues, immutability sometimes comes with trade-offs in some operations, but in this case it seems that the extra overhead can be optimized away, at least when it's a collection of value types.

Comparing with Array

Looking at the ASM code generated, the Array version is highly inlined, while for the ImmutableArray the JIT is able to inline the loop itself, but not the call to ImmutableArray<T>.GetEnumerator(). The method itself is quite simple, but it calls an internal method named ThrowNullRefIfNotInitialized() to validate that the underlying array is not null.

It seems that the extra method causes the collection to have more branch mis-predictions and cache misses than the array case (the cache misses show up in the results when the collection is larger, say 2048 instead of the default 512 elements).

Possible fix

If we change the implementation of GetEnumerator to use MethodImplOptions.AggressiveInlining, the JIT is able to inline the call and the stats of both benchmarks match and the Median for the Int32 case improves by 4x.

No Slower results for the provided threshold = 3% and noise filter = 0.3ns.
| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| IterateForEach.ImmutableArray(Size: 512) | 4.07 | 1110.44 | 273.14 | several? |

Unfortunately, this seems to not be enough for the Reference Type case. When the benchmark runs using String instead of Int32 the results are still the same as before, so it seems that marking the methods as inline is still not sufficient for the JIT to optimize them.
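
For readers who want to see the shape of the change described in this section, here is a minimal sketch. It is not the actual ImmutableArray<T> source: `ArrayWrapper<T>` and `SumDemo` are hypothetical names used only for illustration, and the null check is written as a plain `Length` access on the assumption that it mirrors what ThrowNullRefIfNotInitialized does.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for ImmutableArray<T>: a struct wrapping a T[].
public readonly struct ArrayWrapper<T>
{
    private readonly T[] _array;

    public ArrayWrapper(T[] array) => _array = array;

    // Ask the JIT to inline this call into the foreach site even when its
    // default size heuristics would reject it as unprofitable.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public Enumerator GetEnumerator()
    {
        // Touching Length throws NullReferenceException when the wrapper
        // is default(ArrayWrapper<T>) and the array is therefore null.
        _ = _array.Length;
        return new Enumerator(_array);
    }

    // Deliberately not IDisposable, so foreach needs no try/finally.
    public struct Enumerator
    {
        private readonly T[] _array;
        private int _index;

        internal Enumerator(T[] array)
        {
            _array = array;
            _index = -1;
        }

        public T Current => _array[_index];

        public bool MoveNext() => ++_index < _array.Length;
    }
}

public static class SumDemo
{
    // Pattern-based foreach: the compiler binds to the struct enumerator.
    public static int Sum(ArrayWrapper<int> items)
    {
        int sum = 0;
        foreach (int item in items)
        {
            sum += item;
        }
        return sum;
    }
}
```

Whether the JIT then keeps the enumerator fields in registers depends on the runtime version, so the attribute should be validated with a benchmark and a disassembly rather than assumed to help.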

Correctness

The code in ThrowNullRefIfNotInitialized is just accessing the underlying Array.Length property and relying on that to throw if the object is null.

In the optimized version, I can't see the exact same instructions, so would still need to confirm that the optimizer is not discarding that check. However, the tests that validate that GetEnumerator will throw NRE in that condition did pass.

I have the changes above in my fork of the repo, but this is my first contribution so would like to check the feedback on the findings and not send a PR right away. 😁

Baseline

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61001, CoreFX 5.0.19.61001), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
| Type | Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated | BranchInstructions/Op | CacheMisses/Op | BranchMispredictions/Op |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IterateForEach<Int32> | Array | 512 | 173.4 ns | 18.47 ns | 21.28 ns | 177.2 ns | 145.1 ns | 199.3 ns | - | - | - | - | 212 | 0 | 1 |
| IterateForEach<String> | Array | 512 | 157.5 ns | 15.91 ns | 18.32 ns | 169.4 ns | 131.8 ns | 179.9 ns | - | - | - | - | 205 | 0 | 1 |
| IterateForEach<Int32> | ImmutableArray | 512 | 1,083.9 ns | 38.67 ns | 44.53 ns | 1,110.4 ns | 997.1 ns | 1,115.4 ns | - | - | - | - | 415 | 0 | 2 |
| IterateForEach<String> | ImmutableArray | 512 | 1,101.5 ns | 38.80 ns | 44.69 ns | 1,115.3 ns | 1,010.2 ns | 1,175.2 ns | - | - | - | - | 462 | 0 | 2 |

Array

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].Array()
       xor     eax,eax
       mov     rdx,qword ptr [rcx+8]
       xor     ecx,ecx
       mov     r8d,dword ptr [rdx+8]
       test    r8d,r8d
       jle     M00_L01
M00_L00:
       movsxd  rax,ecx
       mov     eax,dword ptr [rdx+rax*4+10h]
       inc     ecx
       cmp     r8d,ecx
       jg      M00_L00
M00_L01:
       ret
       sbb     dword ptr [rax],eax
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-76h],dh
       rcr     dword ptr [rdx-6],cl
       jg      M00_L02
M00_L02:
       add     byte ptr [rbp+48h],dl
       mov     ebp,esp
       mov     qword ptr [rbp+10h],rcx
       mov     rax,qword ptr [rbp+10h]
       mov     rax,qword ptr [rax+58h]
; Total bytes of code 64

ImmutableArray

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ImmutableArray()
       push    rsi
       sub     rsp,40h
       xor     eax,eax
       mov     qword ptr [rsp+38h],rax
       mov     qword ptr [rsp+28h],rax
       mov     qword ptr [rsp+30h],rax
       xor     esi,esi
       mov     rcx,qword ptr [rcx+0C0h]
       mov     qword ptr [rsp+38h],rcx
       lea     rcx,[rsp+38h]
       lea     rdx,[rsp+28h]
       call    System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].GetEnumerator()
       jmp     M00_L01
M00_L00:
       cmp     dword ptr [rsp+30h],edx
       jae     M00_L02
       mov     rax,qword ptr [rsp+28h]
       mov     edx,dword ptr [rsp+30h]
       movsxd  rdx,edx
       mov     esi,dword ptr [rax+rdx*4+10h]
M00_L01:
       mov     eax,dword ptr [rsp+30h]
       inc     eax
       mov     dword ptr [rsp+30h],eax
       mov     rdx,qword ptr [rsp+28h]
       mov     edx,dword ptr [rdx+8]
       cmp     edx,eax
       jg      M00_L00
       mov     eax,esi
       add     rsp,40h
       pop     rsi
       ret
M00_L02:
       call    CoreCLR!JIT_RngChkFail
       int     3
       add     byte ptr [rcx],bl
       add     eax,72050002h
       add     dword ptr [rax+40h],esp
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-26h],ah
; Total bytes of code 138
; System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].GetEnumerator()
       push    rbp
       push    rdi
       push    rsi
       sub     rsp,40h
       vzeroupper
       lea     rbp,[rsp+50h]
       xor     eax,eax
       mov     qword ptr [rbp-18h],rax
       mov     qword ptr [rbp-28h],rax
       mov     qword ptr [rbp+10h],rcx
       mov     qword ptr [rbp+18h],rdx
       mov     rcx,qword ptr [rbp+10h]
       mov     rcx,qword ptr [rcx]
       mov     qword ptr [rbp-18h],rcx
       lea     rcx,[rbp-18h]
       call    System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].ThrowNullRefIfNotInitialized()
       vxorps  xmm0,xmm0,xmm0
       vmovdqu xmmword ptr [rbp-28h],xmm0
       lea     rcx,[rbp-28h]
       mov     rdx,qword ptr [rbp-18h]
       call    System.Collections.Immutable.ImmutableArray`1+Enumerator[[System.Int32, System.Private.CoreLib]]..ctor(Int32[])
       mov     rdi,qword ptr [rbp+18h]
       lea     rsi,[rbp-28h]
       call    CoreCLR!JIT_ByRefWriteBarrier
       movs    qword ptr [rdi],qword ptr [rsi]
       mov     rax,qword ptr [rbp+18h]
       lea     rsp,[rbp-10h]
       pop     rsi
       pop     rdi
       pop     rbp
       ret
       int     3
       int     3
       sbb     dword ptr [rdi],eax
       add     al,0
       ???
       jb      00007ffa`5ad52c12
       ???
       add     dh,byte ptr [rax+1]
       push    rax
       add     byte ptr [rax],al
       add     al,cl
       fcmovbe st,st(4)
       pop     rdx
       cli
       jg      00007ffa`5ad52c1f
; Total bytes of code 127
; System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].ThrowNullRefIfNotInitialized()
       push    rbp
       mov     rbp,rsp
       mov     qword ptr [rbp+10h],rcx
       mov     rax,qword ptr [rbp+10h]
       mov     rax,qword ptr [rax]
       mov     eax,dword ptr [rax+8]
       pop     rbp
       ret
       sbb     dword ptr [rcx],eax
       add     dword ptr [rax],eax
       add     dword ptr [rax],edx
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       and     bl,bl
       ???
       pop     rdx
       cli
       jg      M02_L00
M02_L00:
       add     byte ptr [rbp+48h],dl
; Total bytes of code 50

With Aggressive Inlining

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61001, CoreFX 5.0.19.61001), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
| Type | Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated | BranchInstructions/Op | CacheMisses/Op | BranchMispredictions/Op |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IterateForEach<Int32> | Array | 512 | 156.5 ns | 6.94 ns | 7.71 ns | 154.1 ns | 148.4 ns | 173.3 ns | - | - | - | - | 223 | 0 | 1 |
| IterateForEach<String> | Array | 512 | 140.1 ns | 4.23 ns | 4.87 ns | 138.7 ns | 131.3 ns | 148.8 ns | - | - | - | - | 226 | 0 | 1 |
| IterateForEach<Int32> | ImmutableArray | 512 | 271.6 ns | 24.17 ns | 27.84 ns | 273.1 ns | 233.7 ns | 301.4 ns | - | - | - | - | 424 | 0 | 2 |
| IterateForEach<String> | ImmutableArray | 512 | 1,125.9 ns | 107.31 ns | 123.58 ns | 1,071.1 ns | 997.4 ns | 1,293.6 ns | - | - | - | - | 434 | 0 | 2 |

Array

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].Array()
       xor     eax,eax
       mov     rdx,qword ptr [rcx+8]
       xor     ecx,ecx
       mov     r8d,dword ptr [rdx+8]
       test    r8d,r8d
       jle     M00_L01
M00_L00:
       movsxd  rax,ecx
       mov     eax,dword ptr [rdx+rax*4+10h]
       inc     ecx
       cmp     r8d,ecx
       jg      M00_L00
M00_L01:
       ret
       sbb     dword ptr [rax],eax
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-76h],dh
       fistp   word ptr [rcx-6]
       jg      M00_L02
M00_L02:
       add     byte ptr [rbp+48h],dl
       mov     ebp,esp
       mov     qword ptr [rbp+10h],rcx
       mov     rax,qword ptr [rbp+10h]
       mov     rax,qword ptr [rax+58h]
; Total bytes of code 64

ImmutableArray

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ImmutableArray()
       sub     rsp,28h
       xor     eax,eax
       mov     rdx,qword ptr [rcx+0C0h]
       mov     ecx,dword ptr [rdx+8]
       mov     r8d,0FFFFFFFFh
       jmp     M00_L01
M00_L00:
       cmp     r8d,ecx
       jae     M00_L02
       movsxd  rax,r8d
       mov     eax,dword ptr [rdx+rax*4+10h]
M00_L01:
       inc     r8d
       cmp     ecx,r8d
       jg      M00_L00
       add     rsp,28h
       ret
M00_L02:
       call    CoreCLR!JIT_RngChkFail
       int     3
       add     byte ptr [rcx],bl
       add     al,1
       add     byte ptr [rdx+rax*2],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-26h],dl
       loopne  00007ffa`59e12bb5
       cli
       jg      M00_L03
M00_L03:
       add     byte ptr [rbp+48h],dl
; Total bytes of code 82
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Dec 11, 2019
@hnrqbaggio
Contributor Author

hnrqbaggio commented Dec 11, 2019

/cc @adamsitnik who opened the original issue about all Immutable collections.

@adamsitnik
Member

Inlining GetEnumerator looks good to me. The code size is going to grow, but I believe that x4 speed improvement is worth it.

I am surprised that ThrowNullRefIfNotInitialized is not getting inlined by default. Are you sure about this? To verify it, you can use --profiler ETW, open the trace file with PerfView, and go to the Events tab. There should be events for inlining succeeded and failed. You might find this blog post useful.

Where most of the time is spent for the string case? You can use one of the recommended profilers to find out: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-corefx.md

Also, I wonder what perf we would get if GetEnumerator was implemented as return this.array.GetEnumerator()?

@adamsitnik adamsitnik added the tenet-performance Performance related issue label Dec 13, 2019
@hnrqbaggio
Contributor Author

Thanks for the pointer @adamsitnik. I didn't know I could use PerfView for the inline events, and I was having trouble with the InliningDiagnoser because it emits its output to the screen, which was too much info.

Looking now, it seems that the JIT is successfully inlining the ctor and ThrowNullRefIfNotInitialized inside GetEnumerator but then on the caller it says that GetEnumerator is not profitable:

ThreadID="29,572" ProcessorNumber="2" MethodBeingCompiledNamespace="System.Collections.IterateForEach_1[System.Int32]" MethodBeingCompiledName="ImmutableArray" MethodBeingCompiledNameSignature="instance !0 ()" InlinerNamespace="System.Collections.IterateForEach_1[System.Int32]" InlinerName="ImmutableArray" InlinerNameSignature="instance !0 ()" InlineeNamespace="System.Collections.Immutable.ImmutableArray_1[System.Int32]" InlineeName="GetEnumerator" InlineeNameSignature="instance value class Enumerator<!0> ()" FailAlways="False" FailReason="unprofitable inline" ClrInstanceID="7"

I've tried to do some research on what that reason is, but could not find much data. Maybe the JIT is considering that MoveNext and GetCurrent are more important to inline than the single call to GetEnumerator?

About the String case

I've tried looking at the ETL traces when I started this investigation, but for some reason I'm not able to see anything inside the benchmark method itself, because it's grayed out in PerfView and can't expand further.

This is what I get in the call stacks. Even if I load all symbols for the modules that are with ? in there the leaf node doesn't change. I've tried to follow the posts on your blog and the tutorial for PerfView but wasn't able to fix the problem (even in WPA I see the same pattern, I'll see if I can use Visual Studio and get something).
[Screenshot: PerfView call stacks]

The JIT events for that case show the same pattern as for Int32: the sub-methods seem to be successfully selected for inlining, but at GetEnumerator the JIT decides it's not worth it.

Just use this.array.GetEnumerator

Finally, at some point I did test just forwarding the call to this.array.GetEnumerator, but that didn't seem to be enough to optimize the flow. It also changes the exceptions that the methods can throw (because there are some checks in MoveNext and GetCurrent that would not apply anymore), so it would require adjusting some unit tests and maybe a review of whether that's a breaking change.

But I still might give that another try at least to use as a base to compare the JIT behavior.

@hnrqbaggio
Contributor Author

hnrqbaggio commented Dec 20, 2019

Got some interesting information after the suggestions.

Why inlining GetEnumerator helps

I was curious why a method that is called only once, and is not that heavy, would make such a difference in the results. The VS Profiler shows that most of the time is spent on the assignment inside the loop, and the assembly shows the difference in terms of the instructions executed.

The assembly snippets I posted earlier show that the loop for Array or for the inlined case is very small and works on registers only, for both the array address in RDX and the loop index variable in RAX.

The loop in the baseline shows that the address of the array and the loop variable are loaded from the stack and stored back on every iteration. It seems that the overhead of these extra memory accesses compounds during the loop, which is reasonable even if using the L1 cache (it also might explain why that case has more cache misses).

I'm not sure if this is expected or not, but it seems that the non-inlined function blocks the JIT from seeing that the array can be accessed directly from the registers from earlier in the code and optimize the calls. Once we remove that "barrier" the code gets optimized.

Code size

It seems that inlining also helps with the code size: the reported number of bytes for the benchmark is smaller, because the extra memory operations mentioned above get optimized away.

Benchmark for String

After testing a few variants of the benchmark to understand the differences in the code, I've noticed that the JIT is able to inline GetEnumerator if the benchmark is not inside a generic class itself. By default, the code in the Performance repo uses a generic class with GenericTypeArgumentsAttribute to test both Int32 and String.

In PerfView, the JIT Inlining events report a different reason for not optimizing the String case when my fix is on: before it was due to an "Unprofitable Inlining" and now it reports a "Runtime Dictionary Lookup".

Below is the result of a similar benchmark that doesn't use a generic class and uses the target type directly. The times drop to the same range as the Int32 case. So it seems that the proposed fix would benefit both cases.
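
To make the two benchmark shapes concrete, here is a hedged sketch. The class names `IterateForEachGeneric<T>` and `IterateForEachString` are hypothetical, the BenchmarkDotNet attributes are left out so the snippet stands alone, and the bodies only approximate the real benchmark in the Performance repo.

```csharp
using System.Collections.Immutable;

// Shape of the original benchmark: the method lives in a generic class.
// When T is a reference type such as string, the JIT compiles shared
// generic code for this method.
public class IterateForEachGeneric<T>
{
    private readonly ImmutableArray<T> _items;

    public IterateForEachGeneric(ImmutableArray<T> items) => _items = items;

    public T Iterate()
    {
        T result = default(T);
        foreach (T item in _items)
        {
            result = item;
        }
        return result;
    }
}

// Shape of the variant: same loop, but the element type is fixed,
// so the calling method is not shared generic code.
public class IterateForEachString
{
    private readonly ImmutableArray<string> _items;

    public IterateForEachString(ImmutableArray<string> items) => _items = items;

    public string Iterate()
    {
        string result = string.Empty;
        foreach (string item in _items)
        {
            result = item;
        }
        return result;
    }
}
```

Both classes return the same value for the same input; the difference being measured is only the calling context in which GetEnumerator has to be inlined.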

@adamsitnik is this something that you've encountered before when working with the benchmarks?

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61901, CoreFX 5.0.19.61901), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
| Type | Method | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated | BranchInstructions/Op | BranchMispredictions/Op | CacheMisses/Op |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IterateForEach<Int32> | Array | 512 | 369.9 ns | 15.74 ns | 16.84 ns | 364.3 ns | 351.5 ns | 404.8 ns | - | - | - | - | 519 | 1 | 0 |
| IterateForEach<String> | Array | 512 | 305.1 ns | 10.97 ns | 12.63 ns | 304.9 ns | 288.7 ns | 334.0 ns | - | - | - | - | 518 | 1 | 0 |
| IterateForEach_Int32 | Array | 512 | 288.5 ns | 25.03 ns | 28.83 ns | 272.9 ns | 254.4 ns | 331.7 ns | - | - | - | - | 517 | 1 | 0 |
| IterateForEach_String | Array | 512 | 317.7 ns | 25.76 ns | 29.67 ns | 321.8 ns | 251.4 ns | 371.4 ns | - | - | - | - | 517 | 1 | 0 |
| IterateForEach<Int32> | ImmutableArray | 512 | 396.7 ns | 9.58 ns | 10.25 ns | 395.1 ns | 378.2 ns | 419.2 ns | - | - | - | - | 1,031 | 1 | 0 |
| IterateForEach<String> | ImmutableArray | 512 | 1,531.5 ns | 91.32 ns | 93.78 ns | 1,514.5 ns | 1,439.6 ns | 1,823.3 ns | - | - | - | - | 1,059 | 2 | 1 |
| IterateForEach_Int32 | ImmutableArray | 512 | 314.8 ns | 28.15 ns | 32.42 ns | 303.5 ns | 270.7 ns | 359.0 ns | - | - | - | - | 1,030 | 1 | 0 |
| IterateForEach_String | ImmutableArray | 512 | 421.9 ns | 8.41 ns | 9.35 ns | 423.7 ns | 408.8 ns | 437.3 ns | - | - | - | - | 1,031 | 1 | 0 |
; System.Collections.IterateForEach_String.ImmutableArray()
       sub     rsp,28h
       xor     eax,eax
       mov     rdx,qword ptr [rcx+0C0h]
       mov     ecx,dword ptr [rdx+8]
       mov     r8d,0FFFFFFFFh
       jmp     M00_L01
M00_L00:
       cmp     r8d,ecx
       jae     M00_L02
       movsxd  rax,r8d
       mov     rax,qword ptr [rdx+rax*8+10h]
M00_L01:
       inc     r8d
       cmp     ecx,r8d
       jg      M00_L00
       add     rsp,28h
       ret
M00_L02:
       call    CoreCLR!JIT_RngChkFail
       int     3
       sbb     dword ptr [rcx+rax],eax
       add     byte ptr [rdx+rax*2],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-0Ah],bl
       and     dh,byte ptr [rdi]
       ???
       jg      M00_L03
M00_L03:
       add     byte ptr [rbp+48h],dl
; Total bytes of code 82

@danmoseley
Member

cc @AndyAyersMS for observations about inlining above

@AndyAyersMS
Member

@hnrqbaggio: thanks for the observations!

For the int case, the GetEnumerator method just misses being inlined by default:

multiplier in methods of promotable struct increased to 3.
Inline candidate callsite is boring.  Multiplier increased to 4.3.
calleeNativeSizeEstimate=461
callsiteNativeSizeEstimate=85
benefit multiplier=4.3
threshold=365
Native estimate for function size exceeds threshold for inlining 46.1 > 36.5 (multiplier = 4.3)
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for 'X`1[Int32][System.Int32]:IA():int:this' calling 'System.Collections.Immutable.ImmutableArray`1[Int32][System.Int32]:GetEnumerator():Enumerator[Int32]:this'
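
As a reading aid for the log above: the reported threshold is consistent with the call-site estimate multiplied by the benefit multiplier and truncated to an integer, with the inline rejected when the callee estimate exceeds it. The sketch below only reproduces that arithmetic. `InlineBudget` and its members are hypothetical names, and the formula is an assumption inferred from the numbers in the log, not a copy of the JIT source.

```csharp
public static class InlineBudget
{
    // Assumed formula: threshold = truncate(callsite estimate * multiplier).
    // With the logged values, 85 * 4.3 truncates to 365, matching threshold=365.
    public static int Threshold(int callsiteNativeSizeEstimate, double multiplier)
        => (int)(callsiteNativeSizeEstimate * multiplier);

    // Assumed rule: the inline is profitable only when the callee estimate
    // does not exceed the threshold.
    public static bool IsProfitable(int calleeNativeSizeEstimate,
                                    int callsiteNativeSizeEstimate,
                                    double multiplier)
        => calleeNativeSizeEstimate <= Threshold(callsiteNativeSizeEstimate, multiplier);
}
```

With the logged inputs (callee 461, call site 85, multiplier 4.3) this rule rejects the inline, which matches the 'unprofitable inline' result.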

We might consider boosting the "promotable struct" multiplier somewhat to give this sort of inline an extra nudge in the jit, since we get a lot of benefit out of promotion. I'll put this on my todo list.

For the string case: the jit is unable to inline a method from one shared generic class into a method from another shared generic class. So when benchmarking a shared generic method, the calling context matters quite a bit.

We can sometimes work around this if the method being inlined doesn't actually use the results of the runtime lookup. Let me dig in and see if that's the case here.

I am also looking into the inlined int case, seems like we ought to be able to match the array version codegen but can't remove the bounds check.

@AndyAyersMS
Member

In the inlined int case, the jit can't perform what we call a "do-while" transformation on the loop, because the loop exit block has multiple statements. Basically the code (after inlining) resembles this:

    static int F(int[] a)
    {
        int r = 0;
        for (int i = -1; ++i < a.Length; )
        {
            r = a[i];
        }
        return r;
    }

And without this transformation the jit won't optimize out the bounds check.

This limitation is going to hold for any sort of inlined enumerator because MoveNext must do two things: update some internal state and perform a test. We ought to look at relaxing this constraint and be willing to let the jit duplicate more statements if the state update and test are sufficiently cheap.
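
To illustrate what the "do-while" transformation amounts to, here is a hand-written sketch of the two loop shapes in source form. `LoopShapes`, `EnumeratorShape`, and `DoWhileShape` are hypothetical names, and the second method is a manual rewrite for illustration, not JIT output.

```csharp
public static class LoopShapes
{
    // Shape after inlining the enumerator: the loop condition both
    // updates state (++i) and performs the test.
    public static int EnumeratorShape(int[] a)
    {
        int r = 0;
        for (int i = -1; ++i < a.Length; )
        {
            r = a[i];
        }
        return r;
    }

    // Hand-inverted equivalent: the entry test is duplicated once up front,
    // and the loop itself becomes a do-while with the test at the bottom.
    public static int DoWhileShape(int[] a)
    {
        int r = 0;
        int i = 0;
        if (i < a.Length)
        {
            do
            {
                r = a[i];
                i++;
            }
            while (i < a.Length);
        }
        return r;
    }
}
```

Both methods return the last element of the array, or 0 for an empty array. The second shape is closer to what a plain array foreach produces, which is the form where the bounds check was removed in the Array disassembly earlier in the thread.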

@AndyAyersMS
Member

As for the "inlined" string case -- the inline currently fails when inlining GetEnumerator once the importer reaches the call to ThrowNullRefIfNotInitialized, because that method's signature indicates that it requires a generic context parameter, and that parameter requires a runtime lookup, and there's currently no way to do runtime lookups safely unless we're in the root method.

It turns out that ThrowNullRefIfNotInitialized does not use its generic context, but the jit doesn't have any way of figuring that out before it has to decide whether or not to allow GetEnumerator to inline. And even if we could get past that, we'd hit the same issue on the Enumerator constructor call.

@AndyAyersMS
Member

So, areas for codegen follow-up:

  • look at boosting inline multiplier for promotable structs
  • look at allowing multiple statements in the do-while transformation

@danmoseley
Member

Is the upshot that until those changes, it is worth force inlining it?

@AndyAyersMS
Member

Yes, I think adding forceinline here is reasonable. Seems like whoever wrote the code was expecting inlining to happen...

/// It is important that this enumerator does NOT implement <see cref="IDisposable"/>.
/// We want the iterator to inline when we do foreach and to not result in
/// a try/finally frame in the client.
/// </remarks>
public struct Enumerator
{

@danmoseley
Member

@hnrqbaggio do you want to offer a PR?

@hnrqbaggio
Contributor Author

Yes, I should have one ready soon.
Thanks for the follow-up!

@AndyAyersMS
Member

As a follow-up: the promotable struct benefit in the inliner needs to be at least 5.5 (currently 3) for this case to be handled by default. Not surprisingly this has fairly widespread impact.

It's hard to be surgical when changing the inlining heuristics. Lots of good diffs, lots of bad diffs.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: 238198 (0.58% of base)
    diff is a regression.
Top file regressions (bytes):
       41733 : System.Private.CoreLib.dasm (0.91% of base)
       39944 : System.Private.Xml.dasm (1.12% of base)
       18091 : System.Collections.Immutable.dasm (1.65% of base)
       17741 : System.Net.Http.dasm (2.88% of base)
       17337 : System.Data.Common.dasm (1.16% of base)
       15682 : Microsoft.CodeAnalysis.dasm (0.89% of base)
       12618 : Newtonsoft.Json.dasm (1.46% of base)
       12616 : NuGet.Protocol.Core.v3.dasm (4.67% of base)
        4562 : System.Drawing.Primitives.dasm (11.94% of base)
        4110 : System.Private.DataContractSerialization.dasm (0.54% of base)
        3639 : System.Threading.Tasks.Dataflow.dasm (0.44% of base)
        2928 : System.ComponentModel.TypeConverter.dasm (1.09% of base)
        2812 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.05% of base)
        2516 : System.Security.Cryptography.X509Certificates.dasm (1.50% of base)
        2377 : System.Reflection.Metadata.dasm (0.56% of base)
        2135 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.07% of base)
        2128 : System.IO.Compression.dasm (2.80% of base)
        2105 : System.IO.Pipes.dasm (5.39% of base)
        2031 : System.Runtime.Extensions.dasm (2.72% of base)
        1934 : System.Private.Xml.Linq.dasm (1.28% of base)
Top file improvements (bytes):
        -694 : Microsoft.CodeAnalysis.CSharp.dasm (-0.02% of base)
         -38 : NuGet.Configuration.dasm (-0.07% of base)
         -16 : CommandLine.dasm (-0.00% of base)
          -7 : System.Text.RegularExpressions.dasm (-0.00% of base)
          -5 : System.Linq.Parallel.dasm (-0.00% of base)
76 total files with Code Size differences (5 improved, 71 regressed), 53 unchanged.
Top method regressions (bytes):
        2739 (17.08% of base) : System.Collections.Immutable.dasm - Enumerator:MoveNext():bool:this (77 methods)
        2502 (23.60% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:DecodeModifiers(SyntaxTokenList,int,int,int,DiagnosticBag):MemberModifiers:this
        2440 (12.75% of base) : NuGet.Protocol.Core.v3.dasm - <TryCreate>d__1:MoveNext():this (20 methods)
        2113 (126.60% of base) : System.Private.CoreLib.dasm - ConfiguredValueTaskAwaiter:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(IAsyncStateMachineBox):this (8 methods)
        1852 (134.11% of base) : System.Private.CoreLib.dasm - ValueTaskAwaiter`1:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(IAsyncStateMachineBox):this (7 methods)
        1723 (99.42% of base) : Microsoft.CodeAnalysis.dasm - SyntaxDiffer:GetSimilarity(SyntaxNodeOrToken,SyntaxNodeOrToken):int:this
        1610 (12.77% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodToClassRewriter`1:RewriteBlock(BoundBlock,ArrayBuilder`1,ArrayBuilder`1):BoundBlock:this (7 methods)
        1610 (33.67% of base) : System.Collections.Immutable.dasm - <get_Keys>d__25:MoveNext():bool:this (7 methods)
        1610 (33.77% of base) : System.Collections.Immutable.dasm - <get_Values>d__27:MoveNext():bool:this (7 methods)
        1594 (25.40% of base) : System.Net.Http.dasm - <WaitWithCancellationAsync>d__3:MoveNext():this (7 methods)
        1584 (27.86% of base) : System.Collections.Immutable.dasm - Enumerator:Reset():this (56 methods)
        1338 (46.41% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Parser:ParseIfExpression():ExpressionSyntax:this
        1219 (91.93% of base) : Microsoft.CodeAnalysis.dasm - SyntaxDiffer:FindBestMatch(Stack`1,SyntaxNodeOrToken,byref,byref,int):this
        1211 (14.63% of base) : NuGet.Protocol.Core.v3.dasm - <ProcessStreamAsync>d__25`1:MoveNext():this (7 methods)
        1200 (17.50% of base) : System.Private.Xml.dasm - <WriteEndAttributeAsync_SepcialAtt>d__134:MoveNext():this
        1156 ( 8.81% of base) : NuGet.Protocol.Core.v3.dasm - <StartWithTimeout>d__0`1:MoveNext():this (7 methods)
        1148 (180.22% of base) : System.IO.Pipes.dasm - PipeCompletionSource`1:ReleaseResources():this (7 methods)
        1045 (45.51% of base) : System.Collections.Immutable.dasm - <get_Values>d__26:MoveNext():bool:this (7 methods)
         945 (37.97% of base) : System.ComponentModel.TypeConverter.dasm - ColorConverter:ConvertTo(ITypeDescriptorContext,CultureInfo,Object,Type):Object:this
         940 (116.92% of base) : System.Threading.Tasks.Dataflow.dasm - <>c:<.cctor>b__16_0(Task`1,Object):bool:this (7 methods)
Top method improvements (bytes):
        -747 (-4.64% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(VisualBasicSyntaxNode,Location,int,byref,ImmutableArray`1,ImmutableArray`1,bool,bool,bool,bool,DiagnosticBag,Symbol,bool,VisualBasicSyntaxNode,Symbol):this
        -570 (-9.35% of base) : System.Security.Cryptography.X509Certificates.dasm - AsnWriter:WriteUtcTimeCore(Asn1Tag,DateTimeOffset):this
        -406 (-24.37% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineRewriter`1:EnsureAllSymbolsAndSignature():bool:this (7 methods)
        -387 (-18.90% of base) : Microsoft.CodeAnalysis.dasm - ImmutableArrayExtensions:HasDuplicates(ImmutableArray`1,IEqualityComparer`1):bool (7 methods)
        -384 (-11.94% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ExecuteDeclaringReferenceActions(SymbolDeclaredCompilationEvent,AnalysisScope,AnalysisState,CancellationToken):this (6 methods)
        -372 (-26.38% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ShouldExecuteCodeBlockActions(AnalysisScope,ISymbol):bool:this (6 methods)
        -350 (-11.61% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineMethodToClassRewriter:PossibleStateMachineScope(ImmutableArray`1,BoundNode):BoundNode:this (7 methods)
        -336 (-23.81% of base) : Microsoft.CodeAnalysis.dasm - UnionCollection`1:CopyTo(ref,int):this (7 methods)
        -324 (-32.53% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ShouldExecuteSyntaxNodeActions(AnalysisScope):bool:this (6 methods)
        -318 (-34.53% of base) : System.Private.CoreLib.dasm - ArraySegment`1:GetEnumerator():Enumerator:this (7 methods)
        -294 (-37.31% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ControlFlowPass:VisitFinallyBlock(BoundStatement,byref):this
        -293 (-21.85% of base) : Microsoft.CodeAnalysis.dasm - Hash:CombineValues(ImmutableArray`1,int):int (7 methods)
        -288 (-19.38% of base) : System.Collections.Immutable.dasm - ImmutableArrayExtensions:ToDictionary(ImmutableArray`1,Func`2,IEqualityComparer`1):Dictionary`2 (7 methods)
        -269 (-3.95% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineRewriter`1:GenerateKickoffMethodBody():BoundBlock:this (7 methods)
        -268 (-7.76% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VBSemanticModel:GetSemanticSymbols(BoundNodeSummary,Binder,int,byref,byref):ImmutableArray`1:this
        -260 (-42.69% of base) : Microsoft.CodeAnalysis.dasm - MetadataVisitor:Visit(ImmutableArray`1):this (5 methods)
        -252 (-2.09% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ExecuteDeclaringReferenceActions(SyntaxReference,SymbolDeclaredCompilationEvent,AnalysisScope,AnalysisState,bool,bool,CancellationToken):this (6 methods)
        -245 (-9.66% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LambdaRewriter:IntroduceFrame(BoundNode,LambdaFrame,Func`3,LambdaSymbol):BoundNode:this
        -243 (-15.99% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerExecutor:ExecuteSyntaxNodeActions(SyntaxNode,IDictionary`2,SemanticModel,Func`2,Action`1,SyntaxNodeAnalyzerStateData):this (6 methods)
        -228 (-8.83% of base) : Microsoft.CodeAnalysis.dasm - ImmutableArrayExtensions:Distinct(ImmutableArray`1,IEqualityComparer`1):ImmutableArray`1 (7 methods)
Top method regressions (percentages):
         330 (6,600.00% of base) : Microsoft.CodeAnalysis.dasm - SyntaxNodeOrToken:WithLeadingTrivia(ref):SyntaxNodeOrToken:this
         330 (6,600.00% of base) : Microsoft.CodeAnalysis.dasm - SyntaxNodeOrToken:WithTrailingTrivia(ref):SyntaxNodeOrToken:this
         298 (5,960.00% of base) : System.Private.CoreLib.dasm - RuntimeMethodInfo:GetGenericArgumentsInternal():ref:this
         204 (1,854.55% of base) : System.Net.Http.dasm - SafeDeleteContext:ToString():String:this
         204 (1,854.55% of base) : System.Net.HttpListener.dasm - SafeDeleteContext:ToString():String:this
         204 (1,854.55% of base) : System.Net.Mail.dasm - SafeDeleteContext:ToString():String:this
         204 (1,854.55% of base) : System.Net.Security.dasm - SafeDeleteContext:ToString():String:this
         803 (1,825.00% of base) : System.Private.CoreLib.dasm - ValueTuple`2:System.Collections.IStructuralEquatable.GetHashCode(IEqualityComparer):int:this (7 methods)
         803 (1,825.00% of base) : System.Private.CoreLib.dasm - ValueTuple`2:System.IValueTupleInternal.GetHashCode(IEqualityComparer):int:this (7 methods)
          81 (1,620.00% of base) : Microsoft.CodeAnalysis.dasm - Blobs:MoveNext():bool:this
          81 (1,620.00% of base) : System.Reflection.Metadata.dasm - Blobs:MoveNext():bool:this
         533 (1,066.00% of base) : Microsoft.CodeAnalysis.dasm - TwoEnumeratorListStack:TryGetNextInSpan(byref,byref):bool:this
         117 (1,063.64% of base) : System.Private.CoreLib.dasm - ModuleHandle:ResolveMethodHandleInternal(RuntimeModule,int):IRuntimeMethodInfo
         533 (1,045.10% of base) : Microsoft.CodeAnalysis.dasm - ThreeEnumeratorListStack:TryGetNextInSpan(byref,byref):bool:this
         140 (933.33% of base) : Microsoft.CodeAnalysis.dasm - MetadataReferenceProperties:op_Equality(MetadataReferenceProperties,MetadataReferenceProperties):bool
         167 (927.78% of base) : System.Collections.Immutable.dasm - ImmutableArray`1:Replace(long,long):ImmutableArray`1:this
         292 (912.50% of base) : System.Private.CoreLib.dasm - RuntimeMethodInfo:GetGenericArguments():ref:this
         100 (909.09% of base) : System.Private.CoreLib.dasm - EventRegistrationTokenListWithCount:Pop(byref):bool:this
         104 (866.67% of base) : System.Data.Common.dasm - SqlSingle:.ctor(double):this
          40 (800.00% of base) : System.Private.CoreLib.dasm - RuntimeType:IsPrimitiveImpl():bool:this
Top method improvements (percentages):
         -86 (-66.67% of base) : System.IO.FileSystem.dasm - DisableMediaInsertionPrompt:Create():DisableMediaInsertionPrompt (3 base, 1 diff methods)
         -65 (-50.78% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - BoundInterpolatedStringExpression:get_HasInterpolations():bool:this
         -65 (-50.78% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
         -16 (-50.00% of base) : System.Private.CoreLib.dasm - RuntimeTypeHandle:get_Value():long:this (2 base, 1 diff methods)
         -40 (-50.00% of base) : System.Private.CoreLib.dasm - GCHandle:Free():this (2 base, 1 diff methods)
         -65 (-49.62% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_ContainsExtensionMethods():bool:this
         -65 (-49.62% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
        -210 (-47.95% of base) : System.Private.CoreLib.dasm - ModuleHandle:GetModuleType(RuntimeModule):RuntimeType (2 base, 1 diff methods)
         -74 (-47.13% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ConstructedType:GetHashCode():int:this
        -110 (-47.01% of base) : Microsoft.CodeAnalysis.CSharp.dasm - CodeGenerator:EmitSwitchSection(BoundSwitchSection):this
         -60 (-44.78% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - CompilationMergedNamespaceSymbol:BuildExtensionMethodsMap(Dictionary`2):this
         -52 (-44.44% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:DeclareVariables(ImmutableArray`1):this
         -52 (-44.44% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:CheckInterfaceMembers(ImmutableArray`1,DiagnosticBag)
         -59 (-44.36% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - AsyncMethodToClassRewriter:NeedsSpill(ImmutableArray`1):bool
         -57 (-44.19% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - EmbeddedSymbolManager:ValidateMethod(MethodSymbol)
         -49 (-43.75% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceNamedTypeSymbol:GetConstraintKind(ImmutableArray`1):int
         -52 (-43.33% of base) : Microsoft.CodeAnalysis.CSharp.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
         -52 (-43.33% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
        -260 (-42.69% of base) : Microsoft.CodeAnalysis.dasm - MetadataVisitor:Visit(ImmutableArray`1):this (5 methods)
         -52 (-42.28% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:ReportUnusedVariables(ImmutableArray`1):this
3782 total methods with Code Size differences (1148 improved, 2634 regressed), 205151 unchanged.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Dec 28, 2019
@AndyAyersMS
Member

Second follow-up: I have a prototype, master...AndyAyersMS:FgOptWhileLoop, that extends fgOptWhileLoop to handle multiple statements.

This plus the change from #1183 gives identical codegen for the int array and immutable array cases.
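For readers unfamiliar with the transformation: fgOptWhileLoop is, as far as I understand it, the JIT's loop-inversion step, which duplicates the loop test ahead of the loop so that the loop itself runs as a do-while. The sketch below shows that shape in plain C++ rather than JIT IR. The function names are hypothetical and nothing here is taken from the prototype branch; it only illustrates why the duplicated test costs some code size (the condition appears twice) while keeping behavior identical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original shape: the condition is tested at the top of every iteration,
// so each trip pays for a conditional exit plus a jump back to the test.
int sum_while(const std::vector<int>& values) {
    int total = 0;
    std::size_t i = 0;
    while (i < values.size()) {
        total += values[i];
        ++i;
    }
    return total;
}

// Inverted shape: the condition is duplicated once as a guard before the
// loop, and the loop tests at the bottom with a single conditional branch.
int sum_inverted(const std::vector<int>& values) {
    int total = 0;
    std::size_t i = 0;
    if (i < values.size()) {          // duplicated copy of the loop test
        do {
            total += values[i];
            ++i;
        } while (i < values.size());  // original test, now at the bottom
    }
    return total;
}

// Both shapes must agree, including on empty input, which the duplicated
// guard exists to handle.
void check_equivalence() {
    const std::vector<int> empty;
    const std::vector<int> some{1, 2, 3, 4};
    assert(sum_while(empty) == sum_inverted(empty));
    assert(sum_while(some) == sum_inverted(some));
}
```

If the loop test spans several statements rather than one compare, all of them have to be duplicated into the guard, which is presumably why the size costing discussed in this comment matters.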

Overall diffs look plausible too; still chasing through the regressions to see what's up, but so far:

  • tree size costing looks iffy in some cases (e.g. Node:BalanceMany():Node:this)
  • we lose some CSEs if the code we're duplicating has a CSE candidate (e.g. Node:get_Max():long:this)
  • the old code was not costing the JTRUE node, just the compare underneath, so we may lose a few cases from slightly higher costs
Total bytes of diff: 1150 (0.00% of base)
    diff is a regression.
Top file regressions (bytes):
        7716 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.25% of base)
        2849 : System.Private.Xml.dasm (0.08% of base)
        2060 : System.Data.Common.dasm (0.14% of base)
        1262 : System.Private.CoreLib.dasm (0.03% of base)
         923 : System.Linq.Parallel.dasm (0.06% of base)
         700 : Microsoft.CSharp.dasm (0.23% of base)
         551 : System.Collections.Immutable.dasm (0.05% of base)
         494 : System.Linq.dasm (0.05% of base)
         388 : System.Memory.dasm (0.16% of base)
         357 : System.Private.DataContractSerialization.dasm (0.05% of base)
         352 : System.Text.RegularExpressions.dasm (0.14% of base)
         337 : System.Diagnostics.TraceSource.dasm (0.80% of base)
         290 : System.Security.Cryptography.Algorithms.dasm (0.10% of base)
         178 : Newtonsoft.Json.dasm (0.02% of base)
         155 : System.IO.Compression.dasm (0.20% of base)
         153 : System.Security.Cryptography.Cng.dasm (0.10% of base)
         150 : System.Linq.Expressions.dasm (0.02% of base)
         120 : System.Runtime.Numerics.dasm (0.17% of base)
         109 : System.Security.Claims.dasm (0.47% of base)
         104 : System.Net.Http.dasm (0.02% of base)
Top file improvements (bytes):
      -10344 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.19% of base)
       -7431 : Microsoft.CodeAnalysis.CSharp.dasm (-0.17% of base)
       -1501 : Microsoft.CodeAnalysis.dasm (-0.09% of base)
          -7 : System.Threading.Tasks.Dataflow.dasm (-0.00% of base)
53 total files with Code Size differences (4 improved, 49 regressed), 76 unchanged.
Top method regressions (bytes):
         526 ( 1.26% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:ToDictionary(Func`2,IEqualityComparer`1):Dictionary`2:this (49 methods)
         516 (12.29% of base) : System.Data.Common.dasm - DataTable:SerializeTableSchema(SerializationInfo,StreamingContext,bool):this
         315 (18.67% of base) : System.Collections.Immutable.dasm - Node:BalanceMany():Node:this (7 methods)
         302 ( 3.04% of base) : System.Linq.Parallel.dasm - TakeOrSkipQueryOperatorEnumerator`1:MoveNext(byref,byref):bool:this (7 methods)
         223 (12.57% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:RemoveDuplicates():this (7 methods)
         155 ( 6.18% of base) : System.Linq.Parallel.dasm - OrderedPipeliningMergeEnumerator:MoveNext():bool:this (7 methods)
         139 ( 5.65% of base) : System.Security.Cryptography.Algorithms.dasm - AsnReader:CopyConstructedOctetString(ReadOnlyMemory`1,Span`1,bool,bool,byref):int:this
         139 ( 5.65% of base) : System.Security.Cryptography.Cng.dasm - AsnReader:CopyConstructedOctetString(ReadOnlyMemory`1,Span`1,bool,bool,byref):int:this
         133 ( 1.88% of base) : System.Linq.Parallel.dasm - AsynchronousChannelMergeEnumerator`1:MoveNextSlowPath():bool:this (7 methods)
         126 (13.18% of base) : System.Linq.Parallel.dasm - <System-Collections-Generic-IEnumerable<T>-GetEnumerator>d__21:MoveNext():bool:this (7 methods)
         119 (32.69% of base) : System.Private.CoreLib.dasm - TaskCompletionSource`1:SpinUntilCompleted():this (7 methods)
         116 ( 0.96% of base) : System.Linq.Parallel.dasm - TakeOrSkipWhileQueryOperatorEnumerator`1:MoveNext(byref,byref):bool:this (7 methods)
         107 ( 4.10% of base) : System.Security.Cryptography.Algorithms.dasm - AsnReader:ProcessConstructedBitString(ReadOnlyMemory`1,Span`1,BitStringCopyAction,bool,byref,byref):int:this
         106 ( 0.65% of base) : System.Private.Xml.dasm - XmlReflectionImporter:ImportAccessorMapping(MemberMapping,FieldModel,XmlAttributes,String,Type,bool,bool,RecursionLimiter):this
         105 ( 1.32% of base) : System.Private.Xml.dasm - SchemaCollectionPreprocessor:Preprocess(XmlSchema,String,int):this
         100 (11.43% of base) : System.Private.Xml.dasm - NamespaceEnumerator:MoveNext():bool:this (7 methods)
          98 ( 5.65% of base) : System.Private.Xml.dasm - <GetActiveRecords>d__34:MoveNext():bool:this (7 methods)
          92 ( 8.25% of base) : System.IO.Compression.dasm - <WriteAsync>d__8:MoveNext():this
          91 ( 3.66% of base) : System.Linq.Parallel.dasm - <>c__DisplayClass1_0`2:<SpoolPipeline>b__0():this (7 methods)
          90 ( 3.89% of base) : System.Linq.dasm - SparseArrayBuilder`1:CopyTo(ref,int,int):this (7 methods)
Top method improvements (bytes):
        -868 (-6.11% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodToClassRewriter`1:RewriteBlock(BoundBlock,ArrayBuilder`1,ArrayBuilder`1):BoundBlock:this (7 methods)
        -370 (-10.45% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:ComputeInterfaceImplementations(DiagnosticBag,CancellationToken):ImmutableArray`1:this
        -298 (-12.27% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceNamedTypeSymbol:MakeDeclaredBases(ConsList`1,DiagnosticBag):Tuple`2:this
        -261 (-10.26% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ConversionsBase:ComputeApplicableUserDefinedImplicitConversionSet(BoundExpression,TypeSymbol,TypeSymbol,ArrayBuilder`1,ArrayBuilder`1,byref,bool):this
        -246 (-12.31% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceMemberContainerTypeSymbol:ProcessPartialMethodsIfAny(Dictionary`2,DiagnosticBag):this
        -238 (-9.87% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ConversionsBase:AddUserDefinedConversionsToExplicitCandidateSet(BoundExpression,TypeSymbol,TypeSymbol,ArrayBuilder`1,NamedTypeSymbol,String,byref):this
        -229 (-12.51% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MetadataDecoder:SubstituteNoPiaLocalType(byref,bool,TypeSymbol,String,String,String,AssemblySymbol):NamedTypeSymbol
        -228 (-13.14% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MetadataDecoder:SubstituteNoPiaLocalType(byref,bool,TypeSymbol,String,String,String,AssemblySymbol):NamedTypeSymbol
        -203 (-15.04% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicCompilation:GetRuntimeMember(NamedTypeSymbol,byref,SignatureComparer`5,AssemblySymbol):Symbol
        -200 (-3.06% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineRewriter`1:GenerateKickoffMethodBody():BoundBlock:this (7 methods)
        -197 (-16.95% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MemberLookup:LookupInModules(LookupResult,NamespaceSymbol,String,int,int,Binder,byref)
        -182 (-13.79% of base) : Microsoft.CodeAnalysis.CSharp.dasm - OverriddenOrHiddenMembersHelpers:FindOverriddenOrHiddenMembersInType(Symbol,bool,NamedTypeSymbol,NamedTypeSymbol,byref,byref,byref)
        -176 (-13.75% of base) : Microsoft.CodeAnalysis.CSharp.dasm - CSharpCompilation:GetRuntimeMember(NamedTypeSymbol,byref,SignatureComparer`5,AssemblySymbol):Symbol
        -173 (-3.43% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceNamedTypeSymbol:CheckDeclarationNameAndTypeParameters(VisualBasicSyntaxNode,Binder,DiagnosticBag,byref):this
        -157 (-7.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LocalRewriter:RewriteWithBlockStatements(BoundBlock,VisualBasicSyntaxNode,bool,ImmutableArray`1,ImmutableArray`1,BoundValuePlaceholderBase,BoundExpression):BoundBlock:this
        -156 (-12.61% of base) : Microsoft.CodeAnalysis.dasm - Parser:GetMatchingMethods(String,byref,List`1,String,int,Compilation,List`1)
        -155 (-4.34% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - InitializerRewriter:BuildConstructorBody(TypeCompilationState,MethodSymbol,BoundStatement,ProcessedFieldOrPropertyInitializers,BoundBlock):BoundBlock
        -154 (-2.17% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodToClassRewriter`1:RewriteSequence(BoundSequence,ArrayBuilder`1,ArrayBuilder`1):BoundSequence:this (7 methods)
        -151 (-18.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Imports:LookupExtensionMethodsInUsings(ArrayBuilder`1,String,int,int,Binder):this
        -147 (-7.55% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:CheckMemberNameConflicts(DiagnosticBag):this
Top method regressions (percentages):
           8 (40.00% of base) : System.Runtime.Numerics.dasm - Number:wcslen(long):int
          30 (35.71% of base) : System.Private.Xml.dasm - ParticleContentValidator:CheckUniqueParticleAttribution(BitSet,ref):this
         119 (32.69% of base) : System.Private.CoreLib.dasm - TaskCompletionSource`1:SpinUntilCompleted():this (7 methods)
           8 (32.00% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEvent:SkipUnicodeString(int):int:this
           7 (31.82% of base) : Microsoft.CodeAnalysis.dasm - ComMemoryStream:ZeroMemory(long,int)
           7 (31.82% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEvent:SkipUTF8String(int):int:this
          10 (30.30% of base) : System.Collections.Immutable.dasm - Node:get_Max():int:this
          10 (30.30% of base) : System.Collections.Immutable.dasm - Node:get_Min():int:this
          22 (30.14% of base) : System.Data.Common.dasm - Merger:MergeConstraints(DataSet):this
          22 (29.73% of base) : System.Data.Common.dasm - DSRowDiffIdUsageSection:Prepare(DataSet):this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Max():__Canon:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Min():__Canon:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Max():ubyte:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Min():ubyte:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Max():long:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Min():long:this
          16 (29.09% of base) : System.Private.Xml.dasm - AttributeSetAction:Merge(AttributeSetAction):this
          10 (28.57% of base) : System.Collections.Immutable.dasm - Node:get_Max():short:this
          10 (28.57% of base) : System.Collections.Immutable.dasm - Node:get_Min():short:this
          19 (27.54% of base) : Microsoft.CSharp.dasm - MethodTypeInferrer:AllFixed():bool:this
Top method improvements (percentages):
         -21 (-31.82% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_ContainsExtensionMethods():bool:this
         -21 (-31.82% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
         -20 (-31.75% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceNamedTypeSymbol:GetConstraintKind(ImmutableArray`1):int
         -17 (-26.98% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - BoundInterpolatedStringExpression:get_HasInterpolations():bool:this
         -17 (-26.98% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
         -29 (-20.71% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodCompiler:AssertAllInitializersAreConstants(ImmutableArray`1):this
        -151 (-18.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Imports:LookupExtensionMethodsInUsings(ArrayBuilder`1,String,int,int,Binder):this
         -23 (-17.69% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:MayUseSmallExceptionHeaders(int,ImmutableArray`1):bool
        -197 (-16.95% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MemberLookup:LookupInModules(LookupResult,NamespaceSymbol,String,int,int,Binder,byref)
         -11 (-16.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:DeclareVariables(ImmutableArray`1):this
         -11 (-16.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:CheckInterfaceMembers(ImmutableArray`1,DiagnosticBag)
         -11 (-16.18% of base) : Microsoft.CodeAnalysis.CSharp.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
         -11 (-16.18% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
         -19 (-15.97% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LambdaSymbol:.ctor(VisualBasicSyntaxNode,ImmutableArray`1,TypeSymbol,Binder):this
         -55 (-15.76% of base) : Microsoft.CodeAnalysis.dasm - MetadataVisitor:Visit(ImmutableArray`1):this (5 methods)
         -11 (-15.49% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:ReportUnusedVariables(ImmutableArray`1):this
         -11 (-15.49% of base) : Microsoft.CodeAnalysis.dasm - AnalysisState:AddToEventsMap_NoLock(ImmutableArray`1):this
         -11 (-15.28% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - EmbeddedSymbolManager:ValidateMethod(MethodSymbol)
         -12 (-15.19% of base) : Microsoft.CodeAnalysis.CSharp.dasm - StackOptimizerPass1:DeclareLocals(ImmutableArray`1,int):this
         -12 (-15.19% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:SerializeCustomModifiers(ImmutableArray`1,BlobBuilder):this
3258 total methods with Code Size differences (1072 improved, 2186 regressed), 205675 unchanged.
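A note on reading the percentage tables: each entry gives a byte delta and that delta as a percentage of the base method size, so large percentages often belong to very small methods. The helper below is a hypothetical sketch (not part of jit-diff) that recovers the approximate base size from the two printed numbers, assuming "% of base" means delta divided by base size.

```cpp
#include <cassert>
#include <cmath>

// Recover the approximate base code size in bytes from a diff entry,
// assuming percent_of_base == 100 * delta_bytes / base_bytes.
// The printed percentage is rounded, so the result is rounded to the nearest byte.
long base_size_bytes(long delta_bytes, double percent_of_base) {
    return std::lround(static_cast<double>(delta_bytes) / (percent_of_base / 100.0));
}
```

Under that assumption, the `8 (40.00% of base)` entry for Number:wcslen corresponds to a base of about 20 bytes, while the `119 (32.69% of base)` entry for TaskCompletionSource`1:SpinUntilCompleted corresponds to about 364 bytes across its 7 methods.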

@AndyAyersMS
Member

Also IsSharedStaticHelper has CORINFO_HELP_BOX in there, which makes no sense to me -- we are looking for helper calls that are potentially hoistable.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020