Iterating with ForEach over ImmutableArray is slower than over Array #780

hnrqbaggio · 2019-12-11T23:38:04Z

This is a spin off issue from https://github.com/dotnet/corefx/issues/36416, to scope down to just the ImmutableArray case.

As mentioned in other issues, the immutability sometimes comes with trade-offs in some operations, but in this case it seems that the extra overhead can be optimized, at least for when it's a collection of value type.

Comparing with Array

Looking at the ASM code generated, the Array version is high inlined, while for the ImmutableArray the JIT is able to inline the loop itself, but not the call to ImmutableArray<T>.GetEnumerator(). The method itself is quite simple, but calls to an internal method called ThrowNullRefIfNotInitialized() to validate that the underlying array is not null.

It seems that the extra method causes the collection to have more branch mis-predictions and cache misses than the array case (the cache misses show up in the results when the collection is larger, say 2048 instead of the default 512 elements).

Possible fix

If we change the implementation of GetEnumerator to use MethodImplOptions.AggressiveInlining, the JIT is able to inline the call and the stats of both benchmarks match and the Median for the Int32 case improves by 4x.

No Slower results for the provided threshold = 3% and noise filter = 0.3ns.

Faster	base/diff	Base Median (ns)	Diff Median (ns)	Modality
IterateForEach.ImmutableArray(Size: 512)	4.07	1110.44	273.14	several?

Unfortunately, this seems to not be enough for the Reference Type case. When the benchmark runs using String instead of Int32 the results are still the same as before, so it seems that marking the methods as inline is still not sufficient for the JIT to optimize them.

Correctness

The code in ThrowNullRefIfNotInitialized is just accessing the underlying Array.Length property and relying on that to throw if the object is null.

In the optimized version, I can't see the exact same instructions, so would still need to confirm that the optimizer is not discarding that check. However, the tests that validate that GetEnumerator will throw NRE in that condition did pass.

I have the changes above in my fork of the repo, but this is my first contribution so would like to check the feedback on the findings and not send a PR right away. 😁

Baseline

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61001, CoreFX 5.0.19.61001), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Type	Method	Size	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated	BranchInstructions/Op	BranchMispredictions/Op
IterateForEach<Int32>	Array	512	173.4 ns	18.47 ns	21.28 ns	177.2 ns	145.1 ns	199.3 ns	-	-	-	-	212	1
IterateForEach<String>	Array	512	157.5 ns	15.91 ns	18.32 ns	169.4 ns	131.8 ns	179.9 ns	-	-	-	-	205	1
IterateForEach<Int32>	ImmutableArray	512	1,083.9 ns	38.67 ns	44.53 ns	1,110.4 ns	997.1 ns	1,115.4 ns	-	-	-	-	415	2
IterateForEach<String>	ImmutableArray	512	1,101.5 ns	38.80 ns	44.69 ns	1,115.3 ns	1,010.2 ns	1,175.2 ns	-	-	-	-	462	2

Array

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].Array()
       xor     eax,eax
       mov     rdx,qword ptr [rcx+8]
       xor     ecx,ecx
       mov     r8d,dword ptr [rdx+8]
       test    r8d,r8d
       jle     M00_L01
M00_L00:
       movsxd  rax,ecx
       mov     eax,dword ptr [rdx+rax*4+10h]
       inc     ecx
       cmp     r8d,ecx
       jg      M00_L00
M00_L01:
       ret
       sbb     dword ptr [rax],eax
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-76h],dh
       rcr     dword ptr [rdx-6],cl
       jg      M00_L02
M00_L02:
       add     byte ptr [rbp+48h],dl
       mov     ebp,esp
       mov     qword ptr [rbp+10h],rcx
       mov     rax,qword ptr [rbp+10h]
       mov     rax,qword ptr [rax+58h]
; Total bytes of code 64

ImmutableArray

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ImmutableArray()
       push    rsi
       sub     rsp,40h
       xor     eax,eax
       mov     qword ptr [rsp+38h],rax
       mov     qword ptr [rsp+28h],rax
       mov     qword ptr [rsp+30h],rax
       xor     esi,esi
       mov     rcx,qword ptr [rcx+0C0h]
       mov     qword ptr [rsp+38h],rcx
       lea     rcx,[rsp+38h]
       lea     rdx,[rsp+28h]
       call    System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].GetEnumerator()
       jmp     M00_L01
M00_L00:
       cmp     dword ptr [rsp+30h],edx
       jae     M00_L02
       mov     rax,qword ptr [rsp+28h]
       mov     edx,dword ptr [rsp+30h]
       movsxd  rdx,edx
       mov     esi,dword ptr [rax+rdx*4+10h]
M00_L01:
       mov     eax,dword ptr [rsp+30h]
       inc     eax
       mov     dword ptr [rsp+30h],eax
       mov     rdx,qword ptr [rsp+28h]
       mov     edx,dword ptr [rdx+8]
       cmp     edx,eax
       jg      M00_L00
       mov     eax,esi
       add     rsp,40h
       pop     rsi
       ret
M00_L02:
       call    CoreCLR!JIT_RngChkFail
       int     3
       add     byte ptr [rcx],bl
       add     eax,72050002h
       add     dword ptr [rax+40h],esp
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-26h],ah
; Total bytes of code 138

; System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].GetEnumerator()
       push    rbp
       push    rdi
       push    rsi
       sub     rsp,40h
       vzeroupper
       lea     rbp,[rsp+50h]
       xor     eax,eax
       mov     qword ptr [rbp-18h],rax
       mov     qword ptr [rbp-28h],rax
       mov     qword ptr [rbp+10h],rcx
       mov     qword ptr [rbp+18h],rdx
       mov     rcx,qword ptr [rbp+10h]
       mov     rcx,qword ptr [rcx]
       mov     qword ptr [rbp-18h],rcx
       lea     rcx,[rbp-18h]
       call    System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].ThrowNullRefIfNotInitialized()
       vxorps  xmm0,xmm0,xmm0
       vmovdqu xmmword ptr [rbp-28h],xmm0
       lea     rcx,[rbp-28h]
       mov     rdx,qword ptr [rbp-18h]
       call    System.Collections.Immutable.ImmutableArray`1+Enumerator[[System.Int32, System.Private.CoreLib]]..ctor(Int32[])
       mov     rdi,qword ptr [rbp+18h]
       lea     rsi,[rbp-28h]
       call    CoreCLR!JIT_ByRefWriteBarrier
       movs    qword ptr [rdi],qword ptr [rsi]
       mov     rax,qword ptr [rbp+18h]
       lea     rsp,[rbp-10h]
       pop     rsi
       pop     rdi
       pop     rbp
       ret
       int     3
       int     3
       sbb     dword ptr [rdi],eax
       add     al,0
       ???
       jb      00007ffa`5ad52c12
       ???
       add     dh,byte ptr [rax+1]
       push    rax
       add     byte ptr [rax],al
       add     al,cl
       fcmovbe st,st(4)
       pop     rdx
       cli
       jg      00007ffa`5ad52c1f
; Total bytes of code 127

; System.Collections.Immutable.ImmutableArray`1[[System.Int32, System.Private.CoreLib]].ThrowNullRefIfNotInitialized()
       push    rbp
       mov     rbp,rsp
       mov     qword ptr [rbp+10h],rcx
       mov     rax,qword ptr [rbp+10h]
       mov     rax,qword ptr [rax]
       mov     eax,dword ptr [rax+8]
       pop     rbp
       ret
       sbb     dword ptr [rcx],eax
       add     dword ptr [rax],eax
       add     dword ptr [rax],edx
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       and     bl,bl
       ???
       pop     rdx
       cli
       jg      M02_L00
M02_L00:
       add     byte ptr [rbp+48h],dl
; Total bytes of code 50

With Aggressive Inlining

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61001, CoreFX 5.0.19.61001), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Type	Method	Size	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated	BranchInstructions/Op	BranchMispredictions/Op
IterateForEach<Int32>	Array	512	156.5 ns	6.94 ns	7.71 ns	154.1 ns	148.4 ns	173.3 ns	-	-	-	-	223	1
IterateForEach<String>	Array	512	140.1 ns	4.23 ns	4.87 ns	138.7 ns	131.3 ns	148.8 ns	-	-	-	-	226	1
IterateForEach<Int32>	ImmutableArray	512	271.6 ns	24.17 ns	27.84 ns	273.1 ns	233.7 ns	301.4 ns	-	-	-	-	424	2
IterateForEach<String>	ImmutableArray	512	1,125.9 ns	107.31 ns	123.58 ns	1,071.1 ns	997.4 ns	1,293.6 ns	-	-	-	-	434	2

Array

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].Array()
       xor     eax,eax
       mov     rdx,qword ptr [rcx+8]
       xor     ecx,ecx
       mov     r8d,dword ptr [rdx+8]
       test    r8d,r8d
       jle     M00_L01
M00_L00:
       movsxd  rax,ecx
       mov     eax,dword ptr [rdx+rax*4+10h]
       inc     ecx
       cmp     r8d,ecx
       jg      M00_L00
M00_L01:
       ret
       sbb     dword ptr [rax],eax
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-76h],dh
       fistp   word ptr [rcx-6]
       jg      M00_L02
M00_L02:
       add     byte ptr [rbp+48h],dl
       mov     ebp,esp
       mov     qword ptr [rbp+10h],rcx
       mov     rax,qword ptr [rbp+10h]
       mov     rax,qword ptr [rax+58h]
; Total bytes of code 64

ImmutableArray

; System.Collections.IterateForEach`1[[System.Int32, System.Private.CoreLib]].ImmutableArray()
       sub     rsp,28h
       xor     eax,eax
       mov     rdx,qword ptr [rcx+0C0h]
       mov     ecx,dword ptr [rdx+8]
       mov     r8d,0FFFFFFFFh
       jmp     M00_L01
M00_L00:
       cmp     r8d,ecx
       jae     M00_L02
       movsxd  rax,r8d
       mov     eax,dword ptr [rdx+rax*4+10h]
M00_L01:
       inc     r8d
       cmp     ecx,r8d
       jg      M00_L00
       add     rsp,28h
       ret
M00_L02:
       call    CoreCLR!JIT_RngChkFail
       int     3
       add     byte ptr [rcx],bl
       add     al,1
       add     byte ptr [rdx+rax*2],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-26h],dl
       loopne  00007ffa`59e12bb5
       cli
       jg      M00_L03
M00_L03:
       add     byte ptr [rbp+48h],dl
; Total bytes of code 82

The text was updated successfully, but these errors were encountered:

hnrqbaggio · 2019-12-11T23:38:55Z

/cc @adamsitnik who opened the original issue about all Immutable collections.

adamsitnik · 2019-12-13T12:37:40Z

Inlining GetEnumerator looks good to me. The code size is going to grow, but I believe that x4 speed improvement is worth it.

I am suprised that ThrowNullRefIfNotInitialized is not getting inlined by default. Are you sure about this? To verify that you can use --profiler ETW and open the trace file with PerfView and go to events tab. There should be events for inlining succeeded and failed. You might find this blog post useful.

Where most of the time is spent for the string case? You can use one of the recommended profilers to find out: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-corefx.md

Also, I wonder what perf we would get if GetEnumerator was implemented as return this.array.GetEnumerator()?

hnrqbaggio · 2019-12-13T20:41:17Z

Thanks for the pointer @adamsitnik. I didn't know I could use PerfView for the Inline Events and was having trouble with the InliningDiagnoser because it emits the output to the screen it was too much info.

Looking now, it seems that the JIT is successfully inlining the ctor and ThrowNullRefIfNotInitialized inside GetEnumerator but then on the caller it says that GetEnumerator is not profitable:

ThreadID="29,572" ProcessorNumber="2" MethodBeingCompiledNamespace="System.Collections.IterateForEach_1[System.Int32]" MethodBeingCompiledName="ImmutableArray" MethodBeingCompiledNameSignature="instance !0 ()" InlinerNamespace="System.Collections.IterateForEach_1[System.Int32]" InlinerName="ImmutableArray" InlinerNameSignature="instance !0 ()" InlineeNamespace="System.Collections.Immutable.ImmutableArray_1[System.Int32]" InlineeName="GetEnumerator" InlineeNameSignature="instance value class Enumerator<!0> ()" FailAlways="False" FailReason="unprofitable inline" ClrInstanceID="7"

I've tried to do some research on what that reason is, but could not find much data. Maybe the JIT is considering that MoveNext and GetCurrent are more important to inline than the single call to GetEnumerator?

About the String case

I've tried looking at the ETL traces when I started this investigation, but for some reason I'm not able to see anything inside the benchmark method itself, because it's grayed out in PerfView and can't expand further.

This is what I get in the call stacks. Even if I load all symbols for the modules that are with ? in there the leaf node doesn't change. I've tried to follow the posts on your blog and the tutorial for PerfView but wasn't able to fix the problem (even in WPA I see the same pattern, I'll see if I can use Visual Studio and get something).

The JIT Events for that case show the same pattern than for Int32: the sub-methods seem to be successfully selected for inlining, but at GetEnumerator the JIT decides that's not worth it.

Just use this.array.GetEnumerator

Finally, at some point I did test just forwarding the call to this.array.GetEnumerator, but that didn't seem to be enough to optimize the flow. It also changes the exceptions that the methods can return (because the are some checks in MoveNext and GetCurrent that would not apply anymore) so it would require adjusting some unit tests and maybe a review if that's a breaking change or not.

But I still might give that another try at least to use as a base to compare the JIT behavior.

hnrqbaggio · 2019-12-20T11:03:16Z

Got some interesting information after the suggestions.

Why inlining GetEnumerator helps

I was curious why a method that is called only once and it's not that heavy would make such a difference in the results. In the VS Profiler, it shows that most of the time is spent on the assignment inside the loop, and the assembly shows the difference in terms of the instructions executed.

The assembly snips I've posted earlier show that the loop for Array or for the Inlined case is very small, and works on registers only for both the array address in RDX and the loop index variable in RAX.

The loop in the baseline shows that the address of the array and the loop variable are loaded from the stack and stored back on every iteration. It seems that the overhead of these extra memory access compounds during the loop, which is reasonable even if if using the L1 cache (it also might explain why that case has more cache misses).

I'm not sure if this is expected or not, but it seems that the non-inlined function blocks the JIT from seeing that the array can be accessed directly from the registers from earlier in the code and optimize the calls. Once we remove that "barrier" the code gets optimized.

Code size

It also seems that inlining also helps with the code size, and the reported number of bites for the benchmark is smaller, because the extra memory operations mentioned above get optimized away.

Benchmark for String

After testing a few variants of the benchmark to understand the differences in the code, I've noticed that the JIT is able to inline GetEnumerable if the benchmark is not inside a generic class itself. By default, the code in the Performance repo uses a generic class with GenericTypeArgumentsAttribute to test both Int32 and String.

In PerfView, the JIT Inlining events report a different reason for not optimizing the String case when my fix is on: before it was due to an "Unprofitable Inlining" and now it reports a "Runtime Dictionary Lookup".

Below is the result a similar benchmark that doesn't use a generic class and just the target type directly. The times drop to the same range as the Int32 case. So it seems that the proposed fix would benefit both cases.

@adamsitnik is this something that you've encountered before when working with the benchmarks?

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-RUPWLA : .NET Core 5.0.0 (CoreCLR 5.0.19.61901, CoreFX 5.0.19.61901), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Type	Method	Size	Mean	Error	StdDev	Median	Min	Max	Gen 0	Gen 1	Gen 2	Allocated	BranchInstructions/Op	BranchMispredictions/Op	CacheMisses/Op
IterateForEach<Int32>	Array	512	369.9 ns	15.74 ns	16.84 ns	364.3 ns	351.5 ns	404.8 ns	-	-	-	-	519	1	0
IterateForEach<String>	Array	512	305.1 ns	10.97 ns	12.63 ns	304.9 ns	288.7 ns	334.0 ns	-	-	-	-	518	1	0
IterateForEach_Int32	Array	512	288.5 ns	25.03 ns	28.83 ns	272.9 ns	254.4 ns	331.7 ns	-	-	-	-	517	1	0
IterateForEach_String	Array	512	317.7 ns	25.76 ns	29.67 ns	321.8 ns	251.4 ns	371.4 ns	-	-	-	-	517	1	0
IterateForEach<Int32>	ImmutableArray	512	396.7 ns	9.58 ns	10.25 ns	395.1 ns	378.2 ns	419.2 ns	-	-	-	-	1,031	1	0
IterateForEach<String>	ImmutableArray	512	1,531.5 ns	91.32 ns	93.78 ns	1,514.5 ns	1,439.6 ns	1,823.3 ns	-	-	-	-	1,059	2	1
IterateForEach_Int32	ImmutableArray	512	314.8 ns	28.15 ns	32.42 ns	303.5 ns	270.7 ns	359.0 ns	-	-	-	-	1,030	1	0
IterateForEach_String	ImmutableArray	512	421.9 ns	8.41 ns	9.35 ns	423.7 ns	408.8 ns	437.3 ns	-	-	-	-	1,031	1	0

; System.Collections.IterateForEach_String.ImmutableArray()
       sub     rsp,28h
       xor     eax,eax
       mov     rdx,qword ptr [rcx+0C0h]
       mov     ecx,dword ptr [rdx+8]
       mov     r8d,0FFFFFFFFh
       jmp     M00_L01
M00_L00:
       cmp     r8d,ecx
       jae     M00_L02
       movsxd  rax,r8d
       mov     rax,qword ptr [rdx+rax*8+10h]
M00_L01:
       inc     r8d
       cmp     ecx,r8d
       jg      M00_L00
       add     rsp,28h
       ret
M00_L02:
       call    CoreCLR!JIT_RngChkFail
       int     3
       sbb     dword ptr [rcx+rax],eax
       add     byte ptr [rdx+rax*2],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax],al
       add     byte ptr [rax-0Ah],bl
       and     dh,byte ptr [rdi]
       ???
       jg      M00_L03
M00_L03:
       add     byte ptr [rbp+48h],dl
; Total bytes of code 82

danmoseley · 2019-12-21T00:00:41Z

cc @AndyAyersMS for observations about inlining above

AndyAyersMS · 2019-12-26T20:27:40Z

@hnrqbaggio: thanks for the observations!

For the int case, the GetEnumerator method just misses being inlined by default:

multiplier in methods of promotable struct increased to 3.
Inline candidate callsite is boring.  Multiplier increased to 4.3.
calleeNativeSizeEstimate=461
callsiteNativeSizeEstimate=85
benefit multiplier=4.3
threshold=365
Native estimate for function size exceeds threshold for inlining 46.1 > 36.5 (multiplier = 4.3)
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for 'X`1[Int32][System.Int32]:IA():int:this' calling 'System.Collections.Immutable.ImmutableArray`1[Int32][System.Int32]:GetEnumerator():Enumerator[Int32]:this'

We might consider boosting the "promotable struct" multiplier somewhat to give this sort of inline an extra nudge in the jit, since we get a lot of benefit out of promotion. I'll put this on my todo list.

For the string case: the jit is unable to inline a method from one shared generic class into a method from another shared generic class. So when benchmarking a shared generic method, the calling context matters quite a bit.

We can sometimes work around this if the method being inlined doesn't actually use the results of the runtime lookup. Let me dig in and see if that's the case here.

I am also looking into the inlined int case, seems like we ought to be able to match the array version codegen but can't remove the bounds check.

AndyAyersMS · 2019-12-26T20:51:34Z

In the inlined int case, the jit can't perform what we call a "do-while" transformation on the loop, because the loop exit block has multiple statements. Basically the code (after inlining) resembles this:

    static int F(int[] a)
    {
        int r = 0;
        for (int i = -1; ++i < a.Length; )
        {
            r = a[i];
        }
        return r;
    }

And without this transformation the jit won't optimize out the bounds check.

This is limitation is going to hold for any sort of inlined enumerator because MoveNext must do two things: update some internal state and perform a test. We ought to look at relaxing this constraint and be willing to let the jit duplicate more statements if the state update and test are sufficiently cheap.

AndyAyersMS · 2019-12-26T23:14:50Z

As for the "inlined" string case -- the inline currently fails when inlining GetEnumerator once the importer reaches the call to ThrowNullRefIfNotInitialized, because that method's signature indicates that it requires an generic context parameter, and that parameter requires a runtime lookup, and there's currently no way to do runtime lookups safely, unless we're in the root method.

It turns out that ThrowNullRefIfNotInitialized does not use its generic context, but the jit doesn't have any way of figuring that out before it has to decide whether or not to allow GetEnumerator to inline. And even if we could get past that, we'd hit the same issue on the Enumerator constructor call.

AndyAyersMS · 2019-12-26T23:16:21Z

So, areas for codegen follow-up:

look at boosting inline mulitplier for promotable structs
look at allowing multiple statements in the do-while transformation

danmoseley · 2019-12-27T00:01:36Z

Is the upshot that until those changes, it is worth force inlining it?

AndyAyersMS · 2019-12-27T00:56:10Z

Yes, I think adding forceinline here is reasonable. Seems like whoever wrote the code was expecting inlining to happen...

runtime/src/libraries/System.Collections.Immutable/src/System/Collections/Immutable/ImmutableArray_1.Enumerator.cs

Lines 15 to 20 in bc47321

    
           /// It is important that this enumerator does NOT implement <see cref="IDisposable"/>. 
        
           /// We want the iterator to inline when we do foreach and to not result in 
        
           /// a try/finally frame in the client. 
        
           /// </remarks> 
        
           public struct Enumerator 
        
           {

danmoseley · 2019-12-27T02:01:41Z

@hnrqbaggi do you want to offer a PR?

hnrqbaggio · 2019-12-27T02:51:57Z

Yes, I should have one ready soon.
Thanks for the follow-up!

AndyAyersMS · 2019-12-27T18:39:50Z

As a follow-up: the promotable struct benefit in the inliner needs to be at least 5.5 (currently 3) for this case to be handled by default. Not surprisingly this has fairly widespread impact.

It's hard to be surgical when changing the inlining heuristics. Lots of good diffs, lots of bad diffs.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: 238198 (0.58% of base)
    diff is a regression.
Top file regressions (bytes):
       41733 : System.Private.CoreLib.dasm (0.91% of base)
       39944 : System.Private.Xml.dasm (1.12% of base)
       18091 : System.Collections.Immutable.dasm (1.65% of base)
       17741 : System.Net.Http.dasm (2.88% of base)
       17337 : System.Data.Common.dasm (1.16% of base)
       15682 : Microsoft.CodeAnalysis.dasm (0.89% of base)
       12618 : Newtonsoft.Json.dasm (1.46% of base)
       12616 : NuGet.Protocol.Core.v3.dasm (4.67% of base)
        4562 : System.Drawing.Primitives.dasm (11.94% of base)
        4110 : System.Private.DataContractSerialization.dasm (0.54% of base)
        3639 : System.Threading.Tasks.Dataflow.dasm (0.44% of base)
        2928 : System.ComponentModel.TypeConverter.dasm (1.09% of base)
        2812 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.05% of base)
        2516 : System.Security.Cryptography.X509Certificates.dasm (1.50% of base)
        2377 : System.Reflection.Metadata.dasm (0.56% of base)
        2135 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.07% of base)
        2128 : System.IO.Compression.dasm (2.80% of base)
        2105 : System.IO.Pipes.dasm (5.39% of base)
        2031 : System.Runtime.Extensions.dasm (2.72% of base)
        1934 : System.Private.Xml.Linq.dasm (1.28% of base)
Top file improvements (bytes):
        -694 : Microsoft.CodeAnalysis.CSharp.dasm (-0.02% of base)
         -38 : NuGet.Configuration.dasm (-0.07% of base)
         -16 : CommandLine.dasm (-0.00% of base)
          -7 : System.Text.RegularExpressions.dasm (-0.00% of base)
          -5 : System.Linq.Parallel.dasm (-0.00% of base)
76 total files with Code Size differences (5 improved, 71 regressed), 53 unchanged.
Top method regressions (bytes):
        2739 (17.08% of base) : System.Collections.Immutable.dasm - Enumerator:MoveNext():bool:this (77 methods)
        2502 (23.60% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:DecodeModifiers(SyntaxTokenList,int,int,int,DiagnosticBag):MemberModifiers:this
        2440 (12.75% of base) : NuGet.Protocol.Core.v3.dasm - <TryCreate>d__1:MoveNext():this (20 methods)
        2113 (126.60% of base) : System.Private.CoreLib.dasm - ConfiguredValueTaskAwaiter:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(IAsyncStateMachineBox):this (8 methods)
        1852 (134.11% of base) : System.Private.CoreLib.dasm - ValueTaskAwaiter`1:System.Runtime.CompilerServices.IStateMachineBoxAwareAwaiter.AwaitUnsafeOnCompleted(IAsyncStateMachineBox):this (7 methods)
        1723 (99.42% of base) : Microsoft.CodeAnalysis.dasm - SyntaxDiffer:GetSimilarity(SyntaxNodeOrToken,SyntaxNodeOrToken):int:this
        1610 (12.77% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodToClassRewriter`1:RewriteBlock(BoundBlock,ArrayBuilder`1,ArrayBuilder`1):BoundBlock:this (7 methods)
        1610 (33.67% of base) : System.Collections.Immutable.dasm - <get_Keys>d__25:MoveNext():bool:this (7 methods)
        1610 (33.77% of base) : System.Collections.Immutable.dasm - <get_Values>d__27:MoveNext():bool:this (7 methods)
        1594 (25.40% of base) : System.Net.Http.dasm - <WaitWithCancellationAsync>d__3:MoveNext():this (7 methods)
        1584 (27.86% of base) : System.Collections.Immutable.dasm - Enumerator:Reset():this (56 methods)
        1338 (46.41% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Parser:ParseIfExpression():ExpressionSyntax:this
        1219 (91.93% of base) : Microsoft.CodeAnalysis.dasm - SyntaxDiffer:FindBestMatch(Stack`1,SyntaxNodeOrToken,byref,byref,int):this
        1211 (14.63% of base) : NuGet.Protocol.Core.v3.dasm - <ProcessStreamAsync>d__25`1:MoveNext():this (7 methods)
        1200 (17.50% of base) : System.Private.Xml.dasm - <WriteEndAttributeAsync_SepcialAtt>d__134:MoveNext():this
        1156 ( 8.81% of base) : NuGet.Protocol.Core.v3.dasm - <StartWithTimeout>d__0`1:MoveNext():this (7 methods)
        1148 (180.22% of base) : System.IO.Pipes.dasm - PipeCompletionSource`1:ReleaseResources():this (7 methods)
        1045 (45.51% of base) : System.Collections.Immutable.dasm - <get_Values>d__26:MoveNext():bool:this (7 methods)
         945 (37.97% of base) : System.ComponentModel.TypeConverter.dasm - ColorConverter:ConvertTo(ITypeDescriptorContext,CultureInfo,Object,Type):Object:this
         940 (116.92% of base) : System.Threading.Tasks.Dataflow.dasm - <>c:<.cctor>b__16_0(Task`1,Object):bool:this (7 methods)
Top method improvements (bytes):
        -747 (-4.64% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(VisualBasicSyntaxNode,Location,int,byref,ImmutableArray`1,ImmutableArray`1,bool,bool,bool,bool,DiagnosticBag,Symbol,bool,VisualBasicSyntaxNode,Symbol):this
        -570 (-9.35% of base) : System.Security.Cryptography.X509Certificates.dasm - AsnWriter:WriteUtcTimeCore(Asn1Tag,DateTimeOffset):this
        -406 (-24.37% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineRewriter`1:EnsureAllSymbolsAndSignature():bool:this (7 methods)
        -387 (-18.90% of base) : Microsoft.CodeAnalysis.dasm - ImmutableArrayExtensions:HasDuplicates(ImmutableArray`1,IEqualityComparer`1):bool (7 methods)
        -384 (-11.94% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ExecuteDeclaringReferenceActions(SymbolDeclaredCompilationEvent,AnalysisScope,AnalysisState,CancellationToken):this (6 methods)
        -372 (-26.38% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ShouldExecuteCodeBlockActions(AnalysisScope,ISymbol):bool:this (6 methods)
        -350 (-11.61% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineMethodToClassRewriter:PossibleStateMachineScope(ImmutableArray`1,BoundNode):BoundNode:this (7 methods)
        -336 (-23.81% of base) : Microsoft.CodeAnalysis.dasm - UnionCollection`1:CopyTo(ref,int):this (7 methods)
        -324 (-32.53% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ShouldExecuteSyntaxNodeActions(AnalysisScope):bool:this (6 methods)
        -318 (-34.53% of base) : System.Private.CoreLib.dasm - ArraySegment`1:GetEnumerator():Enumerator:this (7 methods)
        -294 (-37.31% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ControlFlowPass:VisitFinallyBlock(BoundStatement,byref):this
        -293 (-21.85% of base) : Microsoft.CodeAnalysis.dasm - Hash:CombineValues(ImmutableArray`1,int):int (7 methods)
        -288 (-19.38% of base) : System.Collections.Immutable.dasm - ImmutableArrayExtensions:ToDictionary(ImmutableArray`1,Func`2,IEqualityComparer`1):Dictionary`2 (7 methods)
        -269 (-3.95% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineRewriter`1:GenerateKickoffMethodBody():BoundBlock:this (7 methods)
        -268 (-7.76% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VBSemanticModel:GetSemanticSymbols(BoundNodeSummary,Binder,int,byref,byref):ImmutableArray`1:this
        -260 (-42.69% of base) : Microsoft.CodeAnalysis.dasm - MetadataVisitor:Visit(ImmutableArray`1):this (5 methods)
        -252 (-2.09% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerDriver`1:ExecuteDeclaringReferenceActions(SyntaxReference,SymbolDeclaredCompilationEvent,AnalysisScope,AnalysisState,bool,bool,CancellationToken):this (6 methods)
        -245 (-9.66% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LambdaRewriter:IntroduceFrame(BoundNode,LambdaFrame,Func`3,LambdaSymbol):BoundNode:this
        -243 (-15.99% of base) : Microsoft.CodeAnalysis.dasm - AnalyzerExecutor:ExecuteSyntaxNodeActions(SyntaxNode,IDictionary`2,SemanticModel,Func`2,Action`1,SyntaxNodeAnalyzerStateData):this (6 methods)
        -228 (-8.83% of base) : Microsoft.CodeAnalysis.dasm - ImmutableArrayExtensions:Distinct(ImmutableArray`1,IEqualityComparer`1):ImmutableArray`1 (7 methods)
Top method regressions (percentages):
         330 (6,600.00% of base) : Microsoft.CodeAnalysis.dasm - SyntaxNodeOrToken:WithLeadingTrivia(ref):SyntaxNodeOrToken:this
         330 (6,600.00% of base) : Microsoft.CodeAnalysis.dasm - SyntaxNodeOrToken:WithTrailingTrivia(ref):SyntaxNodeOrToken:this
         298 (5,960.00% of base) : System.Private.CoreLib.dasm - RuntimeMethodInfo:GetGenericArgumentsInternal():ref:this
         204 (1,854.55% of base) : System.Net.Http.dasm - SafeDeleteContext:ToString():String:this
         204 (1,854.55% of base) : System.Net.HttpListener.dasm - SafeDeleteContext:ToString():String:this
         204 (1,854.55% of base) : System.Net.Mail.dasm - SafeDeleteContext:ToString():String:this
         204 (1,854.55% of base) : System.Net.Security.dasm - SafeDeleteContext:ToString():String:this
         803 (1,825.00% of base) : System.Private.CoreLib.dasm - ValueTuple`2:System.Collections.IStructuralEquatable.GetHashCode(IEqualityComparer):int:this (7 methods)
         803 (1,825.00% of base) : System.Private.CoreLib.dasm - ValueTuple`2:System.IValueTupleInternal.GetHashCode(IEqualityComparer):int:this (7 methods)
          81 (1,620.00% of base) : Microsoft.CodeAnalysis.dasm - Blobs:MoveNext():bool:this
          81 (1,620.00% of base) : System.Reflection.Metadata.dasm - Blobs:MoveNext():bool:this
         533 (1,066.00% of base) : Microsoft.CodeAnalysis.dasm - TwoEnumeratorListStack:TryGetNextInSpan(byref,byref):bool:this
         117 (1,063.64% of base) : System.Private.CoreLib.dasm - ModuleHandle:ResolveMethodHandleInternal(RuntimeModule,int):IRuntimeMethodInfo
         533 (1,045.10% of base) : Microsoft.CodeAnalysis.dasm - ThreeEnumeratorListStack:TryGetNextInSpan(byref,byref):bool:this
         140 (933.33% of base) : Microsoft.CodeAnalysis.dasm - MetadataReferenceProperties:op_Equality(MetadataReferenceProperties,MetadataReferenceProperties):bool
         167 (927.78% of base) : System.Collections.Immutable.dasm - ImmutableArray`1:Replace(long,long):ImmutableArray`1:this
         292 (912.50% of base) : System.Private.CoreLib.dasm - RuntimeMethodInfo:GetGenericArguments():ref:this
         100 (909.09% of base) : System.Private.CoreLib.dasm - EventRegistrationTokenListWithCount:Pop(byref):bool:this
         104 (866.67% of base) : System.Data.Common.dasm - SqlSingle:.ctor(double):this
          40 (800.00% of base) : System.Private.CoreLib.dasm - RuntimeType:IsPrimitiveImpl():bool:this
Top method improvements (percentages):
         -86 (-66.67% of base) : System.IO.FileSystem.dasm - DisableMediaInsertionPrompt:Create():DisableMediaInsertionPrompt (3 base, 1 diff methods)
         -65 (-50.78% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - BoundInterpolatedStringExpression:get_HasInterpolations():bool:this
         -65 (-50.78% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
         -16 (-50.00% of base) : System.Private.CoreLib.dasm - RuntimeTypeHandle:get_Value():long:this (2 base, 1 diff methods)
         -40 (-50.00% of base) : System.Private.CoreLib.dasm - GCHandle:Free():this (2 base, 1 diff methods)
         -65 (-49.62% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_ContainsExtensionMethods():bool:this
         -65 (-49.62% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
        -210 (-47.95% of base) : System.Private.CoreLib.dasm - ModuleHandle:GetModuleType(RuntimeModule):RuntimeType (2 base, 1 diff methods)
         -74 (-47.13% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ConstructedType:GetHashCode():int:this
        -110 (-47.01% of base) : Microsoft.CodeAnalysis.CSharp.dasm - CodeGenerator:EmitSwitchSection(BoundSwitchSection):this
         -60 (-44.78% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - CompilationMergedNamespaceSymbol:BuildExtensionMethodsMap(Dictionary`2):this
         -52 (-44.44% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:DeclareVariables(ImmutableArray`1):this
         -52 (-44.44% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:CheckInterfaceMembers(ImmutableArray`1,DiagnosticBag)
         -59 (-44.36% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - AsyncMethodToClassRewriter:NeedsSpill(ImmutableArray`1):bool
         -57 (-44.19% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - EmbeddedSymbolManager:ValidateMethod(MethodSymbol)
         -49 (-43.75% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceNamedTypeSymbol:GetConstraintKind(ImmutableArray`1):int
         -52 (-43.33% of base) : Microsoft.CodeAnalysis.CSharp.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
         -52 (-43.33% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
        -260 (-42.69% of base) : Microsoft.CodeAnalysis.dasm - MetadataVisitor:Visit(ImmutableArray`1):this (5 methods)
         -52 (-42.28% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:ReportUnusedVariables(ImmutableArray`1):this
3782 total methods with Code Size differences (1148 improved, 2634 regressed), 205151 unchanged.

Motivated in part by dotnet#780.

AndyAyersMS · 2019-12-28T00:52:40Z

Second follow-up: Have a prototype master...AndyAyersMS:FgOptWhileLoop that fgOptWhileLoop to handle multiple statements.

This plus the change from #1183 gives identical codegen for the int array and immutable array cases.

Overall diffs look plausible too; still chasing through the regressions to see what's up, but so far:

tree size costing looks iffy in some cases( eg Node:BalanceMany():Node:this)
we lose some CSEs, if the code we're duplicating has a CSE candidate (eg Node:get_Max():long:this)
old code was not costing the JTRUE node, just the compare underneath, so we may lose a few cases from slightly higher costs

Total bytes of diff: 1150 (0.00% of base)
    diff is a regression.
Top file regressions (bytes):
        7716 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (0.25% of base)
        2849 : System.Private.Xml.dasm (0.08% of base)
        2060 : System.Data.Common.dasm (0.14% of base)
        1262 : System.Private.CoreLib.dasm (0.03% of base)
         923 : System.Linq.Parallel.dasm (0.06% of base)
         700 : Microsoft.CSharp.dasm (0.23% of base)
         551 : System.Collections.Immutable.dasm (0.05% of base)
         494 : System.Linq.dasm (0.05% of base)
         388 : System.Memory.dasm (0.16% of base)
         357 : System.Private.DataContractSerialization.dasm (0.05% of base)
         352 : System.Text.RegularExpressions.dasm (0.14% of base)
         337 : System.Diagnostics.TraceSource.dasm (0.80% of base)
         290 : System.Security.Cryptography.Algorithms.dasm (0.10% of base)
         178 : Newtonsoft.Json.dasm (0.02% of base)
         155 : System.IO.Compression.dasm (0.20% of base)
         153 : System.Security.Cryptography.Cng.dasm (0.10% of base)
         150 : System.Linq.Expressions.dasm (0.02% of base)
         120 : System.Runtime.Numerics.dasm (0.17% of base)
         109 : System.Security.Claims.dasm (0.47% of base)
         104 : System.Net.Http.dasm (0.02% of base)
Top file improvements (bytes):
      -10344 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.19% of base)
       -7431 : Microsoft.CodeAnalysis.CSharp.dasm (-0.17% of base)
       -1501 : Microsoft.CodeAnalysis.dasm (-0.09% of base)
          -7 : System.Threading.Tasks.Dataflow.dasm (-0.00% of base)
53 total files with Code Size differences (4 improved, 49 regressed), 76 unchanged.
Top method regressions (bytes):
         526 ( 1.26% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:ToDictionary(Func`2,IEqualityComparer`1):Dictionary`2:this (49 methods)
         516 (12.29% of base) : System.Data.Common.dasm - DataTable:SerializeTableSchema(SerializationInfo,StreamingContext,bool):this
         315 (18.67% of base) : System.Collections.Immutable.dasm - Node:BalanceMany():Node:this (7 methods)
         302 ( 3.04% of base) : System.Linq.Parallel.dasm - TakeOrSkipQueryOperatorEnumerator`1:MoveNext(byref,byref):bool:this (7 methods)
         223 (12.57% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:RemoveDuplicates():this (7 methods)
         155 ( 6.18% of base) : System.Linq.Parallel.dasm - OrderedPipeliningMergeEnumerator:MoveNext():bool:this (7 methods)
         139 ( 5.65% of base) : System.Security.Cryptography.Algorithms.dasm - AsnReader:CopyConstructedOctetString(ReadOnlyMemory`1,Span`1,bool,bool,byref):int:this
         139 ( 5.65% of base) : System.Security.Cryptography.Cng.dasm - AsnReader:CopyConstructedOctetString(ReadOnlyMemory`1,Span`1,bool,bool,byref):int:this
         133 ( 1.88% of base) : System.Linq.Parallel.dasm - AsynchronousChannelMergeEnumerator`1:MoveNextSlowPath():bool:this (7 methods)
         126 (13.18% of base) : System.Linq.Parallel.dasm - <System-Collections-Generic-IEnumerable<T>-GetEnumerator>d__21:MoveNext():bool:this (7 methods)
         119 (32.69% of base) : System.Private.CoreLib.dasm - TaskCompletionSource`1:SpinUntilCompleted():this (7 methods)
         116 ( 0.96% of base) : System.Linq.Parallel.dasm - TakeOrSkipWhileQueryOperatorEnumerator`1:MoveNext(byref,byref):bool:this (7 methods)
         107 ( 4.10% of base) : System.Security.Cryptography.Algorithms.dasm - AsnReader:ProcessConstructedBitString(ReadOnlyMemory`1,Span`1,BitStringCopyAction,bool,byref,byref):int:this
         106 ( 0.65% of base) : System.Private.Xml.dasm - XmlReflectionImporter:ImportAccessorMapping(MemberMapping,FieldModel,XmlAttributes,String,Type,bool,bool,RecursionLimiter):this
         105 ( 1.32% of base) : System.Private.Xml.dasm - SchemaCollectionPreprocessor:Preprocess(XmlSchema,String,int):this
         100 (11.43% of base) : System.Private.Xml.dasm - NamespaceEnumerator:MoveNext():bool:this (7 methods)
          98 ( 5.65% of base) : System.Private.Xml.dasm - <GetActiveRecords>d__34:MoveNext():bool:this (7 methods)
          92 ( 8.25% of base) : System.IO.Compression.dasm - <WriteAsync>d__8:MoveNext():this
          91 ( 3.66% of base) : System.Linq.Parallel.dasm - <>c__DisplayClass1_0`2:<SpoolPipeline>b__0():this (7 methods)
          90 ( 3.89% of base) : System.Linq.dasm - SparseArrayBuilder`1:CopyTo(ref,int,int):this (7 methods)
Top method improvements (bytes):
        -868 (-6.11% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodToClassRewriter`1:RewriteBlock(BoundBlock,ArrayBuilder`1,ArrayBuilder`1):BoundBlock:this (7 methods)
        -370 (-10.45% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:ComputeInterfaceImplementations(DiagnosticBag,CancellationToken):ImmutableArray`1:this
        -298 (-12.27% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceNamedTypeSymbol:MakeDeclaredBases(ConsList`1,DiagnosticBag):Tuple`2:this
        -261 (-10.26% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ConversionsBase:ComputeApplicableUserDefinedImplicitConversionSet(BoundExpression,TypeSymbol,TypeSymbol,ArrayBuilder`1,ArrayBuilder`1,byref,bool):this
        -246 (-12.31% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceMemberContainerTypeSymbol:ProcessPartialMethodsIfAny(Dictionary`2,DiagnosticBag):this
        -238 (-9.87% of base) : Microsoft.CodeAnalysis.CSharp.dasm - ConversionsBase:AddUserDefinedConversionsToExplicitCandidateSet(BoundExpression,TypeSymbol,TypeSymbol,ArrayBuilder`1,NamedTypeSymbol,String,byref):this
        -229 (-12.51% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MetadataDecoder:SubstituteNoPiaLocalType(byref,bool,TypeSymbol,String,String,String,AssemblySymbol):NamedTypeSymbol
        -228 (-13.14% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MetadataDecoder:SubstituteNoPiaLocalType(byref,bool,TypeSymbol,String,String,String,AssemblySymbol):NamedTypeSymbol
        -203 (-15.04% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - VisualBasicCompilation:GetRuntimeMember(NamedTypeSymbol,byref,SignatureComparer`5,AssemblySymbol):Symbol
        -200 (-3.06% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - StateMachineRewriter`1:GenerateKickoffMethodBody():BoundBlock:this (7 methods)
        -197 (-16.95% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MemberLookup:LookupInModules(LookupResult,NamespaceSymbol,String,int,int,Binder,byref)
        -182 (-13.79% of base) : Microsoft.CodeAnalysis.CSharp.dasm - OverriddenOrHiddenMembersHelpers:FindOverriddenOrHiddenMembersInType(Symbol,bool,NamedTypeSymbol,NamedTypeSymbol,byref,byref,byref)
        -176 (-13.75% of base) : Microsoft.CodeAnalysis.CSharp.dasm - CSharpCompilation:GetRuntimeMember(NamedTypeSymbol,byref,SignatureComparer`5,AssemblySymbol):Symbol
        -173 (-3.43% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceNamedTypeSymbol:CheckDeclarationNameAndTypeParameters(VisualBasicSyntaxNode,Binder,DiagnosticBag,byref):this
        -157 (-7.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LocalRewriter:RewriteWithBlockStatements(BoundBlock,VisualBasicSyntaxNode,bool,ImmutableArray`1,ImmutableArray`1,BoundValuePlaceholderBase,BoundExpression):BoundBlock:this
        -156 (-12.61% of base) : Microsoft.CodeAnalysis.dasm - Parser:GetMatchingMethods(String,byref,List`1,String,int,Compilation,List`1)
        -155 (-4.34% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - InitializerRewriter:BuildConstructorBody(TypeCompilationState,MethodSymbol,BoundStatement,ProcessedFieldOrPropertyInitializers,BoundBlock):BoundBlock
        -154 (-2.17% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodToClassRewriter`1:RewriteSequence(BoundSequence,ArrayBuilder`1,ArrayBuilder`1):BoundSequence:this (7 methods)
        -151 (-18.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Imports:LookupExtensionMethodsInUsings(ArrayBuilder`1,String,int,int,Binder):this
        -147 (-7.55% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:CheckMemberNameConflicts(DiagnosticBag):this
Top method regressions (percentages):
           8 (40.00% of base) : System.Runtime.Numerics.dasm - Number:wcslen(long):int
          30 (35.71% of base) : System.Private.Xml.dasm - ParticleContentValidator:CheckUniqueParticleAttribution(BitSet,ref):this
         119 (32.69% of base) : System.Private.CoreLib.dasm - TaskCompletionSource`1:SpinUntilCompleted():this (7 methods)
           8 (32.00% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEvent:SkipUnicodeString(int):int:this
           7 (31.82% of base) : Microsoft.CodeAnalysis.dasm - ComMemoryStream:ZeroMemory(long,int)
           7 (31.82% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEvent:SkipUTF8String(int):int:this
          10 (30.30% of base) : System.Collections.Immutable.dasm - Node:get_Max():int:this
          10 (30.30% of base) : System.Collections.Immutable.dasm - Node:get_Min():int:this
          22 (30.14% of base) : System.Data.Common.dasm - Merger:MergeConstraints(DataSet):this
          22 (29.73% of base) : System.Data.Common.dasm - DSRowDiffIdUsageSection:Prepare(DataSet):this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Max():__Canon:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Min():__Canon:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Max():ubyte:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Min():ubyte:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Max():long:this
          10 (29.41% of base) : System.Collections.Immutable.dasm - Node:get_Min():long:this
          16 (29.09% of base) : System.Private.Xml.dasm - AttributeSetAction:Merge(AttributeSetAction):this
          10 (28.57% of base) : System.Collections.Immutable.dasm - Node:get_Max():short:this
          10 (28.57% of base) : System.Collections.Immutable.dasm - Node:get_Min():short:this
          19 (27.54% of base) : Microsoft.CSharp.dasm - MethodTypeInferrer:AllFixed():bool:this
Top method improvements (percentages):
         -21 (-31.82% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_ContainsExtensionMethods():bool:this
         -21 (-31.82% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
         -20 (-31.75% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SourceNamedTypeSymbol:GetConstraintKind(ImmutableArray`1):int
         -17 (-26.98% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - BoundInterpolatedStringExpression:get_HasInterpolations():bool:this
         -17 (-26.98% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MergedTypeDeclaration:get_AnyMemberHasAttributes():bool:this
         -29 (-20.71% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodCompiler:AssertAllInitializersAreConstants(ImmutableArray`1):this
        -151 (-18.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Imports:LookupExtensionMethodsInUsings(ArrayBuilder`1,String,int,int,Binder):this
         -23 (-17.69% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:MayUseSmallExceptionHeaders(int,ImmutableArray`1):bool
        -197 (-16.95% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MemberLookup:LookupInModules(LookupResult,NamespaceSymbol,String,int,int,Binder,byref)
         -11 (-16.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:DeclareVariables(ImmutableArray`1):this
         -11 (-16.92% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMemberContainerTypeSymbol:CheckInterfaceMembers(ImmutableArray`1,DiagnosticBag)
         -11 (-16.18% of base) : Microsoft.CodeAnalysis.CSharp.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
         -11 (-16.18% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - AbstractRegionDataFlowPass:MakeSlots(ImmutableArray`1):this
         -19 (-15.97% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - LambdaSymbol:.ctor(VisualBasicSyntaxNode,ImmutableArray`1,TypeSymbol,Binder):this
         -55 (-15.76% of base) : Microsoft.CodeAnalysis.dasm - MetadataVisitor:Visit(ImmutableArray`1):this (5 methods)
         -11 (-15.49% of base) : Microsoft.CodeAnalysis.CSharp.dasm - DataFlowPass:ReportUnusedVariables(ImmutableArray`1):this
         -11 (-15.49% of base) : Microsoft.CodeAnalysis.dasm - AnalysisState:AddToEventsMap_NoLock(ImmutableArray`1):this
         -11 (-15.28% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - EmbeddedSymbolManager:ValidateMethod(MethodSymbol)
         -12 (-15.19% of base) : Microsoft.CodeAnalysis.CSharp.dasm - StackOptimizerPass1:DeclareLocals(ImmutableArray`1,int):this
         -12 (-15.19% of base) : Microsoft.CodeAnalysis.dasm - MetadataWriter:SerializeCustomModifiers(ImmutableArray`1,BlobBuilder):this
3258 total methods with Code Size differences (1072 improved, 2186 regressed), 205675 unchanged.

AndyAyersMS · 2019-12-28T00:58:50Z

Also IsSharedStaticHelper has CORINFO_HELP_BOX in there, which makes no sense to me -- we are looking for helper calls that are potentially hoistable.

Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Dec 11, 2019

adamsitnik added the tenet-performance Performance related issue label Dec 13, 2019

hnrqbaggio mentioned this issue Dec 27, 2019

Improve performance of iterating with ForEach over ImmutableArray #1183

Merged

jkotas closed this as completed in #1183 Dec 27, 2019

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Dec 28, 2019

Update fgOptWhileLoop to allow multiple statements

b889e4f

Motivated in part by dotnet#780.

eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020

AndyAyersMS mentioned this issue Jul 7, 2020

JIT: heuristics in optInvertWhileLoop may be overly conservative #6569

Closed

BruceForstall mentioned this issue Oct 17, 2020

Improve JIT loop optimizations (.NET 6) #43549

Closed

25 tasks

ghost locked as resolved and limited conversation to collaborators Dec 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterating with ForEach over ImmutableArray is slower than over Array #780

Iterating with ForEach over ImmutableArray is slower than over Array #780

hnrqbaggio commented Dec 11, 2019

hnrqbaggio commented Dec 11, 2019 •

edited

Loading

adamsitnik commented Dec 13, 2019

hnrqbaggio commented Dec 13, 2019

hnrqbaggio commented Dec 20, 2019 •

edited

Loading

danmoseley commented Dec 21, 2019

AndyAyersMS commented Dec 26, 2019

AndyAyersMS commented Dec 26, 2019

AndyAyersMS commented Dec 26, 2019

AndyAyersMS commented Dec 26, 2019

danmoseley commented Dec 27, 2019

AndyAyersMS commented Dec 27, 2019

danmoseley commented Dec 27, 2019

hnrqbaggio commented Dec 27, 2019

AndyAyersMS commented Dec 27, 2019

AndyAyersMS commented Dec 28, 2019

AndyAyersMS commented Dec 28, 2019

Iterating with ForEach over ImmutableArray is slower than over Array #780

Iterating with ForEach over ImmutableArray is slower than over Array #780

Comments

hnrqbaggio commented Dec 11, 2019

Comparing with Array

Possible fix

Correctness

Baseline

Array

ImmutableArray

With Aggressive Inlining

Array

ImmutableArray

hnrqbaggio commented Dec 11, 2019 • edited Loading

adamsitnik commented Dec 13, 2019

hnrqbaggio commented Dec 13, 2019

About the String case

Just use this.array.GetEnumerator

hnrqbaggio commented Dec 20, 2019 • edited Loading

Why inlining GetEnumerator helps

Code size

Benchmark for String

danmoseley commented Dec 21, 2019

AndyAyersMS commented Dec 26, 2019

AndyAyersMS commented Dec 26, 2019

AndyAyersMS commented Dec 26, 2019

AndyAyersMS commented Dec 26, 2019

danmoseley commented Dec 27, 2019

AndyAyersMS commented Dec 27, 2019

danmoseley commented Dec 27, 2019

hnrqbaggio commented Dec 27, 2019

AndyAyersMS commented Dec 27, 2019

AndyAyersMS commented Dec 28, 2019

AndyAyersMS commented Dec 28, 2019

hnrqbaggio commented Dec 11, 2019 •

edited

Loading

hnrqbaggio commented Dec 20, 2019 •

edited

Loading