Eliminate duplicate zero initializations more aggressively. #31960

erozenfeld · 2020-02-08T01:40:44Z

We have 3 places in the code where we may need to insert explicit zero
initialization. In some cases the initialization may become redundant with prolog
initialization so we have logic to avoid explicit initialization in some cases.
This change makes the logic less conservative.

If the variable is not being initialized in a loop, we can avoid explicit zero initialization if
- the variable is a gc pointer or a struct with gc pointer fields, or
- compInitMem is set and the variable has a long lifetime.
Before this change we inserted explicit zero initalization in one more case: when compInitMem
wasn't set and the variable was a gc pointer or a struct with gc fields. That is not necessary,
so this change fixes that in fgVarNeedsExplicitZeroInit.
When we were allocating structs via newobj, we were always inserting explicit zero initializations
when importing an inlinee. This change applies the same optimization if the basic block can't be
in a loop after inlining.
When we were initializing inlinee locals, we were only optimizing explicit zero initializations
for structs; however, the same logic applies to non-structs.

erozenfeld · 2020-02-08T01:42:02Z

Framework diffs:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -98251 (-0.21% of base)
    diff is an improvement.
Top file regressions (bytes):
           2 : System.ServiceProcess.ServiceController.dasm (0.01% of base)
Top file improvements (bytes):
      -22083 : System.Collections.Immutable.dasm (-1.99% of base)
       -9113 : Microsoft.CodeAnalysis.dasm (-0.52% of base)
       -7269 : System.Net.Http.dasm (-1.10% of base)
       -7097 : System.Private.Xml.dasm (-0.20% of base)
       -6953 : System.Collections.dasm (-1.24% of base)
       -5956 : System.Private.CoreLib.dasm (-0.13% of base)
       -5799 : Microsoft.CodeAnalysis.CSharp.dasm (-0.13% of base)
       -4808 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.09% of base)
       -3493 : System.Data.Common.dasm (-0.23% of base)
       -3003 : System.Reflection.Metadata.dasm (-0.70% of base)
       -1898 : Newtonsoft.Json.dasm (-0.22% of base)
       -1257 : System.Data.SqlClient.dasm (-0.14% of base)
       -1131 : System.Security.Cryptography.X509Certificates.dasm (-0.67% of base)
       -1093 : System.Net.HttpListener.dasm (-0.48% of base)
       -1066 : System.Memory.dasm (-0.43% of base)
       -1018 : System.Net.Mail.dasm (-0.48% of base)
       -1004 : System.Linq.dasm (-0.10% of base)
        -927 : System.Net.Security.dasm (-0.57% of base)
        -773 : System.Security.Cryptography.Pkcs.dasm (-0.18% of base)
        -750 : System.Text.Json.dasm (-0.17% of base)
93 total files with Code Size differences (92 improved, 1 regressed), 70 unchanged.
Top method regressions (bytes):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
          22 ( 1.02% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:SubObj(Object,Object):Object
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
          14 ( 0.67% of base) : System.Security.Cryptography.Algorithms.dasm - PasswordBasedEncryption:Encrypt(ReadOnlySpan`1,ReadOnlySpan`1,SymmetricAlgorithm,bool,ReadOnlySpan`1,PbeParameters,ReadOnlySpan`1,ref,Span`1):int
          14 ( 0.67% of base) : System.Security.Cryptography.Cng.dasm - PasswordBasedEncryption:Encrypt(ReadOnlySpan`1,ReadOnlySpan`1,SymmetricAlgorithm,bool,ReadOnlySpan`1,PbeParameters,ReadOnlySpan`1,ref,Span`1):int
          14 ( 0.67% of base) : System.Security.Cryptography.Pkcs.dasm - PasswordBasedEncryption:Encrypt(ReadOnlySpan`1,ReadOnlySpan`1,SymmetricAlgorithm,bool,ReadOnlySpan`1,PbeParameters,ReadOnlySpan`1,ref,Span`1):int
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
          10 ( 0.69% of base) : System.Security.Cryptography.Pkcs.dasm - PfxAsn:VerifyMac(ReadOnlySpan`1,ReadOnlySpan`1):bool:this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
           7 ( 1.72% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(ubyte,MutationInput):MutationResult
           7 ( 1.75% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(int,MutationInput):MutationResult
           7 ( 1.73% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(long,MutationInput):MutationResult
           7 ( 2.01% of base) : System.Net.Http.dasm - KnownHeader:.ctor(String,ubyte,HttpHeaderParser,ref,Nullable`1,Nullable`1):this
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
           6 ( 1.49% of base) : System.Linq.Expressions.dasm - LambdaCompiler:EmitLambdaBody(CompilerScope,bool,int):this
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(ubyte,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(short,MutationInput):MutationResult
           5 ( 1.19% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(int,MutationInput):MutationResult
           5 ( 1.21% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(double,MutationInput):MutationResult
           5 ( 0.94% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(Vector`1,MutationInput):MutationResult
Top method improvements (bytes):
       -3243 (-15.73% of base) : System.Net.Http.dasm - HeaderDescriptor:.cctor()
       -1466 (-4.25% of base) : System.Data.Common.dasm - SortExpressionBuilder`1:CloneCast():SortExpressionBuilder`1:this (49 methods)
       -1132 (-15.13% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeProjectedTypes()
       -1117 (-6.97% of base) : System.Collections.Immutable.dasm - Enumerator:MoveNext():bool:this (77 methods)
        -992 (-7.77% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:SymmetricExcept(IEnumerable`1,MutationInput):MutationResult (7 methods)
        -932 (-18.25% of base) : System.Collections.Immutable.dasm - Builder:GetEnumerator():Enumerator:this (35 methods)
        -858 (-2.05% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:ToDictionary(Func`2,IEqualityComparer`1):Dictionary`2:this (49 methods)
        -853 (-14.65% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (42 methods)
        -684 (-7.56% of base) : Microsoft.CodeAnalysis.dasm - PeWriter:WriteHeaders(Stream,NtHeader,CoffHeader,List`1,byref):this
        -643 (-17.23% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Generic.IEnumerable<T>.GetEnumerator():IEnumerator`1:this (28 methods)
        -569 (-20.73% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator():Enumerator:this (21 methods)
        -569 (-21.07% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator(Builder):Enumerator:this (21 methods)
        -560 (-15.44% of base) : Microsoft.CodeAnalysis.dasm - Enumerator:.ctor(ValueSet):this (7 methods)
        -546 (-19.77% of base) : System.Collections.Immutable.dasm - HashBucket:GetEnumerator():Enumerator:this (14 methods)
        -498 (-24.53% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
        -498 (-24.68% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
        -486 (-17.63% of base) : System.Collections.Immutable.dasm - Node:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (21 methods)
        -483 (-11.28% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.Generic.ICollection<T>.CopyTo(ref,int):this (7 methods)
        -483 (-9.40% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.ICollection.CopyTo(Array,int):this (7 methods)
        -468 (-24.55% of base) : System.Collections.dasm - Enumerator:.ctor(SortedDictionary`2):this (14 methods)
Top method regressions (percentages):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
           7 ( 2.01% of base) : System.Net.Http.dasm - KnownHeader:.ctor(String,ubyte,HttpHeaderParser,ref,Nullable`1,Nullable`1):this
           4 ( 1.80% of base) : System.Data.SqlClient.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.ServiceProcess.ServiceController.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Windows.Extensions.dasm - ValueStringBuilder:Insert(int,String):this
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
           7 ( 1.75% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(int,MutationInput):MutationResult
           7 ( 1.73% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(long,MutationInput):MutationResult
           7 ( 1.72% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:Remove(ubyte,MutationInput):MutationResult
           6 ( 1.49% of base) : System.Linq.Expressions.dasm - LambdaCompiler:EmitLambdaBody(CompilerScope,bool,int):this
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
           5 ( 1.21% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(double,MutationInput):MutationResult
           5 ( 1.19% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(int,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(long,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(ubyte,MutationInput):MutationResult
           5 ( 1.18% of base) : System.Collections.Immutable.dasm - ImmutableDictionary`2:Remove(short,MutationInput):MutationResult
           5 ( 1.12% of base) : System.Net.Requests.dasm - FtpControlStream:GetPortV6(String):int:this
Top method improvements (percentages):
         -32 (-28.32% of base) : Microsoft.CodeAnalysis.dasm - ReversedEnumeratorImpl:.ctor(byref):this
         -84 (-26.84% of base) : Microsoft.CodeAnalysis.dasm - EnumeratorImpl:.ctor(byref):this (3 methods)
         -32 (-26.67% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTriviaList:GetEnumerator():Enumerator:this
         -13 (-25.49% of base) : Microsoft.CodeAnalysis.CSharp.dasm - NamespaceSymbol:Microsoft.CodeAnalysis.INamespaceSymbol.get_NamespaceKind():int:this
         -13 (-25.49% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - NamespaceSymbol:get_INamespaceSymbol_NamespaceKind():int:this
         -20 (-25.32% of base) : System.Data.Common.dasm - Index:GetEnumerator(int):RBTreeEnumerator:this
        -224 (-25.31% of base) : System.Memory.dasm - ReadOnlySequence`1:GetEnumerator():Enumerator:this (7 methods)
        -498 (-24.68% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
        -468 (-24.55% of base) : System.Collections.dasm - Enumerator:.ctor(SortedDictionary`2):this (14 methods)
        -498 (-24.53% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
         -26 (-24.30% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTokenList:GetEnumerator():Enumerator:this
        -260 (-23.92% of base) : System.Collections.dasm - SortedDictionary`2:GetEnumerator():Enumerator:this (7 methods)
         -19 (-23.46% of base) : Newtonsoft.Json.dasm - JsonSerializerSettings:set_Context(StreamingContext):this
         -20 (-22.99% of base) : System.Security.Cryptography.Algorithms.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -20 (-22.99% of base) : System.Security.Cryptography.Cng.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -20 (-22.99% of base) : System.Security.Cryptography.Pkcs.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -20 (-22.99% of base) : System.Security.Cryptography.X509Certificates.dasm - AsnValueReader:OpenUnchecked(ReadOnlySpan`1,int):AsnValueReader
         -58 (-22.48% of base) : Microsoft.CodeAnalysis.dasm - Reversed:GetEnumerator():Enumerator:this (3 methods)
        -437 (-22.28% of base) : System.Collections.Immutable.dasm - NodeEnumerable:System.Collections.Generic.IEnumerable<T>.GetEnumerator():IEnumerator`1:this (7 methods)
        -437 (-22.28% of base) : System.Collections.Immutable.dasm - NodeEnumerable:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (7 methods)
2713 total methods with Code Size differences (2669 improved, 44 regressed), 241635 unchanged.

erozenfeld · 2020-02-08T01:46:06Z

Sample diff from Microsoft.CodeAnalysis.dll:

; Assembly listing for method ReversedEnumeratorImpl:.ctor(byref):this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] (  3,  3   )     ref  ->  rdi         this class-hnd
;  V01 arg1         [V01,T01] (  3,  3   )   byref  ->  rdx        
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
-; V03 tmp1         [V03    ] (  3,  6   )  struct (56) [rsp+0x20]   do-not-enreg[XSB] must-init addr-exposed "NewObj constructor temp"
+; V03 tmp1         [V03    ] (  2,  4   )  struct (56) [rsp+0x20]   do-not-enreg[XSB] must-init addr-exposed "NewObj constructor temp"
;
; Lcl frame size = 88

G_M63972_IG01:
       push     rdi
       push     rsi
       sub      rsp, 88
-      vzeroupper 
       mov      rsi, rcx
       lea      rdi, [rsp+20H]
       mov      ecx, 14
       xor      rax, rax
       rep stosd 
       mov      rcx, rsi
       mov      rdi, rcx
-					;; bbWeight=1    PerfScore 30.00
+					;; bbWeight=1    PerfScore 29.00
G_M63972_IG02:
-      xor      ecx, ecx
-      vxorps   xmm0, xmm0
-      vmovdqu  xmmword ptr [rsp+20H], xmm0
-      vmovdqu  xmmword ptr [rsp+30H], xmm0
-      vmovdqu  xmmword ptr [rsp+40H], xmm0
-      mov      qword ptr [rsp+50H], rcx
       lea      rcx, bword ptr [rsp+20H]
       call     Enumerator:.ctor(byref):this
       add      rdi, 8
       lea      rsi, bword ptr [rsp+20H]
       call     CORINFO_HELP_ASSIGN_BYREF
       call     CORINFO_HELP_ASSIGN_BYREF
       movsq    
       movsq    
       call     CORINFO_HELP_ASSIGN_BYREF
       call     CORINFO_HELP_ASSIGN_BYREF
       movsq    
-					;; bbWeight=1    PerfScore 13.83
+					;; bbWeight=1    PerfScore 9.25
G_M63972_IG03:
       add      rsp, 88
       pop      rsi
       pop      rdi
       ret      
						;; bbWeight=1    PerfScore 2.25

-; Total bytes of code 113, prolog size 29, PerfScore 57.78, (MethodHash=e622061c) for method ReversedEnumeratorImpl:.ctor(byref):this
+; Total bytes of code 81, prolog size 26, PerfScore 48.60, (MethodHash=e622061c) for method ReversedEnumeratorImpl:.ctor(byref):this

erozenfeld · 2020-02-08T01:47:16Z

@dotnet/jit-contrib @AndyAyersMS PTAL

erozenfeld · 2020-02-08T02:09:16Z

Benchmark diffs:

PMI CodeSize Diffs for benchstones and benchmarks game in F:\runtime\artifacts\tests\coreclr\Windows_NT.x64.Release for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -263 (-0.05% of base)
    diff is an improvement.
Top file improvements (bytes):
        -242 : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm (-5.98% of base)
         -21 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (-0.51% of base)
2 total files with Code Size differences (2 improved, 0 regressed), 80 unchanged.
Top method improvements (bytes):
         -88 (-8.14% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_r(int,int,int,int):this
         -88 (-8.26% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_l(int,int,int,int):this
         -43 (-11.35% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:Run(bool):this
         -23 (-4.54% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:extract(int):int:this
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
Top method improvements (percentages):
         -43 (-11.35% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:Run(bool):this
         -88 (-8.26% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_l(int,int,int,int):this
         -88 (-8.14% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:compose_r(int,int,int,int):this
         -23 (-4.54% of base) : BenchmarksGame\pidigits\pidigits-3\pidigits-3.dasm - pidigits:extract(int):int:this
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
5 total methods with Code Size differences (5 improved, 0 regressed), 1888 unchanged.

AndyAyersMS

Nice improvement!

AndyAyersMS · 2020-02-08T02:10:21Z

src/coreclr/src/jit/flowgraph.cpp

+
+                    newStmt = gtNewStmt(tree, callILOffset);
+                    fgInsertStmtAfter(block, afterStmt, newStmt);
+                    afterStmt = newStmt;


Can you add jitdump here for the case where the local is used by we decide we don't need to zero init?

Sure, done.

AndyAyersMS · 2020-02-08T02:14:46Z

src/coreclr/src/jit/flowgraph.cpp

@@ -23579,10 +23579,14 @@ Statement* Compiler::fgInlinePrependStatements(InlineInfo* inlineInfo)

    CORINFO_METHOD_INFO* InlineeMethodInfo = InlineeCompiler->info.compMethodInfo;

-    unsigned lclCnt = InlineeMethodInfo->locals.numArgs;
+    unsigned lclCnt    = InlineeMethodInfo->locals.numArgs;
+    bool     bbInALoop = (block->bbFlags & BBF_BACKWARD_JUMP) != 0;


We set BBF_BACKWARD_JUMP very early, just based on lexical IL. So it may encompass non-loop blocks (eg loop with a branch that leads to a return).

We may also optimize away backwards branches (probably rare, but not impossible).

I wonder how often we end up with extra zeroing from BBF_BACKWARD_JUMP being approximate? Any Idea?

No, I don't know. I guess I can try to allow this optimization for BBJ_RETURN basic blocks in root methods since we can execute those blocks only once even in a loop. That won't account for all cases you are asking about but I don't see an easy way to capture those.

To estimate how often this happens, you could perhaps set a new BBF in blocks where we add zero inits. Then later when we run some more robust form of loop detection, see how many non-loop blocks have this BBF.

sandreenko · 2020-02-09T21:21:18Z

src/coreclr/src/jit/flowgraph.cpp

            if (tmpNum != BAD_VAR_NUM)
            {
+                if (!fgVarNeedsExplicitZeroInit(&lvaTable[tmpNum], bbInALoop))


Nit: please use lvaGetDesc(varNum) instead of &lvaTable[tmpNum] here and below.

We have 3 places in the code where we may need to insert explicit zero initialization. In some cases the initialization may become redundant with prolog initialization so we have logic to avoid explicit initialization in some cases. This change makes the logic less conservative. 1. If the variable is not being initialized in a loop, we can avoid explicit zero initialization if - the variable is a gc pointer, or - the variable is a struct with gc pointer fields and either all fields are gc pointer fields or the struct is big enough to guarantee block initialization, or - compInitMem is set and the variable has a long lifetime or has gc fields. Before this change we always inserted explicit zero initalization when compInitMem wasn't set and the variable was a gc pointer or a struct with gc fields. This relaxes that as described above. 2. When we were allocating structs via newobj, we were always inserting explicit zero initializations when importing an inlinee. This change applies the same optimization if the basic block can't be in a loop after inlining. 3. When we were initializing inlinee locals, we were only optimizing explicit zero initializations for structs; however, the same logic applies to non-structs. 4. If the initialization basic block is in a loop but is BBJ_RETURN, we may avoid zero initialization since the block will be executed at most once per method invocation.

erozenfeld · 2020-02-11T03:15:26Z

It turns out that the initial PR was too aggressive. My assumption that we always fully zero initialize structs with GC fields in the prolog was incorrect. The prolog zero initialization logic lives in CodeGen::genCheckUseBlockInit and CodeGen::genZeroInitFrame. If CodeGen::genCheckUseBlockInit decides that we shouldn't block init locals and compiler->info.compInitMem is false, CodeGen::genZeroInitFrame will only zero initialize gc fields in the prolog:

runtime/src/coreclr/src/jit/codegencommon.cpp

Lines 6403 to 6418 in 8079a27

    
           if ((varDsc->TypeGet() == TYP_STRUCT) && !compiler->info.compInitMem && 
        
               (varDsc->lvExactSize >= TARGET_POINTER_SIZE)) 
        
           { 
        
               // We only initialize the GC variables in the TYP_STRUCT 
        
               const unsigned slots  = (unsigned)compiler->lvaLclSize(varNum) / REGSIZE_BYTES; 
        
               ClassLayout*   layout = varDsc->GetLayout(); 
        
               for (unsigned i = 0; i < slots; i++) 
        
               { 
        
                   if (layout->IsGCPtr(i)) 
        
                   { 
        
                       GetEmitter()->emitIns_S_R(ins_Store(TYP_I_IMPL), EA_PTRSIZE, 
        
                                                 genGetZeroReg(initReg, pInitRegZeroed), varNum, i * REGSIZE_BYTES); 
        
                   } 
        
               } 
        
           }

So I modified my changes to skip zero initialization in structs with GC fields when info.compInitMem is false only when all fields are gc fields or when we can guarantee block initialization.

While studying CodeGen::genCheckUseBlockInit I discovered that a bunch of code there is currently unreachable and has been unreachable since at least 2014. So I made it clear by placing it under #if 0. The unreachable code was potentially affecting both correctness and block init heuristics, so this will need to be investigated but can be done outside of this PR, since it's pre-existing.

I also made the change to not block the optimization for BBJ_RETURN basic blocks inside loops.

This leaves us with about half of the original wins, which is still worthwhile.

erozenfeld · 2020-02-11T03:16:18Z

New framework diff results:

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -54112 (-0.11% of base)
    diff is an improvement.
Top file regressions (bytes):
           4 : System.ServiceProcess.ServiceController.dasm (0.01% of base)
           4 : System.Windows.Extensions.dasm (0.01% of base)
Top file improvements (bytes):
      -18143 : System.Collections.Immutable.dasm (-1.63% of base)
       -7219 : Microsoft.CodeAnalysis.dasm (-0.41% of base)
       -4600 : System.Net.Http.dasm (-0.70% of base)
       -3721 : Microsoft.CodeAnalysis.CSharp.dasm (-0.09% of base)
       -3386 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.06% of base)
       -2777 : System.Private.CoreLib.dasm (-0.06% of base)
       -2746 : System.Collections.dasm (-0.49% of base)
        -969 : System.Data.SqlClient.dasm (-0.11% of base)
        -834 : System.Linq.dasm (-0.08% of base)
        -644 : System.Memory.dasm (-0.26% of base)
        -589 : System.Threading.Channels.dasm (-0.33% of base)
        -493 : System.Text.Json.dasm (-0.11% of base)
        -482 : System.Net.Security.dasm (-0.30% of base)
        -452 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.01% of base)
        -419 : System.Private.Xml.dasm (-0.01% of base)
        -410 : System.Reflection.Metadata.dasm (-0.10% of base)
        -398 : System.Net.HttpListener.dasm (-0.17% of base)
        -398 : System.Net.Mail.dasm (-0.19% of base)
        -384 : System.Threading.Tasks.Dataflow.dasm (-0.05% of base)
        -379 : System.Linq.Parallel.dasm (-0.02% of base)
73 total files with Code Size differences (71 improved, 2 regressed), 90 unchanged.
Top method regressions (bytes):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
          22 ( 1.02% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:SubObj(Object,Object):Object
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
          10 ( 0.69% of base) : System.Security.Cryptography.Pkcs.dasm - PfxAsn:VerifyMac(ReadOnlySpan`1,ReadOnlySpan`1):bool:this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
           4 ( 1.80% of base) : System.Data.SqlClient.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.ServiceProcess.ServiceController.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Windows.Extensions.dasm - ValueStringBuilder:Insert(int,String):this
           3 ( 0.02% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(VisualBasicSyntaxNode,Location,int,byref,ImmutableArray`1,ImmutableArray`1,bool,bool,bool,bool,DiagnosticBag,Symbol,bool,VisualBasicSyntaxNode,Symbol):this
Top method improvements (bytes):
       -3243 (-15.73% of base) : System.Net.Http.dasm - HeaderDescriptor:.cctor()
       -1117 (-6.97% of base) : System.Collections.Immutable.dasm - Enumerator:MoveNext():bool:this (77 methods)
        -932 (-18.25% of base) : System.Collections.Immutable.dasm - Builder:GetEnumerator():Enumerator:this (35 methods)
        -858 (-2.05% of base) : Microsoft.CodeAnalysis.dasm - ArrayBuilder`1:ToDictionary(Func`2,IEqualityComparer`1):Dictionary`2:this (49 methods)
        -853 (-14.65% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (42 methods)
        -729 (-5.71% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:SymmetricExcept(IEnumerable`1,MutationInput):MutationResult (7 methods)
        -684 (-7.56% of base) : Microsoft.CodeAnalysis.dasm - PeWriter:WriteHeaders(Stream,NtHeader,CoffHeader,List`1,byref):this
        -643 (-17.23% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Generic.IEnumerable<T>.GetEnumerator():IEnumerator`1:this (28 methods)
        -639 (-50.00% of base) : System.Private.CoreLib.dasm - StreamReader:DetectEncoding():this (2 methods)
        -569 (-20.73% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator():Enumerator:this (21 methods)
        -569 (-21.07% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator(Builder):Enumerator:this (21 methods)
        -546 (-19.77% of base) : System.Collections.Immutable.dasm - HashBucket:GetEnumerator():Enumerator:this (14 methods)
        -486 (-17.63% of base) : System.Collections.Immutable.dasm - Node:System.Collections.IEnumerable.GetEnumerator():IEnumerator:this (21 methods)
        -448 (-12.36% of base) : Microsoft.CodeAnalysis.dasm - Enumerator:.ctor(ValueSet):this (7 methods)
        -420 (-4.47% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(ReadOnlySequence`1):this (7 methods)
        -402 (-19.80% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
        -402 (-19.92% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
        -363 (-8.48% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.Generic.ICollection<T>.CopyTo(ref,int):this (7 methods)
        -363 (-7.07% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:System.Collections.ICollection.CopyTo(Array,int):this (7 methods)
        -355 (-9.65% of base) : System.Collections.Immutable.dasm - Builder:System.Collections.Generic.ICollection<T>.CopyTo(ref,int):this (14 methods)
Top method regressions (percentages):
          86 (20.00% of base) : Microsoft.CodeAnalysis.CSharp.dasm - SourceMethodSymbol:LazyMethodChecks():this
           7 ( 2.42% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - TypeBinder:LookupNullableType(NullableTypeSyntax,Binder,DiagnosticBag,bool):SingleLookupResult
          13 ( 2.29% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SynthesizedEventAccessorSymbol:get_Parameters():ImmutableArray`1:this
           4 ( 1.80% of base) : System.Data.SqlClient.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Private.CoreLib.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.ServiceProcess.ServiceController.dasm - ValueStringBuilder:Insert(int,String):this
           4 ( 1.80% of base) : System.Windows.Extensions.dasm - ValueStringBuilder:Insert(int,String):this
          19 ( 1.76% of base) : System.Data.OleDb.dasm - OleDbConnectionString:LoadStringFromFileStorage(String):String
           7 ( 1.36% of base) : System.Private.CoreLib.dasm - StringInfo:ParseCombiningCharacters(String):ref
          22 ( 1.02% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:SubObj(Object,Object):Object
          10 ( 0.69% of base) : System.Security.Cryptography.Pkcs.dasm - PfxAsn:VerifyMac(ReadOnlySpan`1,ReadOnlySpan`1):bool:this
           3 ( 0.02% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:ReportOverloadResolutionFailureForASingleCandidate(VisualBasicSyntaxNode,Location,int,byref,ImmutableArray`1,ImmutableArray`1,bool,bool,bool,bool,DiagnosticBag,Symbol,bool,VisualBasicSyntaxNode,Symbol):this
Top method improvements (percentages):
        -639 (-50.00% of base) : System.Private.CoreLib.dasm - StreamReader:DetectEncoding():this (2 methods)
         -32 (-28.32% of base) : Microsoft.CodeAnalysis.dasm - ReversedEnumeratorImpl:.ctor(byref):this
         -84 (-26.84% of base) : Microsoft.CodeAnalysis.dasm - EnumeratorImpl:.ctor(byref):this (3 methods)
         -32 (-26.67% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTriviaList:GetEnumerator():Enumerator:this
        -224 (-25.31% of base) : System.Memory.dasm - ReadOnlySequence`1:GetEnumerator():Enumerator:this (7 methods)
         -26 (-24.30% of base) : Microsoft.CodeAnalysis.dasm - SyntaxTokenList:GetEnumerator():Enumerator:this
         -58 (-22.48% of base) : Microsoft.CodeAnalysis.dasm - Reversed:GetEnumerator():Enumerator:this (3 methods)
        -156 (-22.07% of base) : System.Collections.Immutable.dasm - ImmutableSortedDictionary`2:GetEnumerator():Enumerator:this (7 methods)
        -569 (-21.07% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator(Builder):Enumerator:this (21 methods)
         -23 (-20.91% of base) : System.ComponentModel.Composition.dasm - CompositionResult:ToResult(Vector`1):CompositionResult`1:this
        -569 (-20.73% of base) : System.Collections.Immutable.dasm - Node:GetEnumerator():Enumerator:this (21 methods)
        -150 (-20.52% of base) : System.Collections.Immutable.dasm - ImmutableSortedSet`1:GetEnumerator():Enumerator:this (7 methods)
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateType(ITypeReference,String):UsedNamespaceOrType
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateNamespace(INamespace,IAssemblyReference,String):UsedNamespaceOrType
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateExternAlias(String):UsedNamespaceOrType
         -26 (-20.47% of base) : Microsoft.CodeAnalysis.dasm - UsedNamespaceOrType:CreateXmlNamespace(String,String):UsedNamespaceOrType
        -212 (-20.23% of base) : System.Collections.Immutable.dasm - ImmutableList`1:GetEnumerator():Enumerator:this (7 methods)
        -402 (-19.92% of base) : System.Collections.Immutable.dasm - NodeEnumerable:GetEnumerator():Enumerator:this (7 methods)
         -74 (-19.84% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinaryOperatorAnalysisResult:Worse():BinaryOperatorAnalysisResult:this
        -402 (-19.80% of base) : System.Collections.Immutable.dasm - ImmutableHashSet`1:GetEnumerator():Enumerator:this (7 methods)
1230 total methods with Code Size differences (1218 improved, 12 regressed), 243118 unchanged.

erozenfeld · 2020-02-11T03:18:38Z

Benchmark diffs:

PMI CodeSize Diffs for benchstones and benchmarks game in F:\runtime\artifacts\tests\coreclr\Windows_NT.x64.Release for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -21 (-0.00% of base)
    diff is an improvement.
Top file improvements (bytes):
         -21 : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm (-0.51% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 81 unchanged.
Top method improvements (bytes):
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
Top method improvements (percentages):
         -21 (-3.43% of base) : BenchmarksGame\k-nucleotide\k-nucleotide-1\k-nucleotide-1.dasm - KNucleotide:WriteFrequencies(int,ref,bool):bool:this
1 total methods with Code Size differences (1 improved, 0 regressed), 1892 unchanged.

erozenfeld · 2020-02-11T03:19:00Z

@AndyAyersMS @sandreenko PTAL

CarolEidt · 2020-02-11T21:26:03Z

src/coreclr/src/jit/codegencommon.cpp

@@ -4656,6 +4656,9 @@ void CodeGen::genCheckUseBlockInit()
            continue;
        }

+// TODO-Review: The code below is currently unreachable. We are guaranteed to execute one of the
+// 'continue' statements above.
+#if 0


I don't want to hold up getting this merged - looks great BTW - but why didn't you just delete this?

I want to go back and see if any of this needs to be resurrected. It looks like part of the unreachable code was supposed to be responsible for correctness. I'll clean this up in a subsequent PR.

Perhaps open an issue for follow up on this, so we don't forget?

I added a comment to the issue that tracks improving the heuristic: #8890 (comment)

AndyAyersMS

Thanks, looks good.

Seems like we should probably block init more aggressively in the prolog (as well as init more efficiently).

erozenfeld · 2020-02-11T23:39:52Z

Yes, we have #8890 open [RyuJIT] Improve heuristic for zero-initialization of locals

erozenfeld requested a review from AndyAyersMS February 8, 2020 01:47

AndyAyersMS approved these changes Feb 8, 2020

View reviewed changes

AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 8, 2020

erozenfeld force-pushed the RemoveMoreDoubleZeroInit branch from 0a80c31 to b9a2f2b Compare February 8, 2020 02:56

sandreenko reviewed Feb 9, 2020

View reviewed changes

erozenfeld force-pushed the RemoveMoreDoubleZeroInit branch from b9a2f2b to 66c810e Compare February 11, 2020 02:59

CarolEidt reviewed Feb 11, 2020

View reviewed changes

AndyAyersMS approved these changes Feb 11, 2020

View reviewed changes

erozenfeld merged commit f9dfe45 into dotnet:master Feb 11, 2020

erozenfeld mentioned this pull request May 14, 2020

[RyuJIT] Avoid redundant zero-initialization of temp structs with GC fields #8886

Closed

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eliminate duplicate zero initializations more aggressively. #31960

Eliminate duplicate zero initializations more aggressively. #31960

erozenfeld commented Feb 8, 2020

erozenfeld commented Feb 8, 2020 •

edited

Loading

erozenfeld commented Feb 8, 2020 •

edited

Loading

erozenfeld commented Feb 8, 2020

erozenfeld commented Feb 8, 2020

AndyAyersMS left a comment

AndyAyersMS Feb 8, 2020

erozenfeld Feb 8, 2020

AndyAyersMS Feb 8, 2020

erozenfeld Feb 8, 2020

AndyAyersMS Feb 8, 2020

sandreenko Feb 9, 2020

erozenfeld Feb 11, 2020

erozenfeld commented Feb 11, 2020

erozenfeld commented Feb 11, 2020

erozenfeld commented Feb 11, 2020

erozenfeld commented Feb 11, 2020

CarolEidt Feb 11, 2020

erozenfeld Feb 11, 2020

AndyAyersMS Apr 16, 2020

erozenfeld Apr 22, 2020

AndyAyersMS left a comment

erozenfeld commented Feb 11, 2020

Eliminate duplicate zero initializations more aggressively. #31960

Eliminate duplicate zero initializations more aggressively. #31960

Conversation

erozenfeld commented Feb 8, 2020

erozenfeld commented Feb 8, 2020 • edited Loading

erozenfeld commented Feb 8, 2020 • edited Loading

erozenfeld commented Feb 8, 2020

erozenfeld commented Feb 8, 2020

AndyAyersMS left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erozenfeld commented Feb 11, 2020

erozenfeld commented Feb 11, 2020

erozenfeld commented Feb 11, 2020

erozenfeld commented Feb 11, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AndyAyersMS left a comment

Choose a reason for hiding this comment

erozenfeld commented Feb 11, 2020

erozenfeld commented Feb 8, 2020 •

edited

Loading

erozenfeld commented Feb 8, 2020 •

edited

Loading