ARM64: Eliminate redundant opposite mov #38179

Merged: 13 commits merged into dotnet:master on Jun 30, 2020

kunalspathak
Member

@kunalspathak kunalspathak commented Jun 20, 2020

Perform a peephole optimization that skips a mov instruction if the previous instruction was the opposite mov.

Fixes: #35252

The size improvement closely matches what was originally estimated in #35252.

Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -140040 (-0.281% of base)
    diff is an improvement.
Top file improvements (bytes):
      -19140 : System.Linq.Expressions.dasm (-0.418% of base)
      -13420 : System.Private.Xml.dasm (-0.292% of base)
      -12204 : System.Private.CoreLib.dasm (-0.214% of base)
       -7676 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.239% of base)
       -7436 : System.Data.Common.dasm (-0.451% of base)
       -6944 : Microsoft.CodeAnalysis.CSharp.dasm (-0.233% of base)
       -3620 : System.Linq.Parallel.dasm (-0.422% of base)
       -3556 : System.Management.dasm (-0.712% of base)
       -3552 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.069% of base)
       -3268 : System.DirectoryServices.dasm (-0.560% of base)
       -2804 : System.Private.DataContractSerialization.dasm (-0.253% of base)
       -2704 : Microsoft.CodeAnalysis.dasm (-0.257% of base)
       -2432 : Newtonsoft.Json.dasm (-0.268% of base)
       -2340 : System.DirectoryServices.AccountManagement.dasm (-0.624% of base)
       -2032 : System.Data.OleDb.dasm (-0.502% of base)
       -1672 : Microsoft.CSharp.dasm (-0.340% of base)
       -1672 : xunit.execution.dotnet.dasm (-0.601% of base)
       -1584 : System.Data.Odbc.dasm (-0.573% of base)
       -1568 : System.Text.Json.dasm (-0.238% of base)
       -1548 : System.Configuration.ConfigurationManager.dasm (-0.330% of base)
179 total files with Code Size differences (179 improved, 0 regressed), 84 unchanged.
Top method improvements (bytes):
      -17288 (-5.860% of base) : System.Linq.Expressions.dasm - System.Linq.Expressions.Interpreter.CallInstruction:FastCreate(System.Reflection.MethodInfo,System.Reflection.ParameterInfo[]):System.Linq.Expressions.Interpreter.CallInstruction (241 methods)
        -428 (-1.599% of base) : System.Private.Xml.dasm - System.Xml.Schema.XsdBuilder:.cctor()
        -424 (-2.719% of base) : System.DirectoryServices.AccountManagement.dasm - System.DirectoryServices.AccountManagement.ADStoreCtx:.cctor()
        -348 (-2.282% of base) : System.DirectoryServices.AccountManagement.dasm - System.DirectoryServices.AccountManagement.SAMStoreCtx:.cctor()
        -288 (-1.953% of base) : System.DirectoryServices.AccountManagement.dasm - System.DirectoryServices.AccountManagement.ADAMStoreCtx:.cctor()
        -228 (-1.256% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:GenerateTypeConverterClass():System.CodeDom.CodeTypeDeclaration:this
        -200 (-4.230% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.Cci.MetadataWriter:.ctor(Microsoft.Cci.MetadataHeapsBuilder,Microsoft.Cci.MetadataHeapsBuilder,Microsoft.CodeAnalysis.Emit.EmitContext,Microsoft.CodeAnalysis.CommonMessageProvider,bool,bool,System.Threading.CancellationToken):this
        -184 (-2.028% of base) : System.Private.Xml.dasm - System.Xml.Schema.XdrBuilder:.cctor()
        -180 (-4.712% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.MetadataBuilder:.ctor(int,int,int,int):this
        -176 (-1.386% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:GenerateProperties():this
        -164 (-6.337% of base) : System.Data.Common.dasm - System.Data.Common.DataStorage:CreateStorage(System.Data.DataColumn,System.Type,int):System.Data.Common.DataStorage
        -144 (-1.139% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:AddToDMTFDateTimeFunction():this
        -136 (-1.216% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:AddToDMTFTimeIntervalFunction():this
        -132 (-0.785% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:AddToDateTimeFunction():this
        -124 (-1.393% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.TraceLoadedDotNetRuntime:SetupCallbacks(Microsoft.Diagnostics.Tracing.TraceEventDispatcher)
        -124 (-2.093% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:GenerateIfClassvalidFunction():this
        -120 (-3.901% of base) : System.Private.Xml.dasm - MS.Internal.Xml.XPath.LogicalExpr:.cctor()
        -112 (-1.611% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:GenerateCollectionClass():this
        -108 (-3.151% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.ExpressionEvaluator:EvaluateUnaryExpression(Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.UnaryExpressionSyntax):Microsoft.CodeAnalysis.VisualBasic.Syntax.InternalSyntax.CConst:this
        -104 (-1.261% of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.Semantics.ExpressionBinder:.cctor()
Top method improvements (percentages):
          -4 (-12.500% of base) : System.Reflection.MetadataLoadContext.dasm - System.Reflection.TypeLoading.Helpers:ConvertAssemblyFlagsToAssemblyNameFlags(int):int
          -4 (-11.111% of base) : Microsoft.VisualBasic.Core.dasm - Microsoft.VisualBasic.FileIO.FileSystem:GetOperationFlags(int):ushort
          -4 (-11.111% of base) : System.Private.CoreLib.dasm - System.Threading.Tasks.ConcurrentExclusiveSchedulerPair:GetCreationOptionsForTask(bool):int
          -4 (-11.111% of base) : System.Threading.Tasks.Dataflow.dasm - System.Threading.Tasks.Dataflow.Internal.Common:GetCreationOptionsForTask(bool):int
          -4 (-8.333% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Parsers.KernelTraceEventParser:DefaultOptionsForSource(Microsoft.Diagnostics.Tracing.TraceEventSource):int
          -4 (-8.333% of base) : System.ComponentModel.Annotations.dasm - System.ComponentModel.DataAnnotations.ValidationResult:ToString():System.String:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Globalization.DateTimeFormatInfo:InternalGetAbbreviatedDayOfWeekNames():System.String[]:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Globalization.DateTimeFormatInfo:InternalGetSuperShortDayNames():System.String[]:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Globalization.DateTimeFormatInfo:InternalGetDayOfWeekNames():System.String[]:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Globalization.DateTimeFormatInfo:InternalGetAbbreviatedMonthNames():System.String[]:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Globalization.DateTimeFormatInfo:InternalGetMonthNames():System.String[]:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncTaskMethodBuilder:get_Task():System.Threading.Tasks.Task:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Reflection.RtFieldInfo:get_FieldType():System.Type:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Boolean][System.Boolean]:get_Task():System.Threading.Tasks.Task`1[Boolean]:this
          -4 (-8.333% of base) : System.Private.CoreLib.dasm - System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[Int32][System.Int32]:get_Task():System.Threading.Tasks.Task`1[Int32]:this
          -8 (-7.692% of base) : System.ComponentModel.TypeConverter.dasm - System.ComponentModel.PropertyDescriptor:GetInvocationTarget(System.Type,System.Object):System.Object:this
          -4 (-7.692% of base) : System.Configuration.ConfigurationManager.dasm - System.Configuration.OverrideModeSetting:get_AllowOverride():bool:this
          -4 (-7.692% of base) : System.Management.dasm - System.Management.ManagementClassGenerator:IsPropertyValueType(int):bool
          -4 (-7.692% of base) : System.Private.CoreLib.dasm - CachedData:get_Local():System.TimeZoneInfo:this
          -4 (-7.143% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.CodeAnalysis.SyntaxNode:get_SlotCount():int:this
20688 total methods with Code Size differences (20688 improved, 0 regressed), 200441 unchanged.

Update: JIT code size diff collected using PMI.

PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -113636 (-0.20% of base)
    diff is an improvement.
Top file improvements (bytes):
      -13728 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.22% of base)
      -11424 : System.Private.CoreLib.dasm (-0.19% of base)
       -9136 : System.Private.Xml.dasm (-0.22% of base)
       -8280 : Microsoft.CodeAnalysis.CSharp.dasm (-0.17% of base)
       -6232 : System.Linq.Parallel.dasm (-0.35% of base)
       -6056 : Microsoft.CodeAnalysis.dasm (-0.29% of base)
       -4464 : System.Collections.Immutable.dasm (-0.36% of base)
       -4380 : System.Data.Common.dasm (-0.25% of base)
       -2636 : System.DirectoryServices.dasm (-0.54% of base)
       -2352 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.07% of base)
       -2244 : Newtonsoft.Json.dasm (-0.23% of base)
       -1932 : System.Linq.Expressions.dasm (-0.21% of base)
       -1892 : System.Private.DataContractSerialization.dasm (-0.21% of base)
       -1604 : System.Linq.dasm (-0.14% of base)
       -1508 : System.Threading.Tasks.Dataflow.dasm (-0.14% of base)
       -1428 : System.DirectoryServices.AccountManagement.dasm (-0.34% of base)
       -1328 : xunit.execution.dotnet.dasm (-0.49% of base)
       -1200 : System.ComponentModel.Composition.dasm (-0.30% of base)
       -1132 : Microsoft.CSharp.dasm (-0.26% of base)
       -1132 : System.Management.dasm (-0.28% of base)
173 total files with Code Size differences (173 improved, 0 regressed), 90 unchanged.
Top method regressions (bytes):
           8 ( 7.14% of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[Byte][System.Byte]:.cctor()
           8 ( 6.67% of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[Int16][System.Int16]:.cctor()
Top method improvements (bytes):
       -1004 (-6.21% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__0-0(Roslyn.Utilities.ObjectReader):System.Object:this (251 methods)
        -564 (-3.80% of base) : System.Linq.Expressions.dasm - System.Linq.Expressions.Interpreter.CallInstruction:FastCreate(System.Reflection.MethodInfo,System.Reflection.ParameterInfo[]):System.Linq.Expressions.Interpreter.CallInstruction (15 methods)
        -296 (-6.25% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__9-0(Roslyn.Utilities.ObjectReader):System.Object:this (74 methods)
        -272 (-6.25% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__10-0(Roslyn.Utilities.ObjectReader):System.Object:this (68 methods)
        -176 (-6.25% of base) : Microsoft.CodeAnalysis.CSharp.dasm - <>c:<GetReader>b__18_0(Roslyn.Utilities.ObjectReader):System.Object:this (44 methods)
        -168 (-6.25% of base) : Microsoft.CodeAnalysis.CSharp.dasm - <>c:<GetReader>b__21_0(Roslyn.Utilities.ObjectReader):System.Object:this (42 methods)
        -164 (-8.42% of base) : System.Data.Common.dasm - System.Data.Common.DataStorage:CreateStorage(System.Data.DataColumn,System.Type,int):System.Data.Common.DataStorage
        -156 (-6.25% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__8-0(Roslyn.Utilities.ObjectReader):System.Object:this (39 methods)
        -140 (-6.25% of base) : Microsoft.CodeAnalysis.CSharp.dasm - <>c:<GetReader>b__24_0(Roslyn.Utilities.ObjectReader):System.Object:this (35 methods)
        -124 (-2.10% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Analysis.TraceLoadedDotNetRuntime:SetupCallbacks(Microsoft.Diagnostics.Tracing.TraceEventDispatcher)
        -100 (-7.69% of base) : System.Reflection.Metadata.dasm - Enumerator:System.Collections.IEnumerator.Reset():this (25 methods)
         -96 (-5.80% of base) : Microsoft.CodeAnalysis.dasm - Microsoft.Cci.FullMetadataWriter:.ctor(Microsoft.CodeAnalysis.Emit.EmitContext,Microsoft.Cci.MetadataHeapsBuilder,Microsoft.Cci.MetadataHeapsBuilder,Microsoft.CodeAnalysis.CommonMessageProvider,bool,bool,System.Threading.CancellationToken):this
         -96 (-6.25% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__11-0(Roslyn.Utilities.ObjectReader):System.Object:this (24 methods)
         -96 (-2.72% of base) : System.Text.Json.dasm - System.Text.Json.JsonSerializerOptions:GetDefaultSimpleConverters():System.Collections.Generic.Dictionary`2[[System.Type, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.Text.Json.Serialization.JsonConverter, System.Text.Json, Version=5.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51]]
         -88 (-1.65% of base) : System.Private.CoreLib.dasm - System.Diagnostics.Tracing.RuntimeEventSource:OnEventCommand(System.Diagnostics.Tracing.EventCommandEventArgs):this
         -84 (-1.37% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Tracing.Etlx.TraceLog:SetupCallbacks(Microsoft.Diagnostics.Tracing.TraceEventDispatcher):this
         -80 (-6.25% of base) : Microsoft.CodeAnalysis.CSharp.dasm - <>c:<GetReader>b__27_0(Roslyn.Utilities.ObjectReader):System.Object:this (20 methods)
         -76 (-6.25% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - _Closure$__:_Lambda$__12-0(Roslyn.Utilities.ObjectReader):System.Object:this (19 methods)
         -72 (-1.74% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.BinderFactory:CreateBinderForNodeAndUsage(Microsoft.CodeAnalysis.VisualBasic.VisualBasicSyntaxNode,ubyte,Microsoft.CodeAnalysis.VisualBasic.Binder):Microsoft.CodeAnalysis.VisualBasic.Binder:this
         -72 (-2.03% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Microsoft.CodeAnalysis.VisualBasic.ExpressionLambdaRewriter:CreateBuiltInConversion(Microsoft.CodeAnalysis.VisualBasic.Symbols.TypeSymbol,Microsoft.CodeAnalysis.VisualBasic.Symbols.TypeSymbol,Microsoft.CodeAnalysis.VisualBasic.BoundExpression,bool,bool,int,bool):Microsoft.CodeAnalysis.VisualBasic.BoundExpression:this
Top method regressions (percentages):
           8 ( 7.14% of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[Byte][System.Byte]:.cctor()
           8 ( 6.67% of base) : System.Private.CoreLib.dasm - System.Numerics.Vector`1[Int16][System.Int16]:.cctor()
Top method improvements (percentages):
          -4 (-16.67% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceConstructorSymbol:GetReturnTypeAttributeDeclarations():Roslyn.Utilities.OneOrMany`1[[Microsoft.CodeAnalysis.SyntaxList`1[[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], Microsoft.CodeAnalysis, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this
          -4 (-16.67% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceDestructorSymbol:GetReturnTypeAttributeDeclarations():Roslyn.Utilities.OneOrMany`1[[Microsoft.CodeAnalysis.SyntaxList`1[[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], Microsoft.CodeAnalysis, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this
          -4 (-16.67% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMethodSymbol:GetAttributeDeclarations():Roslyn.Utilities.OneOrMany`1[[Microsoft.CodeAnalysis.SyntaxList`1[[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], Microsoft.CodeAnalysis, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this
          -4 (-16.67% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Constructor:GetReturnTypeAttributeDeclarations():Roslyn.Utilities.OneOrMany`1[[Microsoft.CodeAnalysis.SyntaxList`1[[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], Microsoft.CodeAnalysis, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this
          -4 (-16.67% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BeginInvokeMethod:GetReturnTypeAttributeDeclarations():Roslyn.Utilities.OneOrMany`1[[Microsoft.CodeAnalysis.SyntaxList`1[[Microsoft.CodeAnalysis.CSharp.Syntax.AttributeListSyntax, Microsoft.CodeAnalysis.CSharp, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]], Microsoft.CodeAnalysis, Version=1.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]:this
          -4 (-10.00% of base) : Microsoft.Extensions.DependencyInjection.dasm - Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteValidator:VisitRootCache(Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceCallSite,CallSiteValidatorState):System.Type:this
          -4 (-9.09% of base) : System.Private.DataContractSerialization.dasm - System.Runtime.Serialization.Json.DataContractJsonSerializer:GetDataContract(System.Runtime.Serialization.DataContract,System.Type,System.Type):System.Runtime.Serialization.DataContract
          -4 (-9.09% of base) : System.Private.DataContractSerialization.dasm - System.Runtime.Serialization.Json.DataContractJsonSerializerImpl:GetDataContract(System.Runtime.Serialization.DataContract,System.Type,System.Type):System.Runtime.Serialization.DataContract
          -4 (-9.09% of base) : System.Security.Cryptography.Csp.dasm - System.Security.Cryptography.DSACryptoServiceProvider:get_PersistKeyInCsp():bool:this
          -4 (-9.09% of base) : System.Security.Cryptography.Csp.dasm - System.Security.Cryptography.RSACryptoServiceProvider:get_PersistKeyInCsp():bool:this
          -4 (-9.09% of base) : System.Security.Cryptography.Pkcs.dasm - Internal.Cryptography.PkcsHelpers:ToSerialBytes(System.String):System.Byte[]
         -16 (-9.09% of base) : System.Security.Cryptography.Pkcs.dasm - System.Security.Cryptography.Pkcs.CmsSigner:.ctor(System.Security.Cryptography.CspParameters):this
        -164 (-8.42% of base) : System.Data.Common.dasm - System.Data.Common.DataStorage:CreateStorage(System.Data.DataColumn,System.Type,int):System.Data.Common.DataStorage
          -4 (-8.33% of base) : System.Linq.dasm - ReverseIterator`1[Byte][System.Byte]:ToArray():System.Byte[]:this
          -4 (-8.33% of base) : System.Linq.dasm - ReverseIterator`1[Int16][System.Int16]:ToArray():System.Int16[]:this
          -4 (-8.33% of base) : System.Linq.dasm - ReverseIterator`1[Int32][System.Int32]:ToArray():System.Int32[]:this
          -4 (-8.33% of base) : System.Linq.dasm - ReverseIterator`1[Double][System.Double]:ToArray():System.Double[]:this
          -4 (-8.33% of base) : System.Linq.dasm - ReverseIterator`1[Vector`1][System.Numerics.Vector`1[System.Single]]:ToArray():System.Numerics.Vector`1[System.Single][]:this
          -4 (-8.33% of base) : System.Linq.dasm - ReverseIterator`1[Int64][System.Int64]:ToArray():System.Int64[]:this
          -4 (-8.33% of base) : System.Private.CoreLib.dasm - System.StubHelpers.StubHelpers:GetHRExceptionObject(int):System.Exception
21301 total methods with Code Size differences (21299 improved, 2 regressed), 287163 unchanged.

Added IsOppositeOfPrevMov() that will skip generating redundant mov.
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 20, 2020
@kunalspathak
Member Author

@dotnet/jit-contrib

@kunalspathak
Member Author

//cc : @TamarChristinaArm

src/coreclr/src/jit/emitarm64.cpp
@@ -4025,6 +4025,11 @@ void emitter::emitIns_R_R(
}
}

if (IsOppositeOfPrevMov(ins, reg1, reg2))
Member

Seems like you might need to worry about the EA_4BYTE case here too.

Member Author

Makes sense. I think I will try to move the reg1 == reg2 check above into the new IsOppositeOfPrevMov so that all the checks are in one place.

Member

Yeah, generalize to "is unnecessary mov" and catch all 3 cases:

mov   RX, RX

mov RX, RY
mov RY, RX

mov RX, RY
mov RX, RY
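
As a rough illustration of the three patterns listed above, the following self-contained sketch classifies a mov against the instruction emitted just before it. The `Mov` struct, the `Redundancy` enum, and `classify` are hypothetical names invented for this example, not the JIT's API, and the sketch deliberately ignores operand size and instruction-group boundaries.

```cpp
#include <cassert>

// Hypothetical, simplified model of a register-to-register mov.
// Registers are plain non-negative integers here, not the JIT's regNumber.
struct Mov
{
    int dst;
    int src;
};

enum class Redundancy
{
    None,           // the mov has to be emitted
    SelfMove,       // mov RX, RX
    OppositeOfPrev, // mov RX, RY followed by mov RY, RX
    SameAsPrev      // mov RX, RY followed by mov RX, RY
};

// `prev` is nullptr when the previous instruction was not a register mov.
inline Redundancy classify(const Mov* prev, const Mov& cur)
{
    assert(cur.dst >= 0 && cur.src >= 0);

    if (cur.dst == cur.src)
    {
        return Redundancy::SelfMove;
    }
    if (prev == nullptr)
    {
        return Redundancy::None;
    }
    if (prev->dst == cur.src && prev->src == cur.dst)
    {
        return Redundancy::OppositeOfPrev;
    }
    if (prev->dst == cur.dst && prev->src == cur.src)
    {
        return Redundancy::SameAsPrev;
    }
    return Redundancy::None;
}
```

Anything other than `Redundancy::None` marks a candidate for elimination in this model; a real emitter would still have to apply further safety checks before dropping the instruction.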

Member Author

Given that we want to check OptimizationEnabled() before doing this optimization, and mov RX, RX was previously eliminated without that check, would it be fine to perform the mov RX, RX elimination only when optimizations are enabled too?

@kunalspathak
Member Author

I noticed that for one of the crossgen'd methods in System.Private.CoreLib.dll

System.Collections.Generic.EnumEqualityComparer`1[__Canon][System.__Canon]:IndexOf(System.__Canon[],System.__Canon,int,int):int:this

we end up removing the redundant mov instruction from a different basic block than that of the previous mov instruction.

I will explore one of the following possibilities:

  • Either see if we can retain the basic block information the way we retain emitLastIns. It would then be easier to check whether the instructions belong to the same basic block. It would be useful for other future peephole optimizations as well.
  • Or see if we can detect the redundant mov in an earlier phase and eliminate the IR nodes if the src/dst registers match our elimination rules.

@AndyAyersMS
Member

For the IG boundary check, you can follow xarch's emitter::AreUpper32BitsZero:

bool emitter::AreUpper32BitsZero(regNumber reg)
{
    // Don't look back across IG boundaries (possible control flow)
    if (emitCurIGinsCnt == 0)
    {
        return false;
    }
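
The guard quoted above can be illustrated in isolation. The sketch below is a toy model, not the emitter: `EmitterModel` and its members are hypothetical names, and the only behaviour it mirrors from the snippet is that a zero instruction count in the current instruction group (IG) means "do not look back".

```cpp
#include <cassert>

// Hypothetical stand-in for the emitter state relevant to a peephole.
struct EmitterModel
{
    unsigned curIGInsCnt = 0;  // instructions emitted so far in the current IG
    int      lastOpcode  = -1; // opcode of the most recently emitted instruction

    // Starting a new IG may correspond to a control-flow join,
    // so nothing emitted earlier can be trusted as "the previous instruction".
    void startNewIG()
    {
        curIGInsCnt = 0;
    }

    void emit(int opcode)
    {
        lastOpcode = opcode;
        ++curIGInsCnt;
    }

    // A peephole may inspect the previous instruction only inside the same IG.
    bool canLookBack() const
    {
        return curIGInsCnt != 0;
    }

    int previousOpcode() const
    {
        assert(canLookBack());
        return lastOpcode;
    }
};
```

In this model the first instruction of every group is always emitted as-is, which trades a few missed eliminations for not having to reason about control flow.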

@kunalspathak
Member Author

> For the IG boundary check, you can follow xarch's emitter::AreUpper32BitsZero:
>
> bool emitter::AreUpper32BitsZero(regNumber reg)
> {
>     // Don't look back across IG boundaries (possible control flow)
>     if (emitCurIGinsCnt == 0)
>     {
>         return false;
>     }

Ah sweet, I didn't notice that when I referred to AreUpper32BitsZero.

Member

@AndyAyersMS AndyAyersMS left a comment

Overall structure looks good.

I assume those 2 regressions are cases where we're generating minopts code and now don't optimize?

For the new cases I think the size check needs to be more comprehensive. You need to verify that the data size in the current and prior instructions match.

@kunalspathak
Member Author

kunalspathak commented Jun 23, 2020

> I assume those 2 regressions are cases where we're generating minopts code and now don't optimize?

That's right. I am surprised that only 2 methods were affected by this.


> For the new cases I think the size check needs to be more comprehensive. You need to verify that the data size in the current and prior instructions match.

Sorry, I didn't follow how to verify the data size. Could you please elaborate? I see what you mean. Earlier I thought you were asking me something about diffing the data size.

- Change IG boundary check to exclude extended IGs.
- Check the operand size before removing redundant movs.
@kunalspathak
Member Author

> For the new cases I think the size check needs to be more comprehensive. You need to verify that the data size in the current and prior instructions match.

Thanks @AndyAyersMS for pointing it out. It turns out String.IsAscii was generating the following code:

G_M65212_IG01:
        A9BE7BFD          stp     fp, lr, [sp,#-32]!
        F9000FF3          str     x19, [sp,#24]
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 2.50
G_M65212_IG02:
        91003001          add     x1, x0, #12
        F9000BA1          str     x1, [fp,#16]	// [V02 loc1]
        F9400BAB          ldr     x11, [fp,#16]	// [V02 loc1]
        B9400801          ldr     w1, [x0,#8]
        2A0103F3          mov     w19, w1
        AA1303E1          mov     x1, x19
        AA0B03E0          mov     x0, x11
        9000000B          adrp    x11, [RELOC #0x2357246b588]
        9100016B          add     x11, x11, #0
        F9400162          ldr     x2, [x11]
        D63F0040          blr     x2
        EB00027F          cmp     x19, x0
        9A9F17E0          cset    x0, eq
						;; bbWeight=1    PerfScore 14.00
G_M65212_IG03:
        F9400FF3          ldr     x19, [sp,#24]
        A8C27BFD          ldp     fp, lr, [sp],#32
        D65F03C0          ret     lr

And I was optimizing this portion of code:

        2A0103F3          mov     w19, w1
        AA1303E1          mov     x1, x19 ; <-- Was wrongly removing this.

I have now fixed this case and also relaxed the IG boundary check.
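
To see why the pair above is not a plain "opposite mov", it helps to model the two instructions. On ARM64, writing a W register zeroes the upper 32 bits of the corresponding X register, so `mov w19, w1` followed by `mov x1, x19` can change x1. The sketch below uses a hypothetical `Regs` register file and helper names made up for this example. Whether the upper half of x1 happens to be zero at that point in the listing is something a peephole that only inspects the previous instruction cannot know.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register file: 32 general-purpose 64-bit registers.
struct Regs
{
    std::uint64_t x[32] = {};
};

// Models "mov Wd, Wn": copies the low 32 bits and zeroes the upper 32 bits of Xd.
inline void mov32(Regs& r, int d, int n)
{
    assert(d >= 0 && d < 32 && n >= 0 && n < 32);
    r.x[d] = static_cast<std::uint32_t>(r.x[n]);
}

// Models "mov Xd, Xn": copies all 64 bits.
inline void mov64(Regs& r, int d, int n)
{
    assert(d >= 0 && d < 32 && n >= 0 && n < 32);
    r.x[d] = r.x[n];
}

// Returns true if dropping "mov x1, x19" after "mov w19, w1" would leave x1
// with the same value, for the given starting value of x1.
inline bool secondMovIsRedundant(std::uint64_t initialX1)
{
    Regs r;
    r.x[1] = initialX1;

    mov32(r, 19, 1);                                 // mov w19, w1
    const std::uint64_t x1WithoutSecondMov = r.x[1]; // state if the peephole fires
    mov64(r, 1, 19);                                 // mov x1, x19

    return r.x[1] == x1WithoutSecondMov;
}
```

With a starting value whose upper half is non-zero, the second mov observably changes x1, which is why the operand sizes of both instructions have to be compared before eliminating anything.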

Member

@AndyAyersMS AndyAyersMS left a comment

I think you need a size check in case (2) as well: mov X, Y; mov X.w, Y.w is not the same as mov X, Y.

The IG extends check is clever, but we now need to make sure we're always processing IGs in order. I think that's the case today, but I'm not sure if it's guaranteed. If we're ok taking that dependence, then we should make a similar change to AreUpper32BitsZero for xarch.
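
The size concern raised for case (2) can be made concrete with a small predicate. This is a sketch under stated assumptions, not the emitter's code: `OpSize`, `MovIns`, and `isRedundantAfterPrev` are hypothetical names, and the rule that an opposite mov is only dropped at full 8-byte width is this example's own conservative choice, based on 4-byte movs zeroing the upper half of the destination.

```cpp
#include <cassert>

enum class OpSize
{
    Bytes4 = 4,
    Bytes8 = 8
};

// Hypothetical record of a general-register mov and its operand size.
struct MovIns
{
    int    dst;
    int    src;
    OpSize size;
};

// Decides whether `cur` can be dropped given that `prev` was emitted just before it.
// Self-moves (dst == src) are a separate case and are not handled here.
inline bool isRedundantAfterPrev(const MovIns& prev, const MovIns& cur)
{
    assert(cur.dst != cur.src);

    // Different widths write different bits, so never pair them up.
    if (prev.size != cur.size)
    {
        return false;
    }

    // Repeating the identical mov yields the same register state at any width.
    if (prev.dst == cur.dst && prev.src == cur.src)
    {
        return true;
    }

    // An opposite 4-byte mov would still zero the upper half of its destination,
    // so in this sketch only the full-width form is treated as redundant.
    const bool opposite = (prev.dst == cur.src) && (prev.src == cur.dst);
    return opposite && (cur.size == OpSize::Bytes8);
}
```

For example, `mov x0, x1` followed by `mov w0, w1` is rejected by the width comparison, matching the observation that the second instruction is not the same as the first.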

@kunalspathak
Member Author

> I think you need a size check in case (2) as well: mov X, Y; mov X.w, Y.w is not the same as mov X, Y.

Actually yes. I added an assert and didn't see it hit during crossgen at least, but that might not be the case while JITing. I will add the check.

> The IG extends check is clever, but we now need to make sure we're always processing IGs in order. I think that's the case today, but I'm not sure if it's guaranteed. If we're ok taking that dependence, then we should make a similar change to AreUpper32BitsZero for xarch.

Yes, Bruce and Egor pointed this trick out to me, and I will definitely add it inside AreUpper32BitsZero as well.

Member

@AndyAyersMS AndyAyersMS left a comment

One (hopefully last) comment -- in case 3 do we need to match the size check with the register class like we do for case 1?

@BruceForstall
Member

> we should make a similar change to AreUpper32BitsZero for xarch.

In a separate PR.

BruceForstall
BruceForstall previously approved these changes Jun 26, 2020
@kunalspathak
Member Author

> One (hopefully last) comment -- in case 3 do we need to match the size check with the register class like we do for case 1?

As far as I understood, the check should not be necessary, because I assumed that prevDst == src and prevSrc == dst (in addition to matching OpSize()) would be good enough. However, on second thought, we want to avoid optimizing cases where dst is scalar and src is vector, or vice versa. For 16BYTE, the src/dst registers will always be vector. For 8BYTE, however, they can be scalar or vector. So this check should catch the cases where we must not eliminate the mov:

if (size == EA_8BYTE && isVectorRegister(dst) != isVectorRegister(src)) {
   return false;
}
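
The register-class rule described above can be written as a standalone predicate. This is a minimal sketch, assuming a made-up `Reg` type with an explicit class tag; `oppositeMovMayBeElided` is a hypothetical helper, and its refusal to handle 4-byte movs is an extra conservative assumption of this example rather than something stated in the comment.

```cpp
#include <cassert>

enum class RegClass
{
    General,
    Vector
};

// Hypothetical register descriptor: a class tag plus an index within that class.
struct Reg
{
    RegClass cls;
    int      index;
};

inline bool isVector(const Reg& r)
{
    return r.cls == RegClass::Vector;
}

// Decides whether an opposite-mov pair of the given operand size (in bytes)
// may be collapsed, looking only at the register classes involved.
inline bool oppositeMovMayBeElided(int sizeBytes, const Reg& dst, const Reg& src)
{
    assert(sizeBytes == 4 || sizeBytes == 8 || sizeBytes == 16);

    if (sizeBytes == 16)
    {
        // 16-byte movs are expected to be vector-to-vector.
        return isVector(dst) && isVector(src);
    }
    if (sizeBytes == 8)
    {
        // 8-byte movs may be scalar or vector, but both sides must agree.
        return isVector(dst) == isVector(src);
    }
    // Assumption of this sketch: narrower movs are left alone.
    return false;
}
```

A scalar-to-vector or vector-to-scalar pair is rejected at 8 bytes, which is the situation the proposed check is meant to exclude.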

@BruceForstall BruceForstall self-requested a review June 26, 2020 23:26
@BruceForstall BruceForstall dismissed their stale review June 26, 2020 23:27

Need update

@kunalspathak
Member Author

I found an issue where we were eliminating a mov that we were not supposed to.


Here, emitLastIns was a move immediate with only 1 valid register. However, today we zero out reg2, and hence emitLastIns->reg2 == REG_R0. Coincidentally, the current instruction's destination was x0, which matched the condition, and the mov was eliminated. I don't think we can check the number of registers in emitLastIns, so I should also check the format of emitLastIns. Only the following formats are valid:

  • vector to vector: IF_DV_3C
  • scalar to scalar: IF_DR_2G or IF_DR_2E
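
The false match described above can be reproduced with a toy model. In the sketch below, the enumerators `DV_3C`, `DR_2G`, and `DR_2E` merely echo the format names from the comment, while `MovImm`, `LastIns`, and `isOppositeOfLast` are hypothetical names introduced for illustration; none of this is the emitter's actual data layout.

```cpp
#include <cassert>

// Toy instruction formats. MovImm is a made-up placeholder for
// "mov reg, #imm", where only the first register field is meaningful.
enum class InsFormat
{
    DV_3C, // vector to vector
    DR_2G, // scalar to scalar
    DR_2E, // scalar to scalar
    MovImm
};

struct LastIns
{
    InsFormat fmt;
    int       reg1; // destination
    int       reg2; // source; zeroed (register 0) when the format has no second register
};

inline bool hasTwoRegisterOperands(InsFormat fmt)
{
    return fmt == InsFormat::DV_3C || fmt == InsFormat::DR_2G || fmt == InsFormat::DR_2E;
}

// Checks whether "mov dst, src" is the opposite of the previous instruction.
inline bool isOppositeOfLast(const LastIns& last, int dst, int src)
{
    assert(dst >= 0 && src >= 0);

    // Without this guard, a zeroed reg2 would be mistaken for register 0.
    if (!hasTwoRegisterOperands(last.fmt))
    {
        return false;
    }
    return last.reg1 == src && last.reg2 == dst;
}
```

With a previous `mov x1, #imm` recorded as `{MovImm, 1, 0}`, a following `mov x0, x1` compares equal on both register fields, and only the format guard prevents it from being eliminated.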

@BruceForstall
Member

Probably wouldn't hurt to kick off outerloop and jitstress runs on this.

Member

@AndyAyersMS AndyAyersMS left a comment

LGTM. Left a few minor suggestions.

@kunalspathak kunalspathak merged commit 40b2e88 into dotnet:master Jun 30, 2020
@kunalspathak kunalspathak deleted the peep-movs branch June 30, 2020 15:07
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Successfully merging this pull request may close these issues.

ARM64: Redundant movs can be eliminiated
5 participants