-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the OBJ(ADDR)
wrapping created for calls by the importer
#51569
Comments
OBJ(ADDR(LCL_VAR))
trees when importing some static intrinsics.OBJ(ADDR(LCL_VAR))
trees when importing some static intrinsics
@sandreenko perhaps this is something you can investigate? |
I have investigated this in the past, there are 2 issues:
if we don't have OBJ(ADDR()) sugar we do const propagation:
and then there is no handling that the argument has to be passed by reference, a right solution was described here #1231 (comment) but has not been implemented yet what we want after assertion prop:
see that it is passed by ref, spill it:
of course, we can change fgMorphArgs to catch this case without introducing @SingleAccretion could you please show IR after morph and what optimization does not happen because of OBJ(ADDR())? |
@sandreenko Of course. *************** Finishing PHASE Morph - Global
Trees after Morph - Global
-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight lp [IL range] [jump] [EH region] [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000] 1 1 [000..014)-> BB03 ( cond ) i
BB02 [0001] 1 BB01 1 [014..016) (return) i
BB03 [0002] 1 BB01 1 [016..018) (return) i
-----------------------------------------------------------------------------------------------------------------------------------------
------------ BB01 [000..014) -> BB03 (cond), preds={} succs={BB02,BB03}
***** BB01
STMT00000 (IL 0x000...0x005)
[000003] -A---+------ * ASG simd16 (copy)
[000001] D----+-N---- +--* LCL_VAR simd16<System.Numerics.Vector`1[Int32]> V00 arg0
[000000] -----+------ \--* HWINTRINSIC simd16 int get_Zero
***** BB01
STMT00001 (IL 0x007...0x012)
[000011] -----+------ * JTRUE void
[000010] J----+-N---- \--* EQ int
[000008] -----+------ +--* HWINTRINSIC bool int op_Equality
[000007] n----+------ | +--* OBJ simd16<System.Numerics.Vector`1[Int32]>
[000006] -----+------ | | \--* ADDR byref
[000004] -----+-N---- | | \--* LCL_VAR simd16<System.Numerics.Vector`1[Int32]> V00 arg0
[000005] -----+------ | \--* HWINTRINSIC simd16 int get_Zero
[000009] -----+------ \--* CNS_INT int 0
------------ BB02 [014..016) (return), preds={BB01} succs={}
***** BB02
STMT00003 (IL 0x014...0x015)
[000015] -----+------ * RETURN int
[000014] -----+------ \--* CNS_INT int 2
------------ BB03 [016..018) (return), preds={BB01} succs={}
***** BB03
STMT00002 (IL 0x016...0x017)
[000013] -----+------ * RETURN int
[000012] -----+------ \--* CNS_INT int 3
-------------------------------------------------------------------------------------------------------------------
As far as I can tell, this doesn't impact anything today, because we don't really do folding for vectors, but it does mean the trees become somewhat opaque to VN. N001 [000004] LCL_VAR V00 arg0 u:2 (last use) => $100 {HWI_Vector128_get_Zero()}
N002 [000006] ADDR => $180 {PtrToLoc($40, $0)}
N003 [000007] OBJ => $141 {141}
N004 [000005] HWINTRINSIC => $100 {HWI_Vector128_get_Zero()}
N005 [000008] HWINTRINSIC => $1c0 {HWI_Vector128_op_Equality($141, $100)}
N006 [000009] CNS_INT 0 => $40 {IntCns 0}
N007 [000010] EQ => $200 {EQ($1c0, $40)}
***** BB01, STMT00001(after)
N008 ( 15, 13) [000011] ------------ * JTRUE void
N007 ( 13, 11) [000010] J------N---- \--* EQ int $200
N005 ( 11, 9) [000008] ------------ +--* HWINTRINSIC bool int op_Equality $1c0
N003 ( 9, 7) [000007] n----------- | +--* OBJ simd16<System.Numerics.Vector`1[Int32]> $141
N002 ( 3, 3) [000006] ------------ | | \--* ADDR byref $180
N001 ( 3, 2) [000004] -------N---- | | \--* LCL_VAR simd16<System.Numerics.Vector`1[Int32]> V00 arg0 u:2 (last use) $100
N004 ( 1, 1) [000005] ------------ | \--* HWINTRINSIC simd16 int get_Zero $100
N006 ( 1, 1) [000009] ------------ \--* CNS_INT int 0 $40 I think the issues you describe are much more general than this one, because it is quite specific to the exact setup I had for the reproduction (sorry for not calling this out, I did not have a good handle on this myself :(): parameter as a local + Unix ABI (so no implicit byrefs in the picture). There is no such wrapping for locals: public static int ObjAddrWrapping(int a)
{
var c = Vector<int>.Zero;
if (a is 0)
return 0;
if (c == Vector<int>.Zero)
return 2;
return 3;
} is imported as one would expect: [ 2] 17 (0x011) call 0A000006
In Compiler::impImportCall: opcode is call, kind=0, callRetType is bool, structSize is 0
Named Intrinsic System.Numerics.Vector`1.op_Equality: Recognized
[ 1] 22 (0x016) brfalse.s
STMT00002 (IL 0x00B... ???)
[000013] ------------ * JTRUE void
[000012] ------------ \--* EQ int
[000010] ------------ +--* HWINTRINSIC bool int op_Equality
[000008] ------------ | +--* LCL_VAR simd16<System.Numerics.Vector`1[Int32]> V01 loc0
[000009] ------------ | \--* HWINTRINSIC simd16 int get_Zero
[000011] ------------ \--* CNS_INT int 0 And so it is really me noticing this unexpected IR shape.
By the way, I have been wondering about this (as it is the IR I am looking at in #51500) - what is the desirable shape (that we want to have for it moving forward) for this tree specifically when we have |
This looks to also be the root cause of #50939 Take code such as: [MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float Distance(Vector2 value1, Vector2 value2)
{
float distanceSquared = DistanceSquared(value1, value2);
return MathF.Sqrt(distanceSquared);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float Distance2(Vector2 value1, Vector2 value2)
{
Vector2 difference = value1 - value2;
return difference.Length();
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static float DistanceSquared(Vector2 value1, Vector2 value2)
{
Vector2 difference = value1 - value2;
return difference.LengthSquared();
}
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
static void Main(string[] args)
{
var Vector2Value1 = new Vector2(-1.0f, 1.0f);
var Vector2ValueInverted1 = new Vector2(1.0f, -1.0f);
var result1 = Distance(Vector2Value1, Vector2ValueInverted1);
Console.WriteLine(result1);
var Vector2Value2 = new Vector2(-2.0f, 2.0f);
var Vector2ValueInverted2 = new Vector2(2.0f, -2.0f);
var result2 = Distance2(Vector2Value2, Vector2ValueInverted2);
Console.WriteLine(result2);
}
IN000e: 000049 vmovsd qword ptr [V06 rsp+30H], xmm1
IN000f: 00004F vmovsd qword ptr [V07 rsp+28H], xmm0
IN0010: 000055 vmovsd xmm0, qword ptr [V06 rsp+30H]
IN0011: 00005B vmovsd xmm1, qword ptr [V07 rsp+28H]
IN0012: 000061 vsubps xmm0, xmm0, xmm1
IN0013: 000065 vdpps xmm0, xmm0, xmm0, 49
IN0014: 00006B vsqrtss xmm0, xmm0 While IN0021: 0000AA vsubps xmm0, xmm2, xmm0
IN0022: 0000AE vdpps xmm0, xmm0, xmm0, 49
IN0023: 0000B4 vsqrtss xmm0, xmm0 The ultimate difference between the two is that the former results in |
@sandreenko, at least in the case I just listed, this looks to come down to We then end up promoting the SIMD locals because we allow that for However, it also isn't clear why we aren't marking |
Inserting: #if FEATURE_SIMD
if (varTypeIsSIMD(dst->gtType))
{
// We want to track SIMD assignments as being intrinsics since they
// are functionally SIMD `mov` instructions and are more efficient
// when we don't promote, particularly when it occurs due to inlining
SetOpLclRelatedToSIMDIntrinsic(dst);
SetOpLclRelatedToSIMDIntrinsic(src);
}
#endif // FEATURE_SIMD in We are currently generating |
One potential idea here is to modify |
OBJ(ADDR(LCL_VAR))
trees when importing some static intrinsicsOBJ(ADDR)
wrapping created for calls by the importer
Essentially, this is a subset of #11057, the source of this wrapping is |
This was generally fixed by #81636 @SingleAccretion, do you want this kept open to track other OBJ wrapping we might be introducing? |
No, we can close this assuming the performance impact of #81636 has been evaluated as acceptable. |
Don't believe we've actually seen regressions in the perf triage yet. Will close and reopen if anything crops up. |
Reproduction:
The nodes in question:
This happens because
impSIMDPopStack
callsimpNormStructVal
which wraps local variables in theOBJ(ADDR)
sequence, presumably to avoid calls with mistyped arguments leading to bad codegen. The supposition of this issue is that this wrapping can be avoided when possible, as it blocks downstream optimizations.category:cq
theme:importer
skill-level:expert
cost:medium
impact:small
The text was updated successfully, but these errors were encountered: