-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative fix for folding of *(typ*)&lclVar for small types #40607 #40871
Conversation
7ff4390
to
1eb7f28
Compare
I think the issue here could be categorized as another instance of now closed #620 It seems that the changes in #40535 didn't address the issue completely and the following describes why. Setting In the PR example, the execution chain will always follow
As a consequence, the JIT generates The test in #40607 started failing after #40535 was merged that made such case more likely to fail (but I don't think it introduced the issue). As a one of the low risk solutions, we can move check for the narrowing stores and move setting of Making such locals addressed exposed introduces 3 kbytes of regressions in S.P.C alone, so this option doesn't look viable. @dotnet/jit-contrib Any other ideas? This is silent bad codegen issue, so perhaps, we want to fix this in 5.0? |
Another option could be to make the transformation in
|
11638c8
to
47a9b26
Compare
@dotnet/jit-contrib The approach in 47a9b26 seems to be working on the existing set of tests and gives quite favorable diffs
However, I don't like using |
Wonder if we could do the bookkeeping in the MorphAddrContext? |
4f1de54
to
b21dd01
Compare
I guess we can, but I wonder if can simply set |
I added more tests to cover possible variants of loads. I also added small optimization to cover folding of |
I triggered jitstress pipelines of coreclr tests and libraries tests and enabled the originally failing test. |
Failure in runtime-coreclr libraries-jitstress (Libraries Test Run checked coreclr Linux arm Release) is #36752 |
The following are the jit diffs for the change win-x64
win-x86
win-arm (altjit)
win-arm64 (altjit) - although I wouldn't trust these numbers due to #40354
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is good, it fixes some issues and I like the template test.
However, It does not look like this transformation is bug-free now, my main concern is ASG
where dst is not LCL_VAR
but some COMMA(smth, LCL_VAR)
where VAR_FOLDED_IND
is set but this condition does not catch it.
For example, we can have a tree like (after global morph):
[000028] -A-XG+------ * ASG byte
[000051] -A-X-+-N---- +--* COMMA byte
[000048] -A---+------ | +--* ASG long
[000047] D----+-N---- | | +--* LCL_VAR long V05 tmp2
[000021] -----+------ | | \--* LCL_VAR byref V02 loc0
[000052] *--X-+-N---- | \--* IND byte
[000049] -----+------ | \--* LCL_VAR long V05 tmp2
[000025] -----+------ \--* MUL int
[000053] -----+------ +--* CAST int <- byte <- int
[000023] -----+------ | \--* LCL_VAR int V00 arg0
[000024] -----+------ \--* LCL_VAR int V01 arg1
or
[000012] -A-XG+------ * ASG byte
[000027] ---XG+-N---- +--* COMMA byte
[000023] ---X-+------ | +--* ARR_BOUNDS_CHECK_Rng void
[000009] -----+------ | | +--* CNS_INT int 0
[000022] ---X-+------ | | \--* ARR_LENGTH int
[000008] -----+------ | | \--* LCL_VAR ref V03 tmp1
[000011] a---G+-N---- | \--* IND byte
[000026] -----+------ | \--* ADD byref
[000020] -----+------ | +--* LCL_VAR ref V03 tmp1
[000025] -----+------ | \--* CNS_INT long 16 Fseq[#FirstElem]
[000010] -----+------ \--* LCL_VAR int V01 arg1
and many others, I don't think we have confidence that none of them could hide VAR_FOLDED_IND
.
b21dd01
to
eb91303
Compare
Top 4 method improvements disassembly diffs: GetByteWithAllBitsSet diff --git a/base/GetByteWithAllBitsSet.dasm b/diff/GetByteWithAllBitsSet.dasm
index d56a5bd..2d00fcd 100644
--- a/base/GetByteWithAllBitsSet.dasm
+++ b/diff/GetByteWithAllBitsSet.dasm
@@ -6,25 +6,19 @@
; partially interruptible
; Final local variable assignments
;
-; V00 loc0 [V00,T00] ( 3, 3 ) ubyte -> [rsp+0x04] do-not-enreg[F] ld-addr-op
+;* V00 loc0 [V00,T00] ( 0, 0 ) ubyte -> zero-ref ld-addr-op
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;
-; Lcl frame size = 8
+; Lcl frame size = 0
G_M54416_IG01:
- push rax
- ;; bbWeight=1 PerfScore 1.00
+ ;; bbWeight=1 PerfScore 0.00
G_M54416_IG02:
- xor eax, eax
- mov dword ptr [rsp+04H], eax
- mov byte ptr [rsp+04H], 255
- mov eax, dword ptr [rsp+04H]
- movzx rax, al
- ;; bbWeight=1 PerfScore 3.50
+ mov eax, 255
+ ;; bbWeight=1 PerfScore 0.25
G_M54416_IG03:
- add rsp, 8
ret
- ;; bbWeight=1 PerfScore 1.25
+ ;; bbWeight=1 PerfScore 1.00
-; Total bytes of code 24, prolog size 1, PerfScore 8.15, (MethodHash=98492b6f) for method System.Numerics.ConstantHelper:GetByteWithAllBitsSet():ubyte
+; Total bytes of code 6, prolog size 0, PerfScore 1.85, (MethodHash=98492b6f) for method System.Numerics.ConstantHelper:GetByteWithAllBitsSet():ubyte GetInt16WithAllBitsSet diff --git a/base/GetInt16WithAllBitsSet.dasm b/diff/GetInt16WithAllBitsSet.dasm
index 62b1ebd..bca7621 100644
--- a/base/GetInt16WithAllBitsSet.dasm
+++ b/diff/GetInt16WithAllBitsSet.dasm
@@ -6,25 +6,19 @@
; partially interruptible
; Final local variable assignments
;
-; V00 loc0 [V00,T00] ( 3, 3 ) short -> [rsp+0x04] do-not-enreg[F] ld-addr-op
+;* V00 loc0 [V00,T00] ( 0, 0 ) short -> zero-ref ld-addr-op
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;
-; Lcl frame size = 8
+; Lcl frame size = 0
G_M60291_IG01:
- push rax
- ;; bbWeight=1 PerfScore 1.00
+ ;; bbWeight=1 PerfScore 0.00
G_M60291_IG02:
- xor eax, eax
- mov dword ptr [rsp+04H], eax
- mov word ptr [rsp+04H], -1
- mov eax, dword ptr [rsp+04H]
- movsx rax, ax
- ;; bbWeight=1 PerfScore 3.50
+ mov eax, -1
+ ;; bbWeight=1 PerfScore 0.25
G_M60291_IG03:
- add rsp, 8
ret
- ;; bbWeight=1 PerfScore 1.25
+ ;; bbWeight=1 PerfScore 1.00
-; Total bytes of code 27, prolog size 1, PerfScore 8.45, (MethodHash=7108147c) for method System.Numerics.ConstantHelper:GetInt16WithAllBitsSet():short
+; Total bytes of code 6, prolog size 0, PerfScore 1.85, (MethodHash=7108147c) for method System.Numerics.ConstantHelper:GetInt16WithAllBitsSet():short GetSByteWithAllBitsSet diff --git a/base/GetSByteWithAllBitsSet.dasm b/diff/GetSByteWithAllBitsSet.dasm
index 19061db..227a3f8 100644
--- a/base/GetSByteWithAllBitsSet.dasm
+++ b/diff/GetSByteWithAllBitsSet.dasm
@@ -6,25 +6,19 @@
; partially interruptible
; Final local variable assignments
;
-; V00 loc0 [V00,T00] ( 3, 3 ) byte -> [rsp+0x04] do-not-enreg[F] ld-addr-op
+;* V00 loc0 [V00,T00] ( 0, 0 ) byte -> zero-ref ld-addr-op
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;
-; Lcl frame size = 8
+; Lcl frame size = 0
G_M17686_IG01:
- push rax
- ;; bbWeight=1 PerfScore 1.00
+ ;; bbWeight=1 PerfScore 0.00
G_M17686_IG02:
- xor eax, eax
- mov dword ptr [rsp+04H], eax
- mov byte ptr [rsp+04H], -1
- mov eax, dword ptr [rsp+04H]
- movsx rax, al
- ;; bbWeight=1 PerfScore 3.50
+ mov eax, -1
+ ;; bbWeight=1 PerfScore 0.25
G_M17686_IG03:
- add rsp, 8
ret
- ;; bbWeight=1 PerfScore 1.25
+ ;; bbWeight=1 PerfScore 1.00
-; Total bytes of code 25, prolog size 1, PerfScore 8.25, (MethodHash=a68abae9) for method System.Numerics.ConstantHelper:GetSByteWithAllBitsSet():byte
+; Total bytes of code 6, prolog size 0, PerfScore 1.85, (MethodHash=a68abae9) for method System.Numerics.ConstantHelper:GetSByteWithAllBitsSet():byte GetUInt16WithAllBitsSet diff --git a/base/GetUInt16WithAllBitsSet.dasm b/diff/GetUInt16WithAllBitsSet.dasm
index cbecdb0..d59d84e 100644
--- a/base/GetUInt16WithAllBitsSet.dasm
+++ b/diff/GetUInt16WithAllBitsSet.dasm
@@ -6,25 +6,19 @@
; partially interruptible
; Final local variable assignments
;
-; V00 loc0 [V00,T00] ( 3, 3 ) ushort -> [rsp+0x04] do-not-enreg[F] ld-addr-op
+;* V00 loc0 [V00,T00] ( 0, 0 ) ushort -> zero-ref ld-addr-op
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;
-; Lcl frame size = 8
+; Lcl frame size = 0
G_M13827_IG01:
- push rax
- ;; bbWeight=1 PerfScore 1.00
+ ;; bbWeight=1 PerfScore 0.00
G_M13827_IG02:
- xor eax, eax
- mov dword ptr [rsp+04H], eax
- mov word ptr [rsp+04H], 0xFFFF
- mov eax, dword ptr [rsp+04H]
- movzx rax, ax
- ;; bbWeight=1 PerfScore 3.50
+ mov eax, 0xFFFF
+ ;; bbWeight=1 PerfScore 0.25
G_M13827_IG03:
- add rsp, 8
ret
- ;; bbWeight=1 PerfScore 1.25
+ ;; bbWeight=1 PerfScore 1.00
-; Total bytes of code 26, prolog size 1, PerfScore 8.35, (MethodHash=ae96c9fc) for method System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort
+; Total bytes of code 6, prolog size 0, PerfScore 1.85, (MethodHash=ae96c9fc) for method System.Numerics.ConstantHelper:GetUInt16WithAllBitsSet():ushort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the master branch, I like definitelyLoad/possiblyStore
vars and the additional testing.
Does it make sense to collect the diffs with this change with the version before #40535?
I think it does - I will work on this while waiting for the tests to finish. |
I did comparison of (reverted #40535 on top of 3522c32) - base against (these changes on top of the same commit) - diff. win-x64
win-x86
As an example, System.Span`1[SByte][System.SByte]:Fill(byte):this runtime/src/libraries/System.Private.CoreLib/src/System/Span.cs Lines 283 to 293 in 42ebef6
has the following disassembly diffs G_M58981_IG01:
- ;; bbWeight=1 PerfScore 0.00
+ push rax
+ ;; bbWeight=1 PerfScore 1.00
G_M58981_IG02:
mov r8d, dword ptr [rcx+8]
test r8d, r8d
jne SHORT G_M58981_IG04
;; bbWeight=1 PerfScore 3.25
G_M58981_IG03:
+ add rsp, 8
ret
- ;; bbWeight=0.50 PerfScore 0.50
+ ;; bbWeight=0.50 PerfScore 0.63
G_M58981_IG04:
movsx rdx, dl
+ mov dword ptr [rsp+04H], edx
mov rcx, bword ptr [rcx]
+ movzx rdx, byte ptr [rsp+04H]
call [CORINFO_HELP_MEMSET]
nop
- ;; bbWeight=0.50 PerfScore 2.75
+ ;; bbWeight=0.50 PerfScore 3.75
G_M58981_IG05:
+ add rsp, 8
ret
- ;; bbWeight=0.50 PerfScore 0.50
+ ;; bbWeight=0.50 PerfScore 0.63
-; Total bytes of code 25, prolog size 0, PerfScore 9.50, (MethodHash=97d3199a) for method System.Span`1[SByte][System.SByte]:Fill(byte):this
+; Total bytes of code 43, prolog size 1, PerfScore 13.55, (MethodHash=97d3199a) for method System.Span`1[SByte][System.SByte]:Fill(byte):this The difference comes from the following subtle change in decision made by morph base
diff
where
|
1) it is *definitely load* and types of both indirection and local variable have the same signedness (e.g. bool and byte) 2) otherwise, fold the tree and mark the local node with GTF_VAR_FOLDED_IND and call fgDoNormalizeOnStore() on such nodes' parents in post-order morph.
eb91303
to
b41c05c
Compare
@briansull PTAL |
I didn't like that either, and an alternative is to use GTF_IND_ASG_LHS (as it means the same thing as is used for building SSA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Great
Thanks for this work.
/backport to release/5.0-rc2 |
Started backporting to release/5.0-rc2: https://github.com/dotnet/runtime/actions/runs/238515964 |
@echesakovMSFT backporting to release/5.0-rc2 failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Add regression test for https://github.com/dotnet/runtime/issues/40607
Applying: Add Runtime_40607.tt
Applying: Add more extensive tests for loads in Runtime_40607.tt Runtime_40607.il
Applying: Enable back failing test in System.Private.Xml.dll
error: sha1 information is lacking or useless (src/libraries/System.Private.Xml/tests/Xslt/XslTransformApi/CXslTransformMultith.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0004 Enable back failing test in System.Private.Xml.dll
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
The code generated on
DependsOnUnInitValue
make an assumption that upper 3 bytes of[rsp+04H]
location were zero-ed and reads the value as "full" int (i.e. doesn't normalize the value on load).