JIT: More elaborate store forwarding patterns with physical promotion #86665

Open
jakobbotsch opened this issue May 23, 2023 · 3 comments

jakobbotsch commented May 23, 2023

I noticed the following diff with physical promotion enabled:

+22 (+733.33%) : 48321.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.NullableContextStateMap:GetContextForFileStart():Microsoft.CodeAnalysis.CSharp.Syntax.NullableContextState

@@ -8,20 +8,29 @@
 ; Final local variable assignments
 ;
 ;# V00 OutArgs      [V00    ] (  1,  1   )  struct ( 0) [rsp+00H]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
-;* V01 tmp1         [V01    ] (  0,  0   )  struct ( 8) zero-ref    do-not-enreg[SF] ld-addr-op "NewObj constructor temp"
+;  V01 tmp1         [V01,T00] (  4,  8   )  struct ( 8) [rsp+00H]   do-not-enreg[SF] ld-addr-op "NewObj constructor temp"
+;* V02 tmp2         [V02    ] (  0,  0   )     int  ->  zero-ref    "V01.[000..004)"
+;* V03 tmp3         [V03,T01] (  0,  0   )   ubyte  ->  zero-ref    "V01.[004..005)"
+;* V04 tmp4         [V04,T02] (  0,  0   )   ubyte  ->  zero-ref    "V01.[005..006)"
 ;
-; Lcl frame size = 0
+; Lcl frame size = 8
 
 G_M10451_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-						;; size=0 bbWeight=1 PerfScore 0.00
+       push     rax
+						;; size=1 bbWeight=1 PerfScore 1.00
 G_M10451_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
        xor      eax, eax
-						;; size=2 bbWeight=1 PerfScore 0.25
+       mov      dword ptr [rsp], eax
+       mov      byte  ptr [rsp+04H], 0
+       mov      byte  ptr [rsp+05H], 0
+       mov      rax, qword ptr [rsp]
+						;; size=19 bbWeight=1 PerfScore 4.25
 G_M10451_IG03:        ; bbWeight=1, epilog, nogc, extend
+       add      rsp, 8
        ret      
-						;; size=1 bbWeight=1 PerfScore 1.00
+						;; size=5 bbWeight=1 PerfScore 1.25
 
-; Total bytes of code 3, prolog size 0, PerfScore 1.55, instruction count 2, allocated bytes for code 3 (MethodHash=08d0d72c) for method Microsoft.CodeAnalysis.CSharp.Syntax.NullableContextStateMap:GetContextForFileStart():Microsoft.CodeAnalysis.CSharp.Syntax.NullableContextState
+; Total bytes of code 25, prolog size 1, PerfScore 9.00, instruction count 8, allocated bytes for code 25 (MethodHash=08d0d72c) for method Microsoft.CodeAnalysis.CSharp.Syntax.NullableContextStateMap:GetContextForFileStart():Microsoft.CodeAnalysis.CSharp.Syntax.NullableContextState
 ; ============================================================

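This is the result of a zero init of the struct + read back + return, where the struct has two padding bytes at the end. With physical promotion the three fields (bytes [0..6)) are stored individually and then all 8 bytes are reloaded for the return, so the load is not fully covered by the field stores. We would need to teach the store forwarding optimization in lowering about padding to catch this (or alternatively allow RETURN(FIELD_LIST)).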
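As a rough illustration of the layout involved (a sketch, not the actual Roslyn type): the promoted ranges in the diff, `V01.[000..004)`, `V01.[004..005)` and `V01.[005..006)`, correspond to an `int` followed by two bytes inside an 8-byte struct. The Python `ctypes` model below uses hypothetical field names and helper functions, and assumes a platform where a 32-bit int is 4-byte aligned. It only shows which bytes of the struct no field occupies; it does not model the JIT itself.

```python
import ctypes


# Hypothetical stand-in for the returned struct; the field names are made up.
# The field types mirror the promoted ranges in the diff: int, ubyte, ubyte.
class State(ctypes.Structure):
    _fields_ = [
        ("position", ctypes.c_int32),  # V01.[000..004)
        ("before", ctypes.c_uint8),    # V01.[004..005)
        ("after", ctypes.c_uint8),     # V01.[005..006)
    ]


def field_ranges(struct_type):
    """Return (name, start, end) byte ranges for each declared field."""
    ranges = []
    for name, _ in struct_type._fields_:
        descriptor = getattr(struct_type, name)
        ranges.append((name, descriptor.offset, descriptor.offset + descriptor.size))
    return ranges


def uncovered_bytes(struct_type):
    """Byte offsets inside the struct that no field occupies (padding)."""
    covered = set()
    for _, start, end in field_ranges(struct_type):
        covered.update(range(start, end))
    return [i for i in range(ctypes.sizeof(struct_type)) if i not in covered]


if __name__ == "__main__":
    print("size:", ctypes.sizeof(State))
    print("fields:", field_ranges(State))
    print("padding offsets:", uncovered_bytes(State))
```

Under the stated alignment assumption this reports a size of 8 with padding at offsets 6 and 7, which are exactly the bytes that the 8-byte reload in the diff reads but no per-field store writes.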

Normal promotion just sees a block init with holes and marks the local DNER (do-not-enregister).

dotnet-issue-labeler bot added the area-CodeGen-coreclr label May 23, 2023
ghost added the untriaged label May 23, 2023
jakobbotsch self-assigned this May 23, 2023
jakobbotsch added this to the Future milestone May 23, 2023
ghost removed the untriaged label May 23, 2023
jakobbotsch modified the milestones: Future, 9.0.0 Nov 27, 2023

jakobbotsch commented

My long-term plan here is to make FIELD_LIST a better-supported node and start using it for GT_RETURN as well. That will allow the JIT to transform returns into that representation.

jakobbotsch modified the milestones: 9.0.0, 10.0.0 Jul 29, 2024

jakobbotsch commented

Once the work on more first-class support for FIELD_LIST is done, we should also be able to use it to fix the TODO mentioned in #78131 (comment).
