Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Poor codegen when calling an instance method on a temporary, unused struct #42354

Closed
Thealexbarney opened this issue Sep 16, 2020 · 3 comments · Fixed by #79341
Closed
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@Thealexbarney
Copy link

Thealexbarney commented Sep 16, 2020

Description

Poor codegen in the following situation:

  1. Initialize a new struct
  2. Call an instance method on the struct that doesn't depend on any struct fields
  3. Don't use the struct again

Configuration

Running a recent master build on Win 10 2004 x64

Other information

Given this code, the JIT produces this disassembly for Method1 and Method2:

public static int Method1() => new Rng().NextRandom();
public static int Method2() => CreateRng().NextRandom();

public static Rng CreateRng() => new Rng();

public struct Rng
{
    public int NextRandom() => 4;
}

If you do it the normal way (new Rng().NextRandom()) the JIT doesn't completely optimize it.
It insists on initializing the struct. If the struct is a larger size the JIT would zero the entire struct on the stack before returning 4.

; Assembly listing for method Program:Method1():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 loc0         [V00    ] (  1,  1   )  struct ( 8) [rsp+0x00]   do-not-enreg[XS] must-init addr-exposed ld-addr-op
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T00] (  2,  4   )   byref  ->  rax         "dup spill"
;
; Lcl frame size = 8

G_M20050_IG01:
       50                   push     rax
       33C0                 xor      rax, rax
       48890424             mov      qword ptr [rsp], rax
						;; bbWeight=1    PerfScore 2.25
G_M20050_IG02:
       488D0424             lea      rax, bword ptr [rsp]
       33D2                 xor      edx, edx
       8810                 mov      byte  ptr [rax], dl
       B804000000           mov      eax, 4
						;; bbWeight=1    PerfScore 2.00
G_M20050_IG03:
       4883C408             add      rsp, 8
       C3                   ret      
						;; bbWeight=1    PerfScore 1.25

; Total bytes of code 25, prolog size 7, PerfScore 8.00, (MethodHash=ba7bb1ad) for method Program:Method1():int
; ============================================================

If you split new Rng() into its own function, the JIT does completely optimize it.

; Assembly listing for method Program:Method2():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )  struct ( 8) zero-ref    do-not-enreg[SB] ld-addr-op
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct ( 8) zero-ref    do-not-enreg[SB] ld-addr-op "Inline ldloca(s) first use temp"
;
; Lcl frame size = 0

G_M12881_IG01:
						;; bbWeight=1    PerfScore 0.00
G_M12881_IG02:
       B804000000           mov      eax, 4
						;; bbWeight=1    PerfScore 0.25
G_M12881_IG03:
       C3                   ret      
						;; bbWeight=1    PerfScore 1.00

; Total bytes of code 6, prolog size 0, PerfScore 1.85, (MethodHash=7270cdae) for method Program:Method2():int
; ============================================================

category:cq
theme:structs
skill-level:intermediate
cost:medium

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Sep 16, 2020
@EgorBo
Copy link
Member

EgorBo commented Sep 17, 2020

IR after Morph:

Method1

***** BB01
STMT00000 (IL 0x000...0x004)
               [000003] -A---+------              *  ASG       byref 
               [000002] D----+-N----              +--*  LCL_VAR   byref  V02 tmp1         
               [000001] -----+------              \--*  LCL_VAR_ADDR byref  V00 loc0         

***** BB01
STMT00001 (IL   ???...  ???)
               [000008] IA-X-+------              *  ASG       struct (init)
               [000007] ---X-+-N----              +--*  BLK       struct<1>
               [000005] -----+------              |  \--*  LCL_VAR   byref  V02 tmp1         
               [000006] -----+------              \--*  CNS_INT   int    0

***** BB01
STMT00003 (IL   ???...  ???)
               [000011] -----+------              *  RETURN    int   
               [000012] -----+------              \--*  CNS_INT   int    4

Method2

***** BB01
STMT00004 (IL 0x000...  ???)
               [000013] IA---+------              *  ASG       struct (init)
               [000010] D----+-N----              +--*  LCL_VAR   struct<Rng, 1> V02 tmp1         
               [000012] -----+------              \--*  CNS_INT   int    0

***** BB01
STMT00001 (IL   ???...  ???)
               [000004] IA---+------              *  ASG       struct (init)
               [000002] D----+-N----              +--*  LCL_VAR   struct<Rng, 1> V00 loc0         
               [000014] -----+------              \--*  CNS_INT   int    0

***** BB01
STMT00003 (IL   ???...  ???)
               [000009] -----+------              *  RETURN    int   
               [000016] -----+------              \--*  CNS_INT   int    4

@sandreenko sandreenko added this to the 6.0.0 milestone Sep 18, 2020
@sandreenko sandreenko removed the untriaged New issue has not been triaged by the area owner label Sep 18, 2020
@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021
@sandreenko sandreenko removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Apr 12, 2021
@sandreenko
Copy link
Contributor

This issue is interesting, there are 2 solutions, I think both have their own benefits and should be implemented:

  1. Don't spill addr of local because they are trivial to clone,
    change
    if (!opts.compDbgCode && !op1->IsIntegralConst(0) && !op1->IsFPZero() && !op1->IsLocal())

    to
if (!opts.compDbgCode && !op1->IsIntegralConst(0) && !op1->IsFPZero() && !op1->IsLocal() && !op1->OperIsLocalAddr())

however, it won't work because importer sees it as "ADDR(LCL_VAR())` so we need a part of #11057.

  1. continue the work to treat LCL_VAR_ADDR as consts started in CQ: Don't force const arguments to temps. #47786 and add O2K_LCL_VAR_ADDR, O2K_LCL_FLD_ADDR to optOp2Kind so VN created assertions like:
    V02 = &V01
    and uses it to see that some stores are dead.

Not sure if we have time for it in 6.0 but lets keep it here for now.

@sandreenko sandreenko modified the milestones: 6.0.0, 7.0.0 Jul 9, 2021
@JulieLeeMSFT JulieLeeMSFT modified the milestones: 7.0.0, 8.0.0 Jun 3, 2022
@JulieLeeMSFT
Copy link
Member

Moving to .NET 8.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 7, 2022
jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Dec 7, 2022
Roslyn emits ldloca + dup + initobj when initializing structs. Normally
we clone address trees instead of creating a local for them (which will
address expose the local), but we treat this initobj pattern specially.

Remove this special treatment. It means we sometimes end up with
slightly larger code because we no longer have a register with the
address in it (could potentially be fixed by CSE), but avoiding the
address exposure seems like the right trade off to me.

Fix dotnet#42354
@jakobbotsch jakobbotsch self-assigned this Dec 7, 2022
jakobbotsch added a commit that referenced this issue Dec 8, 2022
Roslyn emits ldloca + dup + initobj when initializing structs. Normally
we clone address trees instead of creating a local for them (which will
address expose the local), but we treat this initobj pattern specially.

Remove this special treatment. It means we sometimes end up with
slightly larger code because we no longer have a register with the
address in it (could potentially be fixed by CSE), but avoiding the
address exposure seems like the right trade off to me.

Fix #42354
Fix #57055
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jan 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants