Use destType for size instead of srcType when writing a variable from a register to the stack. #46176

Merged (1 commit) on Dec 17, 2020

@jkoritzinsky
Member

jkoritzinsky commented Dec 17, 2020

If we use srcType for the size, we will in some cases emit a 64-bit store instead of a 32-bit store, or vice versa. This is causing the random failures described in #46172.
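To illustrate why the store width matters, here is a minimal, self-contained C++ sketch. It is not JIT code: `Frame`, `storeFromRegister`, and `loadSlot` are hypothetical names invented for this illustration, and the layout (two adjacent 4-byte slots, little-endian byte order as on x64) is an assumption. It models a store of a register's low bytes into a stack slot and shows that an 8-byte store aimed at a 4-byte slot also overwrites the neighboring slot, which is one plausible way a mismatched size could surface as seemingly random failures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a tiny stack frame: slot A at offset 0, slot B at offset 4.
struct Frame
{
    unsigned char bytes[8] = {};
};

// Write the low `size` bytes of `reg` at `offset` in little-endian order,
// the way a 32-bit (size 4) or 64-bit (size 8) store would on x64.
inline void storeFromRegister(Frame& frame, std::size_t offset, std::uint64_t reg, std::size_t size)
{
    assert(offset + size <= sizeof(frame.bytes));
    for (std::size_t i = 0; i < size; ++i)
    {
        frame.bytes[offset + i] = static_cast<unsigned char>((reg >> (8 * i)) & 0xFF);
    }
}

// Read a 4-byte slot back as a little-endian 32-bit value.
inline std::uint32_t loadSlot(const Frame& frame, std::size_t offset)
{
    assert(offset + 4 <= sizeof(frame.bytes));
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        value |= static_cast<std::uint32_t>(frame.bytes[offset + i]) << (8 * i);
    }
    return value;
}
```

In this model, storing with size 4 at offset 0 leaves the slot at offset 4 untouched, while storing with size 8 replaces it with the upper half of the register. The reverse mistake (a 4-byte store where 8 bytes are expected) would instead leave the upper half of the destination stale.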

Alternative fix to #46172

I've validated that this PR has no diffs outside of P/Invoke stubs from the JIT in 35fbaef (the commit before #45625).

cc: @safern

@jkoritzinsky jkoritzinsky added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 17, 2020
@jkoritzinsky jkoritzinsky requested a review from jkotas December 17, 2020 01:07
@jkoritzinsky
Member Author

cc @dotnet/jit-contrib

@jkoritzinsky
Member Author

jkoritzinsky commented Dec 17, 2020

The P/Invoke stub diffs are due to introducing a P/Invoke Frame instead of omitting it. As a result, I am fully confident that this is a complete fix for the random crashes.

Base
; Assembly listing for method ILStubClass:IL_STUB_PInvoke(int):int
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
; ReadyToRun compilation
; optimized code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )     int  ->  rdi        
;* V01 loc0         [V01    ] (  0,  0   )     int  ->  zero-ref   
;  V02 loc1         [V02,T02] (  2,  2   )    long  ->  rdi        
;* V03 loc2         [V03    ] (  0,  0   )     int  ->  zero-ref   
;  V04 loc3         [V04,T03] (  2,  2   )     int  ->  rbx        
;# V05 OutArgs      [V05    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;  V06 tmp1         [V06    ] (  2,  2   )    long  ->  [rbp-0x10]   do-not-enreg[X] addr-exposed "stub argument"
;  V07 tmp2         [V07,T01] (  2,  4   )    long  ->  rax         "impImportIndirectCall"
;
; Lcl frame size = 8

G_M63189_IG01:
       push     rbp
       push     rbx
       push     rax
       lea      rbp, [rsp+10H]
       mov      qword ptr [rbp-10H], r10
						;; bbWeight=1    PerfScore 4.50
G_M63189_IG02:
       movsxd   rdi, edi
       mov      rax, qword ptr [rbp-10H]
       mov      rax, qword ptr [rax+72]
       mov      rax, qword ptr [rax]
						;; bbWeight=1    PerfScore 5.25
G_M63189_IG03:
       call     rax
       mov      ebx, eax
       mov      rax, qword ptr [(reloc)]
       cmp      dword ptr [rax], 0
       jne      SHORT G_M63189_IG06
						;; bbWeight=1    PerfScore 8.25
G_M63189_IG04:
       mov      eax, ebx
						;; bbWeight=1    PerfScore 0.25
G_M63189_IG05:
       lea      rsp, [rbp-08H]
       pop      rbx
       pop      rbp
       ret      
						;; bbWeight=1    PerfScore 2.50
G_M63189_IG06:
       call     [CORINFO_HELP_POLL_GC]
       jmp      SHORT G_M63189_IG04
						;; bbWeight=0    PerfScore 0.00

; Total bytes of code 59, prolog size 12, PerfScore 26.65, instruction count 21 (MethodHash=87cd092a) for method ILStubClass:IL_STUB_PInvoke(int):int
; ============================================================

Unwind Info:
  >> Start offset   : 0x000000 (not in unwind data)
  >>   End offset   : 0xd1ffab1e (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x03
  CountOfUnwindCodes: 3
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x03 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 0 * 8 + 8 = 8 = 0x08
    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

Updated with this PR
; Assembly listing for method ILStubClass:IL_STUB_PInvoke(int):int
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
; ReadyToRun compilation
; optimized code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )     int  ->  rdi        
;* V01 loc0         [V01    ] (  0,  0   )     int  ->  zero-ref   
;  V02 loc1         [V02,T03] (  2,  2   )    long  ->  rdi        
;* V03 loc2         [V03    ] (  0,  0   )     int  ->  zero-ref   
;  V04 loc3         [V04,T04] (  2,  2   )     int  ->  rax        
;# V05 OutArgs      [V05    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;  V06 tmp1         [V06    ] (  2,  2   )    long  ->  [rbp-0x38]   do-not-enreg[X] addr-exposed "stub argument"
;  V07 tmp2         [V07,T01] (  2,  4   )    long  ->  rbx         "impImportIndirectCall"
;  V08 tmp3         [V08,T02] (  2,  4   )     int  ->  rax         "Single return block return value"
;  V09 PInvokeFrame [V09    ] (  3,  3   )     blk (88) [rbp-0x90]   do-not-enreg[X] addr-exposed "Pinvoke FrameVar"
;  TEMP_01                                    long  ->  [rbp-0x30]
;
; Lcl frame size = 104

G_M63189_IG01:
       push     rbp
       push     r15
       push     r14
       push     r13
       push     r12
       push     rbx
       sub      rsp, 104
       lea      rbp, [rsp+90H]
       mov      qword ptr [rbp-38H], r10
						;; bbWeight=1    PerfScore 7.75
G_M63189_IG02:
       movsxd   rdi, edi
       mov      rax, qword ptr [rbp-38H]
       mov      rax, qword ptr [rax+72]
       mov      rbx, qword ptr [rax]
       mov      qword ptr [rbp-30H], rdi
       lea      rdi, [rbp-90H]
						;; bbWeight=1    PerfScore 6.75
G_M63189_IG03:
       call     [CORINFO_HELP_JIT_PINVOKE_BEGIN]
       mov      rdi, qword ptr [rbp-30H]
						;; bbWeight=1    PerfScore 4.00
G_M63189_IG04:
       call     rbx
       mov      ebx, eax
       lea      rdi, [rbp-90H]
       call     [CORINFO_HELP_JIT_PINVOKE_END]
       mov      eax, ebx
						;; bbWeight=1    PerfScore 7.00
G_M63189_IG05:
       lea      rsp, [rbp-28H]
       pop      rbx
       pop      r12
       pop      r13
       pop      r14
       pop      r15
       pop      rbp
       ret      
						;; bbWeight=1    PerfScore 4.50

; Total bytes of code 95, prolog size 26, PerfScore 39.50, instruction count 30 (MethodHash=87cd092a) for method ILStubClass:IL_STUB_PInvoke(int):int
; ============================================================

Unwind Info:
  >> Start offset   : 0x000000 (not in unwind data)
  >>   End offset   : 0xd1ffab1e (not in unwind data)
  Version           : 1
  Flags             : 0x00
  SizeOfProlog      : 0x0E
  CountOfUnwindCodes: 7
  FrameRegister     : none (0)
  FrameOffset       : N/A (no FrameRegister) (Value=0)
  UnwindCodes       :
    CodeOffset: 0x0E UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 12 * 8 + 8 = 104 = 0x68
    CodeOffset: 0x0A UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
    CodeOffset: 0x09 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r12 (12)
    CodeOffset: 0x07 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r13 (13)
    CodeOffset: 0x05 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r15 (15)
    CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)

@AndyAyersMS
Member

I'm not sure what the codegen diffs above are supposed to show.

@jkoritzinsky
Member Author

The listings above show the IL stub diffs between 35fbeaf and this PR. I posted them because they're the only diffs, and I believe they're benign and not the cause of any of the failures we were seeing.

@AndyAyersMS
Member

Would be nice to see a method where there are diffs, but the change looks reasonable.

@ghost

ghost commented Dec 17, 2020

Hello @jkoritzinsky!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@sandreenko
Contributor

> I've validated that this PR has no diffs outside of P/Invoke stubs from the JIT in 35fbaef (the commit before #45625).

I am confused here. Do you mean that this PR plus your merged change #45625 has no diffs against 35fbaef, or that just this change on top of 35fbaef has no diffs?

@@ -11855,7 +11855,7 @@ void CodeGen::genMultiRegStoreToLocal(GenTreeLclVar* lclNode)
     {
         if (!lclNode->AsLclVar()->IsLastUse(i))
         {
-            GetEmitter()->emitIns_S_R(ins_Store(srcType), emitTypeSize(srcType), reg, fieldLclNum, 0);
+            GetEmitter()->emitIns_S_R(ins_Store(srcType), emitTypeSize(destType), reg, fieldLclNum, 0);
Contributor

Isn't it strange that ins_Store and emitTypeSize take different arguments here?
Also, ins_Store usually takes destType, not the source type. It does not matter on x64, but I expect it would on arm.

Member Author

Yeah, that's how I got confused and did the wrong thing earlier.

In this case, ins_Store needs to take srcType because we might have a float variable in an int register. For example, we hit this case on x86 when calling an unmanaged method that returns a struct of floats and we want to store the floats on the stack, which is what prompted my fix in the first place.
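The split described here (pick the store instruction from the type of the register holding the value, pick the width from the destination variable) can be sketched as a small standalone C++ model. This is only an illustration under stated assumptions, not the JIT's implementation: `VarType`, `StoreKind`, `StoreDesc`, `typeSize`, `isFloating`, and `describeStore` are hypothetical names, `srcType` is assumed to stand for the register's type, and the sizes (4 bytes for int/float, 8 for long/double) are assumed to match the listings in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type and instruction-class enums; not the JIT's var_types or instruction.
enum class VarType { Int, Long, Float, Double };
enum class StoreKind { Integer, FloatingPoint };

struct StoreDesc
{
    StoreKind   kind; // which family of store instruction to use
    std::size_t size; // how many bytes the store writes
};

// Assumed sizes: int/float are 4 bytes, long/double are 8 bytes.
inline std::size_t typeSize(VarType type)
{
    return (type == VarType::Int || type == VarType::Float) ? 4 : 8;
}

inline bool isFloating(VarType type)
{
    return type == VarType::Float || type == VarType::Double;
}

// The instruction family follows the register the value lives in (srcType);
// the width follows the stack variable being written (destType).
inline StoreDesc describeStore(VarType srcType, VarType destType)
{
    StoreDesc desc;
    desc.kind = isFloating(srcType) ? StoreKind::FloatingPoint : StoreKind::Integer;
    desc.size = typeSize(destType);
    assert(desc.size == 4 || desc.size == 8);
    return desc;
}
```

For the x86 case mentioned above, `describeStore(VarType::Int, VarType::Float)` yields an integer-register store of 4 bytes. For a 64-bit register feeding a 32-bit local, `describeStore(VarType::Long, VarType::Int)` still writes only 4 bytes, which is the behavior this PR is after.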

@jkoritzinsky
Copy link
Member Author

I mean that my earlier PR plus this change has no diffs, barring the one I posted (which looks to be related to SuppressGCTransition).

@sandreenko
Contributor

OK, so on which platforms did you see the asm changes? How can I repro them?
Can srcType and destType have different sizes? If so, what happens on arm64 with code like ins_Store(srcType), emitTypeSize(destType)?

@jkoritzinsky
Member Author

I saw the failures that this PR fixes on Linux x64, arm, and arm64, IIRC.

@ghost ghost locked as resolved and limited conversation to collaborators Jan 17, 2021