Unsafe.As resulting in lots of unnecessary moves. #55357
Comments
Tagging subscribers to this area: @dotnet/area-system-threading-tasks
Issue Details
Description
Using Unsafe.As on a struct results in a lot of unnecessary asm move instructions, especially when on method return and causes even more of them after inlining.
Configuration
Sharplab Core CLR v5.0.721.25508
Regression?
No idea
Data
|
The area is wrong, should be |
We still don't have a good way to represent a struct cast without taking its address, it is a known problem that I was planning to fix for 6.0 but did not have time to finish. There is no easy fix, but we are aware of the issue and will improve this scenario in 7.0. |
Moving to .NET 8 as we don't have enough time to work on this. |
This one actually might've been fixed with #68739. @MichalPetryka is that something you'd be willing to validate looks fixed on your end? |
This issue has been marked |
Pinging @MichalPetryka. |
Checked with RC2, the issue is still there @tannergooding. |
That is likely not due to |
With .NET 8 and Unsafe.BitCast, the codegen is now:
Just 2 redundant moves from a stack spill left. |
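For readers following along, here is a minimal sketch of the two cast styles being compared in this thread. The original repro source is not included above, so the class and method names (`CastSketch`, `ShiftViaAs`, `ShiftViaBitCast`) are hypothetical. The choice of `Sse2.ShiftLeftLogical128BitLane` is an assumption inferred from the `vpslldq xmm0, xmm0, 1` instruction in the listings. `Unsafe.As` reinterprets through a `ref`, so the local's address is taken, whereas `Unsafe.BitCast` (available from .NET 8) takes and returns values.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical repro sketch; not the code from the original issue.
public static class CastSketch
{
    // Reinterpret via Unsafe.As: both casts go through a ref,
    // so the JIT sees the address of 'value' and 'v' being taken.
    public static UInt128 ShiftViaAs(UInt128 value)
    {
        Vector128<byte> v = Unsafe.As<UInt128, Vector128<byte>>(ref value);
        v = Sse2.ShiftLeftLogical128BitLane(v, 1); // requires Sse2.IsSupported
        return Unsafe.As<Vector128<byte>, UInt128>(ref v);
    }

    // Reinterpret via Unsafe.BitCast (.NET 8+): a by-value cast
    // between two structs of the same size, no ref involved.
    public static UInt128 ShiftViaBitCast(UInt128 value)
    {
        Vector128<byte> v = Unsafe.BitCast<UInt128, Vector128<byte>>(value);
        v = Sse2.ShiftLeftLogical128BitLane(v, 1); // requires Sse2.IsSupported
        return Unsafe.BitCast<Vector128<byte>, UInt128>(v);
    }
}
```

Both methods should compute the same result on hardware where `Sse2.IsSupported` is true; the difference under discussion is only in how many stack spills and moves the JIT emits for each.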
#85562 gives the desired codegen here with
EDIT: actually seems like it's still not ideal on Linux due to SysV ABI:
; Method Test:A(Test+UInt128):Test+UInt128
G_M20876_IG01: ;; offset=0000H
sub rsp, 40
vzeroupper
mov qword ptr [rsp+18H], rdi
mov qword ptr [rsp+20H], rsi
;; size=17 bbWeight=1 PerfScore 3.25
G_M20876_IG02: ;; offset=0011H
vmovups xmm0, xmmword ptr [rsp+18H]
vpslldq xmm0, xmm0, 1
vmovaps xmmword ptr [rsp], xmm0
mov rax, qword ptr [rsp]
mov rdx, qword ptr [rsp+08H]
;; size=25 bbWeight=1 PerfScore 7.00
G_M20876_IG03: ;; offset=002AH
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 47 |
@MichalPetryka What is the impact of #85562 expected to be here? I see no diffs in
; Assembly listing for method Test:A(System.UInt128):System.UInt128 (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 0 single block inlinees; 3 inlinees without PGO data
; Final local variable assignments
;
; V00 RetBuf [V00,T01] ( 4, 4 ) byref -> rcx single-def
; V01 arg0 [V01,T00] ( 3, 6 ) byref -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+00H] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) struct (16) zero-ref "Inline return value spill temp" <System.UInt128>
;* V04 tmp2 [V04 ] ( 0, 0 ) simd16 -> zero-ref "spilled call-like call argument"
; V05 tmp3 [V05 ] ( 2, 4 ) struct (16) [rsp+08H] do-not-enreg[SF] ld-addr-op "Inlining Arg" <System.UInt128>
; V06 tmp4 [V06,T04] ( 2, 4 ) simd16 -> mm0 ld-addr-op "Inlining Arg" <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V07 tmp5 [V07 ] ( 0, 0 ) long -> zero-ref "field V01._lower (fldOffset=0x0)" P-INDEP
;* V08 tmp6 [V08 ] ( 0, 0 ) long -> zero-ref "field V01._upper (fldOffset=0x8)" P-INDEP
;* V09 tmp7 [V09 ] ( 0, 0 ) long -> zero-ref "field V03._lower (fldOffset=0x0)" P-INDEP
;* V10 tmp8 [V10 ] ( 0, 0 ) long -> zero-ref "field V03._upper (fldOffset=0x8)" P-INDEP
; V11 tmp9 [V11,T02] ( 2, 4 ) long -> [rsp+08H] do-not-enreg[] "field V05._lower (fldOffset=0x0)" P-DEP
; V12 tmp10 [V12,T03] ( 2, 4 ) long -> [rsp+10H] do-not-enreg[] "field V05._upper (fldOffset=0x8)" P-DEP
;* V13 tmp11 [V13 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref" <System.UInt128>
;
; Lcl frame size = 24
; BEGIN METHOD Test:A(System.UInt128):System.UInt128
G_M34252_IG01: ;; offset=0000H
sub rsp, 24
vzeroupper
;; size=7 bbWeight=1 PerfScore 1.25
G_M34252_IG02: ;; offset=0007H
vmovups xmm0, xmmword ptr [rdx]
vmovups xmmword ptr [rsp+08H], xmm0
vmovups xmm0, xmmword ptr [rsp+08H]
vpslldq xmm0, xmm0, 1
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
;; size=28 bbWeight=1 PerfScore 11.25
G_M34252_IG03: ;; offset=0023H
add rsp, 24
ret
;; size=5 bbWeight=1 PerfScore 1.25
; END METHOD Test:A(System.UInt128):System.UInt128
; Total bytes of code 40, prolog size 7, PerfScore 17.75, instruction count 10, allocated bytes for code 40 (MethodHash=4c3b7a33) for method Test:A(System.UInt128):System.UInt128 (FullOpts)
; ============================================================
with and without that change. |
Ah hang on, user error :-) I see it now. |
The reason the original
; Assembly listing for method Test:A(System.UInt128):System.UInt128 (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 0 single block inlinees; 1 inlinees without PGO data
; Final local variable assignments
;
; V00 RetBuf [V00,T01] ( 4, 4 ) byref -> rcx single-def
; V01 arg0 [V01,T00] ( 3, 6 ) byref -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+00H] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V03 tmp1 [V03 ] ( 0, 0 ) struct (16) zero-ref "Inline return value spill temp" <System.UInt128>
;* V04 tmp2 [V04 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SF] ld-addr-op "Inlining Arg" <System.UInt128>
; V05 tmp3 [V05,T02] ( 2, 2 ) simd16 -> mm0 ld-addr-op "Inline stloc first use temp" <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V06 tmp4 [V06 ] ( 0, 0 ) simd16 -> zero-ref "V04.[000..016)"
;
; Lcl frame size = 0
; BEGIN METHOD Test:A(System.UInt128):System.UInt128
G_M34252_IG01: ;; offset=0000H
vzeroupper
;; size=3 bbWeight=1 PerfScore 1.00
G_M34252_IG02: ;; offset=0003H
vmovups xmm0, xmmword ptr [rdx]
vpslldq xmm0, xmm0, 1
vmovups xmmword ptr [rcx], xmm0
mov rax, rcx
;; size=16 bbWeight=1 PerfScore 7.25
G_M34252_IG03: ;; offset=0013H
ret
;; size=1 bbWeight=1 PerfScore 1.00
; END METHOD Test:A(System.UInt128):System.UInt128
; Total bytes of code 20, prolog size 3, PerfScore 11.25, instruction count 6, allocated bytes for code 20 (MethodHash=4c3b7a33) for method Test:A(System.UInt128):System.UInt128 (FullOpts)
; ============================================================
Hopefully we can improve this more generally in .NET 9 as part of reducing the scope of dependent promotion. |
Description
Using Unsafe.As on a struct results in a lot of unnecessary asm move instructions, especially on method return, and causes even more of them after inlining.
Configuration
Sharplab Core CLR v5.0.721.25508
Regression?
No idea
Data
Sharplab
Output:
category:cq
theme:structs
skill-level:expert
cost:medium
impact:medium