-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Adding some new functions to System.Math and System.MathF #20788
Conversation
Just finishing validating Linux/Mac and hooking up |
return BitConverter.Int32BitsToSingle(bits); | ||
} | ||
|
||
public static unsafe float CopySign(float x, float y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was looking at the codegen for this method:
; Assembly listing for method MathF:CopySign(float,float):float
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00,T03] ( 4, 3.50) float -> mm0
; V01 arg1 [V01,T04] ( 3, 3 ) float -> mm1
; V02 loc0 [V02,T00] ( 3, 2.50) int -> rax
; V03 loc1 [V03,T01] ( 2, 2 ) int -> rdx
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
; V05 tmp1 [V05,T05] ( 2, 4 ) float -> [rsp+0x14] do-not-enreg[F] ld-addr-op "Inlining Arg"
; V06 tmp2 [V06,T06] ( 2, 4 ) float -> [rsp+0x10] do-not-enreg[F] ld-addr-op "Inlining Arg"
; V07 tmp3 [V07,T02] ( 2, 2 ) int -> [rsp+0x0C] do-not-enreg[F] ld-addr-op "Inlining Arg"
;
; Lcl frame size = 24
G_M7953_IG01:
sub rsp, 24
vzeroupper
G_M7953_IG02:
vmovss dword ptr [rsp+14H], xmm0
mov eax, dword ptr [rsp+14H]
vmovss dword ptr [rsp+10H], xmm1
mov edx, dword ptr [rsp+10H]
xor edx, eax
test edx, edx
jge SHORT G_M7953_IG04
xor eax, 0xD1FFAB1E
mov dword ptr [rsp+0CH], eax
vmovss xmm0, dword ptr [rsp+0CH]
G_M7953_IG03:
add rsp, 24
ret
G_M7953_IG04:
add rsp, 24
ret
; Total bytes of code 61, prolog size 7 for method MathF:CopySign(float,float):float
; ============================================================
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following, which is produced from BitConverter.SingleToInt32Bits
vmovss dword ptr [rsp+14H], xmm0
mov eax, dword ptr [rsp+14H]
Could be:
movd eax, xmm0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following, which is produced by if ((xbits ^ ybits) < 0)
:
xor edx, eax
test edx, edx
jge SHORT G_M7953_IG04
Could be:
xor edx, eax
jns SHORT G_M7953_IG04
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
G_M7953_IG03:
and G_M7953_IG04:
are the same, and could be folded together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following, which is produced from BitConverter.Int32BitsToSingle
mov dword ptr [rsp+0CH], eax
vmovss xmm0, dword ptr [rsp+0CH]
Could be:
vmovd xmm0, eax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we should be generating something closer to the following, if possible:
G_M7953_IG01:
vmovd eax, xmm0
vmovd edx, xmm1
xor edx, eax
jns SHORT G_M7953_IG02
xor eax, 0xD1FFAB1E
vmovd xmm0, eax
G_M7953_IG02:
ret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as it specifies there is no transition penalty when using the AVX encoded 128-bit instructions,
Right, there is no transition penalty in VEX encoding, but we insert vzeroupper
at the prolog to avoid SSE-to-AVX transition penalty from P/Invoke or corssgen-ed code calling JITed code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the prolog to avoid SSE-to-AVX transition penalty from P/Invoke or corssgen-ed code calling JITed code.
The architecture optimization manual seems to imply that there is no transition penalty when going from legacy 128-bit instructions to VEX-encoded 128-bit instructions (or vice-versa). Only when going from legacy 128-bit instructions to VEX-encoded 256-bit instructions (and that you avoid the penalty by using VZEROUPPER
) and when going from VEX-encoded 256-bit instructions to legacy 128-bit instructions (where you cannot avoid the penalty). -- See the last line in the table, for Mixed Code Handling
(and some surrounding text in the optimization manual)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. There is a penalty on Broadwell and prior when going from Preserved Non-INIT Upper State
to Dirty Upper State
or Clean Upper State
. However, it looks like Penalty D
and Penalty B
are approx the same (whether it is implicit to dirty, or explicit to clean
via vzeroupper
). The difference is whether there is an additional penalty when transitioning back into the legacy instructions (which will impact crossgen code)...
This still doesn't treat |
CC. @AndyAyersMS, @eerhardt who have reviewed the past Math API additions. Also CC. @ViktorHofer who is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you open a new issue for the codegen stuff you noticed?
Changes look good.
Thanks. I've logged https://github.com/dotnet/coreclr/issues/20820 and https://github.com/dotnet/coreclr/issues/20821 (the vzeroupper case is already tracked by another issue). |
CodegenMirror fails to build on the Desktop. |
* Fixing Max, MaxMagnitude, Min, and MinMagnitude for Math/MathF to be IEEE compliant * Disabling the System.Math.Max/Min tests * Adding the new c_runtime PAL tests to the CMakeLists to ensure they actually get run. * Fixing the casing of IlogB to ILogB * Fixing the new PAL tests to match the correct/expected outputs * Fixing up PAL_ilogb to correctly handle 0 and NaN
* Adding BitIncrement, BitDecrement, CopySign, MaxMagnitude, and MinMagnitude to Math and MathF * Adding FusedMultiplyAdd, IlogB, Log2, and ScaleB to Math and MathF * Adding some basic PAL tests for fma, ilogb, log2, and scalbn * Fixing a couple typos and adding clarifying comments * Fixing the MSVC _VVV FCALL declarations
…eclr#20788) * Adding BitIncrement, BitDecrement, CopySign, MaxMagnitude, and MinMagnitude to Math and MathF * Adding FusedMultiplyAdd, IlogB, Log2, and ScaleB to Math and MathF * Adding some basic PAL tests for fma, ilogb, log2, and scalbn * Fixing a couple typos and adding clarifying comments * Fixing the MSVC _VVV FCALL declarations Commit migrated from dotnet/coreclr@2841758
…coreclr#20912) * Fixing Max, MaxMagnitude, Min, and MinMagnitude for Math/MathF to be IEEE compliant * Disabling the System.Math.Max/Min tests * Adding the new c_runtime PAL tests to the CMakeLists to ensure they actually get run. * Fixing the casing of IlogB to ILogB * Fixing the new PAL tests to match the correct/expected outputs * Fixing up PAL_ilogb to correctly handle 0 and NaN Commit migrated from dotnet/coreclr@a49296e
This resolves: https://github.com/dotnet/corefx/issues/31903