Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Adding some new functions to System.Math and System.MathF #20788

Merged
merged 5 commits into from
Nov 5, 2018
Merged

Adding some new functions to System.Math and System.MathF #20788

merged 5 commits into from
Nov 5, 2018

Conversation

tannergooding
Copy link
Member

@tannergooding
Copy link
Member Author

Just finishing validating Linux/Mac and hooking up FusedMultiplyAdd as an intrinsic on x86.

return BitConverter.Int32BitsToSingle(bits);
}

public static unsafe float CopySign(float x, float y)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was looking at the codegen for this method:

; Assembly listing for method MathF:CopySign(float,float):float
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T03] (  4,  3.50)   float  ->  mm0        
;  V01 arg1         [V01,T04] (  3,  3   )   float  ->  mm1        
;  V02 loc0         [V02,T00] (  3,  2.50)     int  ->  rax        
;  V03 loc1         [V03,T01] (  2,  2   )     int  ->  rdx        
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;  V05 tmp1         [V05,T05] (  2,  4   )   float  ->  [rsp+0x14]   do-not-enreg[F] ld-addr-op "Inlining Arg"
;  V06 tmp2         [V06,T06] (  2,  4   )   float  ->  [rsp+0x10]   do-not-enreg[F] ld-addr-op "Inlining Arg"
;  V07 tmp3         [V07,T02] (  2,  2   )     int  ->  [rsp+0x0C]   do-not-enreg[F] ld-addr-op "Inlining Arg"
;
; Lcl frame size = 24

G_M7953_IG01:
       sub      rsp, 24
       vzeroupper 

G_M7953_IG02:
       vmovss   dword ptr [rsp+14H], xmm0
       mov      eax, dword ptr [rsp+14H]
       vmovss   dword ptr [rsp+10H], xmm1
       mov      edx, dword ptr [rsp+10H]
       xor      edx, eax
       test     edx, edx
       jge      SHORT G_M7953_IG04
       xor      eax, 0xD1FFAB1E
       mov      dword ptr [rsp+0CH], eax
       vmovss   xmm0, dword ptr [rsp+0CH]

G_M7953_IG03:
       add      rsp, 24
       ret      

G_M7953_IG04:
       add      rsp, 24
       ret      

; Total bytes of code 61, prolog size 7 for method MathF:CopySign(float,float):float
; ============================================================

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following, which is produced from BitConverter.SingleToInt32Bits

vmovss   dword ptr [rsp+14H], xmm0
mov      eax, dword ptr [rsp+14H]

Could be:

movd eax, xmm0

Copy link
Member Author

@tannergooding tannergooding Nov 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following, which is produced by if ((xbits ^ ybits) < 0):

xor      edx, eax
test     edx, edx
jge      SHORT G_M7953_IG04

Could be:

xor edx, eax
jns SHORT G_M7953_IG04

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

G_M7953_IG03: and G_M7953_IG04: are the same, and could be folded together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following, which is produced from BitConverter.Int32BitsToSingle

mov      dword ptr [rsp+0CH], eax
vmovss   xmm0, dword ptr [rsp+0CH]

Could be:

vmovd xmm0, eax

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we should be generating something closer to the following, if possible:

G_M7953_IG01:
       vmovd    eax, xmm0
       vmovd    edx, xmm1
       xor      edx, eax
       jns      SHORT G_M7953_IG02
       xor      eax, 0xD1FFAB1E
       vmovd    xmm0, eax

G_M7953_IG02:
       ret

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as it specifies there is no transition penalty when using the AVX encoded 128-bit instructions,

Right, there is no transition penalty in VEX encoding, but we insert vzeroupper at the prolog to avoid SSE-to-AVX transition penalty from P/Invoke or corssgen-ed code calling JITed code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the prolog to avoid SSE-to-AVX transition penalty from P/Invoke or corssgen-ed code calling JITed code.

The architecture optimization manual seems to imply that there is no transition penalty when going from legacy 128-bit instructions to VEX-encoded 128-bit instructions (or vice-versa). Only when going from legacy 128-bit instructions to VEX-encoded 256-bit instructions (and that you avoid the penalty by using VZEROUPPER) and when going from VEX-encoded 256-bit instructions to legacy 128-bit instructions (where you cannot avoid the penalty). -- See the last line in the table, for Mixed Code Handling (and some surrounding text in the optimization manual)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the optimization manual is confusing here. Yes, there is no transition penalty when going from legacy 128-bit instructions to VEX-encoded 128-bit instructions on Skylake (and newer) micro-architectures.
image

But the penalty still can happen on older architectures
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. There is a penalty on Broadwell and prior when going from Preserved Non-INIT Upper State to Dirty Upper State or Clean Upper State. However, it looks like Penalty D and Penalty B are approx the same (whether it is implicit to dirty, or explicit to clean via vzeroupper). The difference is whether there is an additional penalty when transitioning back into the legacy instructions (which will impact crossgen code)...

image
image
image

@tannergooding tannergooding changed the title [WIP] Adding some new functions to System.Math and System.MathF Adding some new functions to System.Math and System.MathF Nov 5, 2018
@tannergooding
Copy link
Member Author

This still doesn't treat FusedMultiplyAdd as an intrinsic, but I plan on doing that in a follow-up PR after we have some CoreFX tests (including benchmarks) enabled.

@tannergooding
Copy link
Member Author

CC. @AndyAyersMS, @eerhardt who have reviewed the past Math API additions. Also CC. @ViktorHofer who is a System.Numerics area owner.

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you open a new issue for the codegen stuff you noticed?

Changes look good.

@tannergooding
Copy link
Member Author

Thanks. I've logged https://github.com/dotnet/coreclr/issues/20820 and https://github.com/dotnet/coreclr/issues/20821 (the vzeroupper case is already tracked by another issue).

@tannergooding tannergooding merged commit 2841758 into dotnet:master Nov 5, 2018
@briansull
Copy link

CodegenMirror fails to build on the Desktop.
I am looking into fix this.

tannergooding added a commit that referenced this pull request Nov 12, 2018
* Fixing Max, MaxMagnitude, Min, and MinMagnitude for Math/MathF to be IEEE compliant

* Disabling the System.Math.Max/Min tests

* Adding the new c_runtime PAL tests to the CMakeLists to ensure they actually get run.

* Fixing the casing of IlogB to ILogB

* Fixing the new PAL tests to match the correct/expected outputs

* Fixing up PAL_ilogb to correctly handle 0 and NaN
A-And pushed a commit to A-And/coreclr that referenced this pull request Nov 20, 2018
* Adding BitIncrement, BitDecrement, CopySign, MaxMagnitude, and MinMagnitude to Math and MathF

* Adding FusedMultiplyAdd, IlogB, Log2, and ScaleB to Math and MathF

* Adding some basic PAL tests for fma, ilogb, log2, and scalbn

* Fixing a couple typos and adding clarifying comments

* Fixing the MSVC _VVV FCALL declarations
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…eclr#20788)

* Adding BitIncrement, BitDecrement, CopySign, MaxMagnitude, and MinMagnitude to Math and MathF

* Adding FusedMultiplyAdd, IlogB, Log2, and ScaleB to Math and MathF

* Adding some basic PAL tests for fma, ilogb, log2, and scalbn

* Fixing a couple typos and adding clarifying comments

* Fixing the MSVC _VVV FCALL declarations


Commit migrated from dotnet/coreclr@2841758
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…coreclr#20912)

* Fixing Max, MaxMagnitude, Min, and MinMagnitude for Math/MathF to be IEEE compliant

* Disabling the System.Math.Max/Min tests

* Adding the new c_runtime PAL tests to the CMakeLists to ensure they actually get run.

* Fixing the casing of IlogB to ILogB

* Fixing the new PAL tests to match the correct/expected outputs

* Fixing up PAL_ilogb to correctly handle 0 and NaN


Commit migrated from dotnet/coreclr@a49296e
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Expose some additional Math and MathF operations
6 participants