[Arm64] MultiplyHigh #43106

Closed
Tracked by #43051
echesakov opened this issue Oct 6, 2020 · 2 comments · Fixed by #47362 or mono/mono#20792
Labels: api-approved (API was approved in API review, it can be implemented), arch-arm64, area-System.Runtime.Intrinsics

@echesakov
Contributor

Exposing smulh/umulh as intrinsics on Arm64

class ArmBase.Arm64
{
  /// <summary>
  ///   A64: SMULH Xd, Xn, Xm
  /// </summary>
  public static long MultiplyHigh(long left, long right);

  /// <summary>
  ///   A64: UMULH Xd, Xn, Xm
  /// </summary>
  public static ulong MultiplyHigh(ulong left, ulong right);
}

would allow System.Math.BigMul to be implemented as

  low = a * b;
  return ArmBase.Arm64.MultiplyHigh(a, b);
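
To make the two-line idea above concrete, here is a minimal sketch of how the unsigned case might look with a guard and a portable fallback. The class and method names (`BigMulSketch`, `SoftwareFallback`) are hypothetical and are not taken from the runtime sources; only `ArmBase.Arm64.IsSupported` and the proposed `ArmBase.Arm64.MultiplyHigh` are real API, and the fallback is just one common way to get the high half from 32-bit partial products.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;

// Hypothetical helper, for illustration only.
public static class BigMulSketch
{
    // Same shape as Math.BigMul(ulong, ulong, out ulong):
    // returns the high 64 bits, writes the low 64 bits to 'low'.
    public static ulong BigMul(ulong a, ulong b, out ulong low)
    {
        if (ArmBase.Arm64.IsSupported)
        {
            low = unchecked(a * b);                   // MUL: low 64 bits of the product
            return ArmBase.Arm64.MultiplyHigh(a, b);  // UMULH: high 64 bits of the product
        }

        return SoftwareFallback(a, b, out low);
    }

    // Portable path (an assumption, not the CoreLib code): split each operand
    // into 32-bit halves and combine the four partial products.
    public static ulong SoftwareFallback(ulong a, ulong b, out ulong low)
    {
        unchecked
        {
            ulong al = (uint)a, ah = a >> 32;
            ulong bl = (uint)b, bh = b >> 32;

            ulong ll = al * bl;
            ulong lh = al * bh;
            ulong hl = ah * bl;
            ulong hh = ah * bh;

            // Sum of three values below 2^32 each, so this cannot overflow.
            ulong mid = (ll >> 32) + (uint)lh + (uint)hl;

            low = (mid << 32) | (uint)ll;
            return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
        }
    }
}
```

On non-Arm64 hardware the `IsSupported` check is false, so only the fallback runs; both paths should agree with `Math.BigMul(ulong, ulong, out ulong)`.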

cc @CarolEidt @tannergooding @TamarChristinaArm

@echesakov echesakov added arch-arm64 area-System.Runtime.Intrinsics api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 6, 2020
@echesakov echesakov added this to the 6.0.0 milestone Oct 6, 2020
@ghost

ghost commented Oct 6, 2020

Tagging subscribers to this area: @tannergooding, @jeffhandley
See info in area-owners.md if you want to be subscribed.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Oct 6, 2020
@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Oct 6, 2020
@terrajobst
Member

terrajobst commented Oct 20, 2020

Video

Looks good as proposed:

namespace System.Runtime.Intrinsics.Arm
{
    public abstract class ArmBase
    {
        public abstract class Arm64
        {
            public static long MultiplyHigh(long left, long right);
            public static ulong MultiplyHigh(ulong left, ulong right);
        }
    }
}
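
For readers wondering how the two approved overloads relate, the sketch below calls the signed overload behind an `IsSupported` guard and, as a fallback, derives the signed high half from the unsigned one. The names `MultiplyHighSketch`, `SignedHigh` and `SignedHighFromUnsigned` are hypothetical; the correction step is the standard two's-complement identity and is shown as an assumption about one possible fallback, not as the code that was reviewed.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;

// Hypothetical helper, for illustration only.
public static class MultiplyHighSketch
{
    // High 64 bits of the signed 128-bit product a * b.
    public static long SignedHigh(long a, long b)
    {
        if (ArmBase.Arm64.IsSupported)
        {
            return ArmBase.Arm64.MultiplyHigh(a, b); // SMULH
        }

        return SignedHighFromUnsigned(a, b);
    }

    // Reinterpret the operands as unsigned, take the unsigned high half,
    // then subtract b when a is negative and a when b is negative.
    public static long SignedHighFromUnsigned(long a, long b)
    {
        unchecked
        {
            ulong unsignedHigh = Math.BigMul((ulong)a, (ulong)b, out _);

            // (x >> 63) is all ones for negative x and zero otherwise,
            // so each mask selects the other operand or zero.
            return (long)unsignedHigh - ((a >> 63) & b) - ((b >> 63) & a);
        }
    }
}
```

The masks avoid branches; either path should match the high half returned by `Math.BigMul(long, long, out long)`.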

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 20, 2020
@echesakov echesakov self-assigned this Oct 20, 2020
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 23, 2021
monojenkins pushed a commit to monojenkins/mono that referenced this issue Jan 27, 2021
Closes dotnet/runtime#43106

In addition to implementing the intrinsics, I have updated the `System.Math:BigMul(long,long,byref):long` implementation in System.Private.CoreLib. The following is the codegen of the methods:
```asm
; Assembly listing for method System.Math:BigMul(long,long,byref):long
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,  4   )    long  ->   x0
;  V01 arg1         [V01,T01] (  4,  4   )    long  ->   x1
;  V02 arg2         [V02,T02] (  3,  3   )   byref  ->   x2
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M18264_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M18264_IG02:              ;; offset=0008H
        9B017C03          mul     x3, x0, x1
        F9000043          str     x3, [x2]
        9BC17C00          umulh   x0, x0, x1
						;; bbWeight=1    PerfScore 8.00
G_M18264_IG03:              ;; offset=0014H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 28, prolog size 8, PerfScore 14.30, instruction count 7, allocated bytes for code 28 (MethodHash=96edb8a7) for method System.Math:BigMul(long,long,byref):long
; ============================================================

; Assembly listing for method System.Math:BigMul(long,long,byref):long
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; ReadyToRun compilation
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  4,  4   )    long  ->   x0
;  V01 arg1         [V01,T01] (  4,  4   )    long  ->   x1
;  V02 arg2         [V02,T02] (  3,  3   )   byref  ->   x2
;* V03 loc0         [V03    ] (  0,  0   )    long  ->  zero-ref
;* V04 loc1         [V04    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op
;# V05 OutArgs      [V05    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M18264_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M18264_IG02:              ;; offset=0008H
        9B017C03          mul     x3, x0, x1
        F9000043          str     x3, [x2]
        9B417C00          smulh   x0, x0, x1
						;; bbWeight=1    PerfScore 8.00
G_M18264_IG03:              ;; offset=0014H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 28, prolog size 8, PerfScore 14.30, instruction count 7, allocated bytes for code 28 (MethodHash=96edb8a7) for method System.Math:BigMul(long,long,byref):long
; ============================================================
```
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 28, 2021
imhameed pushed a commit to mono/mono that referenced this issue Jan 28, 2021

Co-authored-by: echesakovMSFT <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Feb 27, 2021