ARM64-SVE: Add `MultiplyAddRotateComplexBySelectedScalar` #105002

amanasifkhalid · 2024-07-17T00:28:16Z

Part of #99957. I had to add a new test template to cover APIs with three vector args, and two immediates; the new template is identical to _SveImmTernOpTestTemplate.template, except that I added additional Imm and InvalidImm arguments. JIT changes are a bit more involved for this API because it is the first ARM64 intrinsic with five operands. This means the HWIntrinsic struct now has an additional GenTree* member. If the TP impact of this is too high, I can get rid of the new member; the code for handling the fifth operand will look a bit less elegant, but it's only for this API, for now.

With @kunalspathak's fix in #104998 merged in locally, all stress tests pass. The SVE encoding unit tests passed, too (I had to tweak the emitter and tests to take the rotation value in its encoded form to match the API-level behavior). Since the jump table generation is a bit odd for this API, here's what the codegen looks like:

G_M19160_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
                                                ;; size=8 bbWeight=1 PerfScore 1.50
G_M19160_IG02:  ;; offset=0x0008
            uxtb    w0, w0
            cmp     w0, #2
            bhs     G_M19160_IG13
            uxtb    w1, w1
            cmp     w1, #4
            bhs     G_M19160_IG13
            lsl     w1, w1, #1
            orr     w0, w0, w1
            adr     x2, [G_M19160_IG03]
            add     x2, x2, x0,  LSL #3
            br      x2
                                                ;; size=44 bbWeight=1 PerfScore 8.00
G_M19160_IG03:  ;; offset=0x0034
            fcmla   z0.s, z1.s, z2.s[0], #0
            b       G_M19160_IG11
                                                ;; size=8 bbWeight=1 PerfScore 5.00
G_M19160_IG04:  ;; offset=0x003C
            fcmla   z0.s, z1.s, z2.s[1], #0
            b       G_M19160_IG11
                                                ;; size=8 bbWeight=1 PerfScore 5.00
G_M19160_IG05:  ;; offset=0x0044
            fcmla   z0.s, z1.s, z2.s[0], #90
            b       G_M19160_IG11
                                                ;; size=8 bbWeight=1 PerfScore 5.00
G_M19160_IG06:  ;; offset=0x004C
            fcmla   z0.s, z1.s, z2.s[1], #90
            b       G_M19160_IG11
                                                ;; size=8 bbWeight=1 PerfScore 5.00
G_M19160_IG07:  ;; offset=0x0054
            fcmla   z0.s, z1.s, z2.s[0], #180
            b       G_M19160_IG11
                                                ;; size=8 bbWeight=1 PerfScore 5.00
G_M19160_IG08:  ;; offset=0x005C
            fcmla   z0.s, z1.s, z2.s[1], #180
            b       G_M19160_IG11
                                                ;; size=8 bbWeight=1 PerfScore 5.00
G_M19160_IG09:  ;; offset=0x0064
            fcmla   z0.s, z1.s, z2.s[0], #270
            b       G_M19160_IG11
                                                ;; size=8 bbWeight=1 PerfScore 5.00
G_M19160_IG10:  ;; offset=0x006C
            fcmla   z0.s, z1.s, z2.s[1], #270
                                                ;; size=4 bbWeight=1 PerfScore 4.00
G_M19160_IG11:  ;; offset=0x0070
            and     w0, w0, #1
            lsr     w1, w1, #1
                                                ;; size=8 bbWeight=1 PerfScore 1.50
G_M19160_IG12:  ;; offset=0x0078
            ldp     fp, lr, [sp], #0x10
            ret     lr
                                                ;; size=8 bbWeight=1 PerfScore 2.00
G_M19160_IG13:  ;; offset=0x0080
            bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
            brk_windows #0
                                                ;; size=8 bbWeight=0 PerfScore 0.00

@dotnet/arm64-contrib PTAL, thanks!

dotnet-issue-labeler · 2024-07-17T00:28:21Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

dotnet-issue-labeler · 2024-07-17T00:28:23Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

dotnet-policy-service · 2024-07-17T00:28:50Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

src/coreclr/jit/lsraarm64.cpp

kunalspathak · 2024-07-17T01:05:45Z

src/coreclr/jit/hwintrinsiclistarm64sve.h

@@ -190,6 +190,7 @@ HARDWARE_INTRINSIC(Sve,           MinNumberAcross,
 HARDWARE_INTRINSIC(Sve,           Multiply,                                                         -1,      2,      true,  {INS_sve_mul,        INS_sve_mul,        INS_sve_mul,        INS_sve_mul,        INS_sve_mul,        INS_sve_mul,        INS_sve_mul,        INS_sve_mul,        INS_sve_fmul,       INS_sve_fmul},    HW_Category_SIMD,                  HW_Flag_Scalable|HW_Flag_OptionalEmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
 HARDWARE_INTRINSIC(Sve,           MultiplyAdd,                                                      -1,     -1,      false, {INS_sve_mla,        INS_sve_mla,        INS_sve_mla,        INS_sve_mla,        INS_sve_mla,        INS_sve_mla,        INS_sve_mla,        INS_sve_mla,        INS_invalid,        INS_invalid},     HW_Category_SIMD,                  HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation|HW_Flag_FmaIntrinsic|HW_Flag_SpecialCodeGen)
 HARDWARE_INTRINSIC(Sve,           MultiplyAddRotateComplex,                                         -1,      -1,     false, {INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_sve_fcmla,      INS_sve_fcmla},   HW_Category_SIMD,                  HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_HasImmediateOperand)
+HARDWARE_INTRINSIC(Sve,           MultiplyAddRotateComplexBySelectedScalar,                         -1,      5,      false, {INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_invalid,        INS_sve_fcmla,      INS_invalid},     HW_Category_SIMD,                  HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_LowVectorOperation|HW_Flag_HasRMWSemantics|HW_Flag_SpecialCodeGen|HW_Flag_SpecialImport)


This should have HW_Category_SIMDByIndexedElement.

I agree that category makes sense, though this API seems different enough from other HW_Category_SIMDByIndexedElement APIs that we would have to add several special cases on that category's path (for example, all the 5-operand handling, plus the fact that we index the third vector register in pairs of elements, so the default bounds calculation for the index doesn't work). Do we want to move this logic over?

synced offline...ok to skip this but to add a comment in lsraarm64.cpp explaining why we do not mark it such and why we just need 1 internal register although we have 2 immediates.

kunalspathak

can you please monitor the TP cost and also check the memory cost by adding op5?

amanasifkhalid · 2024-07-17T14:51:21Z

can you please monitor the TP cost and also check the memory cost by adding op5?

Sure: Looks like the TP cost is trivial. Is there a tool we use to measure diffs in memory usage?

kunalspathak · 2024-07-17T17:04:15Z

Is there a tool we use to measure diffs in memory usage?

ignore that. we have updated the struct which gets allocated on the stack and not from arena allocator.

kunalspathak

LGTM. Thanks!

amanasifkhalid · 2024-07-17T21:32:26Z

/ba-g blocking failures are timeouts

Add MultiplyAddRotateComplexBySelectedScalar

50fd17b

amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Jul 17, 2024

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels Jul 17, 2024

dotnet-policy-service bot assigned amanasifkhalid Jul 17, 2024

amanasifkhalid mentioned this pull request Jul 17, 2024

Arm64: Implement SVE APIs #99957

Closed

amanasifkhalid commented Jul 17, 2024

View reviewed changes

src/coreclr/jit/lsraarm64.cpp Show resolved Hide resolved

kunalspathak reviewed Jul 17, 2024

View reviewed changes

Comment

9042853

kunalspathak approved these changes Jul 17, 2024

View reviewed changes

Merge from main

7e78652

amanasifkhalid merged commit 1ba2974 into dotnet:main Jul 17, 2024
156 of 167 checks passed

amanasifkhalid deleted the sve-fcmla2 branch July 17, 2024 21:32

github-actions bot locked and limited conversation to collaborators Aug 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARM64-SVE: Add `MultiplyAddRotateComplexBySelectedScalar` #105002

ARM64-SVE: Add `MultiplyAddRotateComplexBySelectedScalar` #105002

amanasifkhalid commented Jul 17, 2024 •

edited

Loading

dotnet-issue-labeler bot commented Jul 17, 2024

dotnet-issue-labeler bot commented Jul 17, 2024

dotnet-policy-service bot commented Jul 17, 2024

kunalspathak Jul 17, 2024

amanasifkhalid Jul 17, 2024 •

edited

Loading

kunalspathak Jul 17, 2024

kunalspathak left a comment

amanasifkhalid commented Jul 17, 2024

kunalspathak commented Jul 17, 2024

kunalspathak left a comment

amanasifkhalid commented Jul 17, 2024

ARM64-SVE: Add MultiplyAddRotateComplexBySelectedScalar #105002

ARM64-SVE: Add MultiplyAddRotateComplexBySelectedScalar #105002

Conversation

amanasifkhalid commented Jul 17, 2024 • edited Loading

dotnet-issue-labeler bot commented Jul 17, 2024

dotnet-issue-labeler bot commented Jul 17, 2024

dotnet-policy-service bot commented Jul 17, 2024

kunalspathak Jul 17, 2024

Choose a reason for hiding this comment

amanasifkhalid Jul 17, 2024 • edited Loading

Choose a reason for hiding this comment

kunalspathak Jul 17, 2024

Choose a reason for hiding this comment

kunalspathak left a comment

Choose a reason for hiding this comment

amanasifkhalid commented Jul 17, 2024

kunalspathak commented Jul 17, 2024

kunalspathak left a comment

Choose a reason for hiding this comment

amanasifkhalid commented Jul 17, 2024

ARM64-SVE: Add `MultiplyAddRotateComplexBySelectedScalar` #105002

ARM64-SVE: Add `MultiplyAddRotateComplexBySelectedScalar` #105002

amanasifkhalid commented Jul 17, 2024 •

edited

Loading

amanasifkhalid Jul 17, 2024 •

edited

Loading