JIT ARM64 SVE: Add FZ_2A, HG_2A, GZ_3A, GV_3A, GY_3*, DV_4A #98310

amanasifkhalid · 2024-02-12T17:52:30Z

Part of #94549. Implements the following encodings:

IF_SVE_FZ_2A
IF_SVE_HG_2A (SVE2)
IF_SVE_GZ_3A
IF_SVE_GV_3A
IF_SVE_GY_3B_D (SVE2)
IF_SVE_GY_3A (SVE2)
IF_SVE_DV_4A (SVE2)

cstool output:

sqcvtn        z0.h, { z2.s, z3.s }
sqcvtun       z6.h, { z14.s, z15.s }
uqcvtn        z14.h, { z30.s, z31.s }
bfmlalb       z0.s, z1.h, z0.h[0]
bfmlalt       z2.s, z3.h, z1.h[1]
bfmlslb       z4.s, z5.h, z2.h[2]
bfmlslt       z6.s, z7.h, z3.h[3]
fmlalb        z8.s, z9.h, z4.h[4]
fmlalt        z10.s, z11.h, z5.h[5]
fmlslb        z12.s, z13.h, z6.h[6]
fmlslt        z14.s, z15.h, z7.h[7]
fcmla z0.s, z1.s, z0.s[0], #0
fcmla z2.s, z3.s, z5.s[1], #90
fcmla z4.s, z5.s, z10.s[0], #180
fcmla z6.s, z7.s, z15.s[1], #270
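
The indexed operands above stop at z7.h[7] for the .h forms and at z15.s[1] for fcmla, and every rotation is a multiple of 90. A minimal sketch of the matching range checks follows. It assumes a 3-bit register field and 3-bit index for the .h forms, a 4-bit register field and 1-bit index for fcmla with .s elements, and a 2-bit rotation field. The function names are hypothetical and are not part of the JIT's emitter.

```cpp
#include <cassert>

// Maps an FCMLA rotation in degrees to a 2-bit field value
// (0 -> 0, 90 -> 1, 180 -> 2, 270 -> 3). Returns -1 for any other input.
int encodeRotation(int degrees)
{
    if (degrees < 0 || degrees > 270 || degrees % 90 != 0)
    {
        return -1;
    }
    return degrees / 90;
}

// Checks an indexed operand such as z7.h[7]: the register number must fit in
// regBits bits and the element index must fit in indexBits bits.
// The bit widths are supplied by the caller and are assumptions in this sketch.
bool isValidIndexedOperand(unsigned reg, unsigned index, unsigned regBits, unsigned indexBits)
{
    return (reg < (1u << regBits)) && (index < (1u << indexBits));
}
```

Under these assumptions, isValidIndexedOperand(7, 7, 3, 3) accepts z7.h[7] and isValidIndexedOperand(15, 1, 4, 1) accepts z15.s[1], while z8.h[0] and z15.s[2] are rejected.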

JitDisasm output:

sqcvtn  z0.h, { z2.s, z3.s }
sqcvtun z6.h, { z14.s, z15.s }
uqcvtn  z14.h, { z30.s, z31.s }
bfmlalb z0.s, z1.h, z0.h[0]
bfmlalt z2.s, z3.h, z1.h[1]
bfmlslb z4.s, z5.h, z2.h[2]
bfmlslt z6.s, z7.h, z3.h[3]
fmlalb  z8.s, z9.h, z4.h[4]
fmlalt  z10.s, z11.h, z5.h[5]
fmlslb  z12.s, z13.h, z6.h[6]
fmlslt  z14.s, z15.h, z7.h[7]
fcmla   z0.s, z1.s, z0.s[0], #0
fcmla   z2.s, z3.s, z5.s[1], #90
fcmla   z4.s, z5.s, z10.s[0], #180
fcmla   z6.s, z7.s, z15.s[1], #270
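
The two listings differ only in the padding after the mnemonic. One way to compare them mechanically is to collapse whitespace before comparing, sketched below. Both helpers are hypothetical and are not part of cstool or the JIT.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Collapses every run of whitespace to a single space and trims both ends,
// so column padding does not affect the comparison.
std::string normalizeAsm(const std::string& line)
{
    std::istringstream in(line);
    std::string        token;
    std::string        out;
    while (in >> token)
    {
        if (!out.empty())
        {
            out += ' ';
        }
        out += token;
    }
    return out;
}

// Returns true when two disassembly lines match after whitespace normalization.
bool sameInstruction(const std::string& a, const std::string& b)
{
    return normalizeAsm(a) == normalizeAsm(b);
}
```

This treats only whitespace as insignificant; a different register name or a stray trailing comma still counts as a mismatch.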

amanasifkhalid · 2024-02-12T17:53:08Z

cc @dotnet/arm64-contrib

ghost · 2024-02-12T18:05:58Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	amanasifkhalid
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `arch-arm64-sve`
Milestone:	-

kunalspathak · 2024-02-12T19:29:05Z

src/coreclr/jit/emitarm64.cpp

@@ -12099,7 +12171,7 @@ void emitter::emitIns_R_R_R_I(instruction ins,
            break;

        case INS_sve_bfmul:
-            assert(opt = INS_OPTS_SCALABLE_H);
+            assert(opt == INS_OPTS_SCALABLE_H);
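
The one-character change above matters because `=` inside an assert assigns instead of comparing. A minimal sketch of the difference follows, using a hypothetical stand-in enum; the names mirror the diff, but the numeric values are made up and this is not the JIT's definition of insOpts.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT's insOpts; values are illustrative only.
enum insOpts
{
    INS_OPTS_NONE       = 0,
    INS_OPTS_SCALABLE_H = 1,
    INS_OPTS_SCALABLE_S = 2,
};

// Buggy form: '=' assigns, so the check passes for any input (the assigned
// value is nonzero here) and silently overwrites 'opt'.
bool checkWithAssignment(insOpts& opt)
{
    return (opt = INS_OPTS_SCALABLE_H) != 0;
}

// Fixed form: '==' compares and leaves 'opt' untouched.
bool checkWithComparison(const insOpts& opt)
{
    return opt == INS_OPTS_SCALABLE_H;
}
```

With the assignment form, a caller passing INS_OPTS_SCALABLE_S would pass the check and continue with opt changed to INS_OPTS_SCALABLE_H; in a build where asserts are compiled out, the assignment would not happen at all, so the two build flavors would behave differently.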


kunalspathak · 2024-02-12T19:32:54Z

src/coreclr/jit/emitarm64.cpp

+//------------------------------------------------------------------------
+// emitDispVectorRegPair: Display a pair of vector registers
+//
+void emitter::emitDispVectorRegPair(regNumber reg, insOpts opt)


Can we reuse emitDispSveConsecutiveRegList instead?

amanasifkhalid · Feb 12, 2024

Yes, we can; updated.
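
For context, a register-list display helper of the kind discussed here produces operands such as { z2.s, z3.s }. The sketch below is a hypothetical illustration, not the JIT's emitDispSveConsecutiveRegList; the signature and the wrap from z31 back to z0 are assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: formats 'count' consecutive SVE registers starting at
// z<first>, all with the same element suffix, as "{ zN.T, zN+1.T }".
// Register numbers wrap modulo 32 (an assumption made for this sketch).
std::string formatSveRegList(unsigned first, unsigned count, const std::string& suffix)
{
    std::string out = "{ ";
    for (unsigned i = 0; i < count; i++)
    {
        if (i != 0)
        {
            out += ", ";
        }
        out += "z" + std::to_string((first + i) % 32) + "." + suffix;
    }
    out += " }";
    return out;
}
```

A single routine parameterized by the register count covers pairs and longer lists alike, which is the usual argument for reusing a general list printer over adding a pair-specific one.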

ryujit-bot · 2024-02-12T20:10:22Z

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	-0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.windows.arm64.checked.mch	+0.01%

Details here

ryujit-bot · 2024-02-12T23:11:12Z

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	+0.01%

Details here

ryujit-bot · 2024-02-13T00:11:23Z

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	+0.01%

Details here

ryujit-bot · 2024-02-13T23:14:28Z

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
realworld.run.osx.arm64.checked.mch	-0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.windows.arm64.checked.mch	+0.01%

Details here

ryujit-bot · 2024-02-14T00:14:44Z

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.linux.arm64.checked.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)

Collection	PDIFF
realworld.run.osx.arm64.checked.mch	-0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.pmi.windows.arm64.checked.mch	+0.01%

Details here

a74nh

I'm happy with this now.

TIHan

LGTM

Hopefully when capstone gets updated this month, we will be able to decode the unsupported ones.

amanasifkhalid added 7 commits February 11, 2024 23:29

Add IF_SVE_FZ_2A (1b63a30)
Add IF_SVE_HG_2A (3319d1e)
Add IF_SVE_GZ_3A (a1add79)
Add IF_SVE_GV_3A (8540378)
Add IF_SVE_GY_3B_D (8c84d84)
Add IF_SVE_GY_3A (f15b779)
Add IF_SVE_DV_4A (1c5cf57)

amanasifkhalid added the arm-sve label (Work related to arm64 SVE/SVE2 support) Feb 12, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 12, 2024

amanasifkhalid mentioned this pull request Feb 12, 2024

Arm64: Implement SVE encodings #94549

Closed

Add unreached() for SVE2 encodings (c360ca9)

ghost assigned amanasifkhalid Feb 12, 2024

kunalspathak reviewed Feb 12, 2024

Remove emitDispVectorRegPair (a5c8bb1)

a74nh reviewed Feb 13, 2024

amanasifkhalid added 2 commits February 13, 2024 16:31

Merge from main (0a1307d)
Use insGetPredicateType (d3a07c6)

amanasifkhalid requested review from kunalspathak and a74nh February 14, 2024 15:55

a74nh approved these changes Feb 14, 2024

TIHan approved these changes Feb 15, 2024

amanasifkhalid merged commit c7253b1 into dotnet:main Feb 15, 2024
129 checks passed

amanasifkhalid deleted the sve-fz-2a branch February 15, 2024 04:24

github-actions bot locked and limited conversation to collaborators Mar 16, 2024
