JIT ARM64 SVE: Add FZ_2A, HG_2A, GZ_3A, GV_3A, GY_3*, DV_4A #98310

Merged 11 commits on Feb 15, 2024

Member

@amanasifkhalid amanasifkhalid commented Feb 12, 2024

Part of #94549. Implements the following encodings:

  • IF_SVE_FZ_2A
  • IF_SVE_HG_2A (SVE2)
  • IF_SVE_GZ_3A
  • IF_SVE_GV_3A
  • IF_SVE_GY_3B_D (SVE2)
  • IF_SVE_GY_3A (SVE2)
  • IF_SVE_DV_4A (SVE2)

cstool output:

sqcvtn        z0.h, { z2.s, z3.s }
sqcvtun       z6.h, { z14.s, z15.s }
uqcvtn        z14.h, { z30.s, z31.s }
bfmlalb       z0.s, z1.h, z0.h[0]
bfmlalt       z2.s, z3.h, z1.h[1]
bfmlslb       z4.s, z5.h, z2.h[2]
bfmlslt       z6.s, z7.h, z3.h[3]
fmlalb        z8.s, z9.h, z4.h[4]
fmlalt        z10.s, z11.h, z5.h[5]
fmlslb        z12.s, z13.h, z6.h[6]
fmlslt        z14.s, z15.h, z7.h[7]
fcmla z0.s, z1.s, z0.s[0], #0
fcmla z2.s, z3.s, z5.s[1], #90
fcmla z4.s, z5.s, z10.s[0], #180
fcmla z6.s, z7.s, z15.s[1], #270

JitDisasm output:

sqcvtn  z0.h, { z2.s, z3.s }
sqcvtun z6.h, { z14.s, z15.s }
uqcvtn  z14.h, { z30.s, z31.s }
bfmlalb z0.s, z1.h, z0.h[0]
bfmlalt z2.s, z3.h, z1.h[1]
bfmlslb z4.s, z5.h, z2.h[2]
bfmlslt z6.s, z7.h, z3.h[3]
fmlalb  z8.s, z9.h, z4.h[4]
fmlalt  z10.s, z11.h, z5.h[5]
fmlslb  z12.s, z13.h, z6.h[6]
fmlslt  z14.s, z15.h, z7.h[7]
fcmla   z0.s, z1.s, z0.s[0], #0
fcmla   z2.s, z3.s, z5.s[1], #90
fcmla   z4.s, z5.s, z10.s[0], #180
fcmla   z6.s, z7.s, z15.s[1], #270
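
The two listings above differ only in column padding. A minimal sketch of how such a comparison could be automated, assuming each disassembler emits one instruction per line; `NormalizeDisasmLine` and `SameInstruction` are hypothetical helper names and not part of the JIT or of cstool:

```cpp
#include <cctype>
#include <string>

// Collapse each run of whitespace to a single space and trim both ends,
// so that column-alignment differences between disassemblers are ignored.
std::string NormalizeDisasmLine(const std::string& line)
{
    std::string out;
    bool        pendingSpace = false;
    for (unsigned char ch : line)
    {
        if (std::isspace(ch))
        {
            // Leading whitespace is dropped because nothing has been emitted yet.
            pendingSpace = !out.empty();
        }
        else
        {
            if (pendingSpace)
            {
                out.push_back(' ');
                pendingSpace = false;
            }
            out.push_back(static_cast<char>(ch));
        }
    }
    // A trailing pendingSpace is never flushed, which trims the end.
    return out;
}

// Two lines describe the same instruction if they match after normalization.
bool SameInstruction(const std::string& expected, const std::string& actual)
{
    return NormalizeDisasmLine(expected) == NormalizeDisasmLine(actual);
}
```

With this sketch, `SameInstruction("sqcvtn        z0.h, { z2.s, z3.s }", "sqcvtn  z0.h, { z2.s, z3.s }")` holds, while a line that prints `v2.s` instead of `z2.s` or carries a trailing comma does not match.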

@amanasifkhalid amanasifkhalid added the arm-sve label (Work related to arm64 SVE/SVE2 support) Feb 12, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 12, 2024
@amanasifkhalid
Member Author

cc @dotnet/arm64-contrib

@ghost ghost assigned amanasifkhalid Feb 12, 2024
@ghost

ghost commented Feb 12, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Part of #94549. Implements the following encodings:

  • IF_SVE_FZ_2A
  • IF_SVE_HG_2A (SVE2)
  • IF_SVE_GZ_3A
  • IF_SVE_GV_3A
  • IF_SVE_GY_3B_D (SVE2)
  • IF_SVE_GY_3A (SVE2)
  • IF_SVE_DV_4A (SVE2)

cstool output:

sqcvtn        z0.h, { z2.s, z3.s }
sqcvtun       z6.h, { z14.s, z15.s }
uqcvtn        z14.h, { z30.s, z31.s }
bfmlalb       z0.s, z1.h, z0.h[0]
bfmlalt       z2.s, z3.h, z1.h[1]
bfmlslb       z4.s, z5.h, z2.h[2]
bfmlslt       z6.s, z7.h, z3.h[3]
fmlalb        z8.s, z9.h, z4.h[4]
fmlalt        z10.s, z11.h, z5.h[5]
fmlslb        z12.s, z13.h, z6.h[6]
fmlslt        z14.s, z15.h, z7.h[7]
fcmla z0.s, z1.s, z0.s[0], #0
fcmla z2.s, z3.s, z5.s[1], #90
fcmla z4.s, z5.s, z10.s[0], #180
fcmla z6.s, z7.s, z15.s[1], #270

JitDisasm output:

sqcvtn  z0.h, { v2.s, v3.s }, 
sqcvtun z6.h, { v14.s, v15.s }, 
uqcvtn  z14.h, { v30.s, v31.s },
bfmlalb z0.s, z1.h, z0.h[0]
bfmlalt z2.s, z3.h, z1.h[1]
bfmlslb z4.s, z5.h, z2.h[2]
bfmlslt z6.s, z7.h, z3.h[3]
fmlalb  z8.s, z9.h, z4.h[4]
fmlalt  z10.s, z11.h, z5.h[5]
fmlslb  z12.s, z13.h, z6.h[6]
fmlslt  z14.s, z15.h, z7.h[7]
fcmla   z0.s, z1.s, z0.s[0], #0
fcmla   z2.s, z3.s, z5.s[1], #90
fcmla   z4.s, z5.s, z10.s[0], #180
fcmla   z6.s, z7.s, z15.s[1], #270

Author: amanasifkhalid
Assignees: -
Labels: area-CodeGen-coreclr, arch-arm64-sve
Milestone: -

@@ -12099,7 +12171,7 @@ void emitter::emitIns_R_R_R_I(instruction ins,
             break;

         case INS_sve_bfmul:
-            assert(opt = INS_OPTS_SCALABLE_H);
+            assert(opt == INS_OPTS_SCALABLE_H);
Member

good catch
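
For context on why the single `=` matters: an assignment inside an assert condition overwrites the variable and then tests the assigned value, so the check passes whenever that value is nonzero. The sketch below illustrates this with a stand-in enum; `SketchOpts` and its values are hypothetical, and the real value of `INS_OPTS_SCALABLE_H` is assumed (not confirmed here) to be nonzero.

```cpp
// Stand-in for the JIT's option enum; names and values are hypothetical.
enum SketchOpts
{
    SKETCH_OPTS_NONE       = 0,
    SKETCH_OPTS_SCALABLE_H = 1,
    SKETCH_OPTS_SCALABLE_S = 2,
};

// Mirrors the buggy condition `opt = X`: the assignment overwrites opt,
// and the expression is truthy for any nonzero X, so it never fails.
bool BuggyCheck(SketchOpts& opt)
{
    return (opt = SKETCH_OPTS_SCALABLE_H) != SKETCH_OPTS_NONE;
}

// Mirrors the fixed condition `opt == X`: a pure comparison with no side effect.
bool FixedCheck(SketchOpts opt)
{
    return opt == SKETCH_OPTS_SCALABLE_H;
}
```

Calling `BuggyCheck` with an `opt` of `SKETCH_OPTS_SCALABLE_S` returns true and leaves `opt` changed to `SKETCH_OPTS_SCALABLE_H`, whereas `FixedCheck` rejects the same input. If the assert is compiled out of release builds, as is typical, the silent overwrite would also make debug and release builds behave differently.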

//------------------------------------------------------------------------
// emitDispVectorRegPair: Display a pair of vector registers
//
void emitter::emitDispVectorRegPair(regNumber reg, insOpts opt)
Member

can we reuse emitDispSveConsecutiveRegList instead?

Member Author

Yes we can; updated.
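
As a rough illustration of what a shared register-list formatter does, the sketch below builds the `{ z2.s, z3.s }` operand text from a starting register number and a count. It is not the JIT's `emitDispSveConsecutiveRegList`, whose signature is not shown in this thread; `FormatConsecutiveRegList` and its parameters are hypothetical, and wrapping register numbers modulo 32 is an assumption of this sketch.

```cpp
#include <string>

// Format `count` consecutive SVE vector registers starting at `firstReg`,
// each with the given arrangement suffix (for example "s" or "h").
// Hypothetical helper; register numbers wrap modulo 32 as an assumption.
std::string FormatConsecutiveRegList(unsigned firstReg, unsigned count, const std::string& arrangement)
{
    std::string out = "{ ";
    for (unsigned i = 0; i < count; i++)
    {
        if (i != 0)
        {
            out += ", ";
        }
        out += "z" + std::to_string((firstReg + i) % 32) + "." + arrangement;
    }
    out += " }";
    return out;
}
```

For example, `FormatConsecutiveRegList(2, 2, "s")` yields `{ z2.s, z3.s }`. Routing pairs, triples, and quads through one formatter keeps the register prefix and separators consistent, which is the kind of mismatch (`v2.s` and a trailing comma) visible in the earlier JitDisasm output.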

@ryujit-bot

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here


@ryujit-bot

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

Details here


@ryujit-bot

Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
realworld.run.osx.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here


Contributor

@a74nh a74nh left a comment

I'm happy with this now.

Contributor

@TIHan TIHan left a comment

LGTM

Hopefully when capstone gets updated this month, we will be able to decode the unsupported ones.

@amanasifkhalid amanasifkhalid merged commit c7253b1 into dotnet:main Feb 15, 2024
129 checks passed
@amanasifkhalid amanasifkhalid deleted the sve-fz-2a branch February 15, 2024 04:24
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2024