NEON: implement all intrinsics supported by architecture A64-remaining part #1093

Merged
merged 109 commits into from
Nov 16, 2023


@yyctw (Contributor) commented Oct 26, 2023

Hi all, this is Eric from Andes Technology Corporation. This PR is the remaining part of the previous PR and includes the following:

- Implement all poly-related types using unsigned integer (uint) types.
- Implement all functions related to the poly type (with test cases).
- Implement all functions related to the bf16 type (without test cases).
- Add 1035 initial implementations, with corresponding test cases, in the following 137 families:
  add, aes, bsl, ceq, ceqz, cmla, cmla_rot180, cmla_rot270, cmla_rot90, cnt, combine, copy_lane, crc32, create, cvt, div, dot, dot_lane, dup_lane, dup_n, eor, ext, fmlal, fmlsl, get_high, get_lane, get_low, ld1, ld1_dup, ld1_lane, ld1_x2, ld1_x3, ld1_x4, ld1q_x2, ld1q_x3, ld1q_x4, ld2, ld2_dup, ld2_lane, ld3, ld3_dup, ld3_lane, ld4, ld4_dup, ld4_lane, maxnm, maxnmv, maxv, minnm, minnmv, minv, mmlaq, mul, mull, mull_high, mull_high_lane, mull_high_n, mulx, mulx_lane, mulx_n, mvn, padd, pmax, pmaxnm, pmin, pminnm, qmovun_high, qrdmlah, qrdmlah_lane, qrdmlsh, qrdmlsh_lane, qrdmulh_lane, qshlu_n, qshrun_high_n, qshrun_n, qtbl, qtbx, rax, rbit, recps, recpx, reinterpret, rev16, rev32, rev64, rnd, rnd32x, rnd32z, rnd64x, rnd64z, rnda, rndi, rndm, rndp, rndx, set_lane, sha1, sha256, sha512, shll_high_n, shrn_high_n, shrn_n, sli_n, sm3, sm4, sri_n, st1, st1_lane, st1_x2, st1_x3, st1_x4, st1q_x2, st1q_x3, st1q_x4, st2, st2_lane, st3, st3_lane, st4, st4_lane, subhn_high, sudot_lane, tbl, tbx, trn, trn1, trn2, tst, types, usdot, usdot_lane, uzp, uzp1, uzp2, zip, zip1, zip2

Thanks for reading, and any recommendations are welcome! 🎉🎉🎉
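For readers new to the poly and bf16 items in the list above, here is a minimal C sketch. It assumes poly8/poly16 values are stored as plain uint8_t/uint16_t, as the PR describes, and it uses hypothetical names (my_poly8_t, poly8_mull, poly8_mul, bf16_to_f32) that are not SIMDe's API. Polynomial multiplication is carry-less (XOR instead of add): per lane, vmull_p8 keeps the full 16-bit product and vmul_p8 keeps the low 8 bits. A bf16 value is the upper 16 bits of an IEEE binary32, so widening it to float is a 16-bit left shift.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names: poly types stored as plain unsigned integers. */
typedef uint8_t  my_poly8_t;
typedef uint16_t my_poly16_t;

/* Carry-less multiply over GF(2)[x]: partial products are XORed, not added.
   Returns the full 16-bit product (the per-lane behaviour of vmull_p8). */
static my_poly16_t poly8_mull(my_poly8_t a, my_poly8_t b) {
  uint16_t r = 0;
  for (int i = 0; i < 8; i++) {
    if (b & (1u << i)) {
      r ^= (uint16_t)((uint16_t)a << i);
    }
  }
  return r;
}

/* Truncated product: only the low 8 bits (the per-lane behaviour of vmul_p8). */
static my_poly8_t poly8_mul(my_poly8_t a, my_poly8_t b) {
  return (my_poly8_t)(poly8_mull(a, b) & 0xFFu);
}

/* bf16 -> float: place the 16 bf16 bits in the top half of a binary32.
   Assumes float is IEEE 754 binary32. */
static float bf16_to_f32(uint16_t bits) {
  uint32_t u = (uint32_t)bits << 16;
  float f;
  memcpy(&f, &u, sizeof f); /* bit copy, avoids strict-aliasing issues */
  return f;
}
```

For example, (x + 1)·(x + 1) = x² + 1 in carry-less arithmetic, so 0x03 × 0x03 gives 0x05 rather than 0x09.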

@mr-c (Collaborator) left a comment

I'm excited to see this next PR, thank you @yyctw !

Here are some random comments; I'll wait until the tests pass to review each file in detail.

simde/arm/neon/aes.h (outdated, resolved)
meson.build (resolved)
@yyctw force-pushed the A64-2 branch 9 times, most recently from b0e4d8f to 2f98952 (October 27, 2023 08:25)
@yyctw requested a review from mr-c on October 27, 2023 08:29
@mr-c (Collaborator) left a comment

Some thoughts on the AES code

simde/arm/neon/aes.h (5 review threads, outdated, resolved)
@mr-c (Collaborator) commented Oct 27, 2023

> Some thoughts on the AES code

If need be, we can split out the AES code to a separate PR to keep going here without it.

test/arm/neon/qrdmlah_lane.c (outdated, resolved)
@yyctw (Contributor, Author) commented Oct 28, 2023

> > Some thoughts on the AES code
>
> If need be, we can split out the AES code to a separate PR to keep going here without it.

OK, I have removed it.

yyctw added 11 commits October 28, 2023 14:10
- 2 dup{q}_laneq_f16
- 9 dup{b,h}_lane{q}_{s/u}{8,16}, duph_laneq_f16
- 1 vmaxh_f16
- 3 vmaxnm{/h/q}_f16
- 5 vmaxnmv{q}_f{16/32/64}
- 2 vmaxv{q}_f16
- 1 vminh_f16
- 3 vminnm{/h/q}_f16
- 5 vminnmv{q}_f{16/32/64}
- 2 vminv{q}_f16
- 1 vpmaxq_f16
- 7 vpmaxnm{/s/q/qd}_f{16/32/64}
- 2 vpmin{q}_f16
- 7 vpminnm{/s/q/qd}_f{16/32/64}
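The maxnm, pmaxnm and maxnmv families listed in this commit differ from plain max in how they treat NaN. The following is a scalar sketch with hypothetical helper names (maxnm_f32, pmaxnm_f32x2, maxnmv_f32), not SIMDe's code, shown for float rather than f16 and covering quiet NaNs only. maxNum returns the numeric operand when the other is NaN and treats +0.0 as larger than -0.0. The pairwise form reduces adjacent lanes of a and then of b, and the across-vector form folds all lanes.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper names; a scalar model, not SIMDe's implementation.
   Only quiet NaNs are modelled: hardware FMAXNM propagates a signaling NaN. */
static float maxnm_f32(float a, float b) {
  if (isnan(a)) return b;          /* numeric operand wins over NaN */
  if (isnan(b)) return a;          /* (if both are NaN, a NaN is returned) */
  if (a == 0.0f && b == 0.0f)      /* +0.0 is treated as larger than -0.0 */
    return signbit(a) ? b : a;
  return a > b ? a : b;
}

/* Pairwise, in the style of vpmaxnm_f32: adjacent lanes of a, then of b. */
static void pmaxnm_f32x2(const float a[2], const float b[2], float r[2]) {
  r[0] = maxnm_f32(a[0], a[1]);
  r[1] = maxnm_f32(b[0], b[1]);
}

/* Across-vector, in the style of vmaxnmvq_f32. Hardware reduces pairwise;
   for quiet NaNs a left-to-right fold yields the same value. */
static float maxnmv_f32(const float *v, size_t n) {
  float acc = v[0];
  for (size_t i = 1; i < n; i++) acc = maxnm_f32(acc, v[i]);
  return acc;
}
```

The min variants (minnm, pminnm, minnmv) mirror this with the comparison and the zero preference reversed.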
mmlaq, mull_high_lane, mull_high_n, mulx,
mulx_lane, mulx_n, qrdmlah, qmovun_high.
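Two of the families named in the commit above, mulx and qrdmlah, have semantics that are easy to get wrong. The following scalar sketch uses hypothetical names (mulx_f64, qrdmlah_s16, floor_div_65536) and is modelled on the Arm pseudocode for FMULX and SQRDMLAH, not on SIMDe's sources. FMULX behaves like an ordinary multiply except that 0 × ∞ yields ±2.0 instead of NaN. SQRDMLAH on 16-bit lanes computes (a·2^16 + 2·b·c + 2^15) >> 16 in wide arithmetic and saturates once at the end.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical name: scalar model of FMULX (e.g. one lane of vmulxq_f64). */
static double mulx_f64(double a, double b) {
  if ((isinf(a) && b == 0.0) || (a == 0.0 && isinf(b))) {
    /* 0 * inf gives 2.0, signed by the XOR of the operand signs */
    int neg = (!!signbit(a)) ^ (!!signbit(b));
    return neg ? -2.0 : 2.0;
  }
  return a * b; /* otherwise an ordinary multiply; NaNs propagate */
}

/* Floor division by 2^16 without right-shifting a negative value. */
static int64_t floor_div_65536(int64_t x) {
  int64_t q = x / 65536;           /* C division truncates toward zero */
  if (x % 65536 < 0) q -= 1;       /* adjust to floor for negative x */
  return q;
}

/* Hypothetical name: scalar model of vqrdmlahh_s16(a, b, c). */
static int16_t qrdmlah_s16(int16_t a, int16_t b, int16_t c) {
  /* accum = a * 2^16 + 2*b*c + 2^15 (rounding constant), in 64 bits */
  int64_t accum = (int64_t)a * 65536 + 2 * (int64_t)b * c + 32768;
  int64_t r = floor_div_65536(accum);
  if (r > INT16_MAX) return INT16_MAX; /* single saturation at the end */
  if (r < INT16_MIN) return INT16_MIN;
  return (int16_t)r;
}
```

The 64-bit intermediate matters: a·2^16 + 2·b·c can reach about ±2^32, outside the int32 range, before the final saturation.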
@yyctw requested a review from mr-c on November 9, 2023 06:44
simde/arm/neon/ld1_x4.h (outdated, resolved)
simde/arm/neon/ld2_dup.h (3 review threads, outdated, resolved)
simde/arm/neon/st1_x2.h (outdated, resolved)
test/arm/neon/ld2_dup.c (3 review threads, resolved)
simde/arm/neon/cmla.h (2 review threads, outdated, resolved)
@yyctw requested a review from mr-c on November 13, 2023 06:14
simde/arm/neon/ld2.h (outdated, resolved)
@yyctw requested a review from mr-c on November 14, 2023 02:41
@mr-c merged commit 018ba24 into simd-everywhere:master on Nov 16, 2023
69 checks passed
@mr-c (Collaborator) commented Nov 16, 2023

TL;DR: SIMDe currently implements 6443 out of 6670 (96.60%) NEON functions. If you don't count bf16 types, it's 6443 / 6466 (99.64%).

!!!

Thank you @yyctw !

simd-everywhere/implementation-status@80829f2
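The two percentages in the TL;DR follow directly from the counts given. Reading the totals at face value, their difference is the number of bf16 intrinsics, and the gap in the second ratio is what remains outside bf16:

```latex
\begin{align*}
\frac{6443}{6670} &\approx 0.9660 = 96.60\,\% \\
\frac{6443}{6466} &\approx 0.9964 = 99.64\,\% \\
6670 - 6466 &= 204 \quad \text{(bf16 intrinsics excluded from the second total)} \\
6466 - 6443 &= 23 \quad \text{(non-bf16 intrinsics not yet counted as implemented)}
\end{align*}
```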

@yyctw (Contributor, Author) commented Nov 17, 2023

No problem. Thank you @mr-c for your review!

Labels: None yet
Projects: None yet
2 participants