NEON: implement all intrinsics supported by architecture A64-remaining part #1093
I'm excited to see this next PR, thank you @yyctw!
Here are some random comments; I'll wait until the tests pass to review each file in detail.
Some thoughts on the AES code
If need be, we can split out the AES code to a separate PR to keep going here without it.
OK, I have removed it.
- 2 dup{q}_laneq_f16
- 9 dup{b,h}_lane{q}_{s/u}{8,16}, duph_laneq_f16
- 1 vmaxh_f16
- 3 vmaxnm{/h/q}_f16
- 5 vmaxnmv{q}_f{16/32/64}
- 2 vmaxv{q}_f16
- 1 vminh_f16
- 3 vminnm{/h/q}_f16
- 5 vminnmv{q}_f{16/32/64}
- 2 vminv{q}_f16
- 1 vpmaxq_f16
- 7 vpmaxnm{/s/q/qd}_f{16/32/64}
- 2 vpmin{q}_f16
- 7 vpminnm{/s/q/qd}_f{16/32/64}
mmlaq, mull_high_lane, mull_high_n, mulx, mulx_lane, mulx_n, qrdmlah, qmovun_high.
- Correct format
- Add test generation code
- Fix wrong test case in reinterpret.c
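The commits above add both the `max`/`min` and the `maxnm`/`minnm` families. As a reminder of how the two differ for a single lane, here is a minimal scalar sketch. The function names are hypothetical and this is not the code from this PR; it assumes quiet NaNs only and does not model signed-zero ordering or signaling NaNs.

```c
#include <math.h>  /* only for the NAN macro */

/* Hypothetical helper: "maxnm"-style maximum.
 * If exactly one operand is a (quiet) NaN, the numeric operand is returned. */
static float example_maxnm_f32(float a, float b) {
  if (a != a) return b;  /* a is NaN: fall back to b */
  if (b != b) return a;  /* b is NaN: fall back to a */
  return (a > b) ? a : b;
}

/* Hypothetical helper: plain "max"-style maximum.
 * A NaN in either operand makes the result NaN. */
static float example_max_f32(float a, float b) {
  if (a != a || b != b) return NAN;
  return (a > b) ? a : b;
}
```

With these definitions, `example_maxnm_f32(NAN, 1.0f)` yields `1.0f`, while `example_max_f32(NAN, 1.0f)` yields NaN.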
Thank you @yyctw!
No problem. Thank you @mr-c for your review! |
Hi all, this is Eric from Andes Technology Corporation. This PR is the remaining part of the previous PR and includes the following:
- Implement all poly-related types using `uint`.
- Implement all functions related to the `poly` type (with test cases).
- Implement all functions related to the `bf16` type (without test cases).
- Add 1035 initial implementations and corresponding test cases in 137 families, which are listed below:
`add`, `aes`, `bsl`, `ceq`, `ceqz`, `cmla`, `cmla_rot180`, `cmla_rot270`, `cmla_rot90`, `cnt`, `combine`, `copy_lane`, `crc32`, `create`, `cvt`, `div`, `dot`, `dot_lane`, `dup_lane`, `dup_n`, `eor`, `ext`, `fmlal`, `fmlsl`, `get_high`, `get_lane`, `get_low`, `ld1`, `ld1_dup`, `ld1_lane`, `ld1_x2`, `ld1_x3`, `ld1_x4`, `ld1q_x2`, `ld1q_x3`, `ld1q_x4`, `ld2`, `ld2_dup`, `ld2_lane`, `ld3`, `ld3_dup`, `ld3_lane`, `ld4`, `ld4_dup`, `ld4_lane`, `maxnm`, `maxnmv`, `maxv`, `minnm`, `minnmv`, `minv`, `mmlaq`, `mul`, `mull`, `mull_high`, `mull_high_lane`, `mull_high_n`, `mulx`, `mulx_lane`, `mulx_n`, `mvn`, `padd`, `pmax`, `pmaxnm`, `pmin`, `pminnm`, `qmovun_high`, `qrdmlah`, `qrdmlah_lane`, `qrdmlsh`, `qrdmlsh_lane`, `qrdmulh_lane`, `qshlu_n`, `qshrun_high_n`, `qshrun_n`, `qtbl`, `qtbx`, `rax`, `rbit`, `recps`, `recpx`, `reinterpret`, `rev16`, `rev32`, `rev64`, `rnd`, `rnd32x`, `rnd32z`, `rnd64x`, `rnd64z`, `rnda`, `rndi`, `rndm`, `rndp`, `rndx`, `set_lane`, `sha1`, `sha256`, `sha512`, `shll_high_n`, `shrn_high_n`, `shrn_n`, `sli_n`, `sm3`, `sm4`, `sri_n`, `st1`, `st1_lane`, `st1_x2`, `st1_x3`, `st1_x4`, `st1q_x2`, `st1q_x3`, `st1q_x4`, `st2`, `st2_lane`, `st3`, `st3_lane`, `st4`, `st4_lane`, `subhn_high`, `sudot_lane`, `tbl`, `tbx`, `trn`, `trn1`, `trn2`, `tst`, `types`, `usdot`, `usdot_lane`, `uzp`, `uzp1`, `uzp2`, `zip`, `zip1`, `zip2`
Thanks for reading; any recommendations are welcome! :tada::tada::tada:
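To illustrate what representing the poly types with `uint` can look like, here is a minimal scalar sketch of a polynomial (carry-less) multiply on unsigned integers. The type and function names are hypothetical and chosen only for this example; they are not the typedefs or implementations from this PR.

```c
#include <stdint.h>

/* Hypothetical typedefs for illustration: poly types backed by unsigned ints. */
typedef uint8_t  example_poly8_t;
typedef uint16_t example_poly16_t;

/* Carry-less multiply of two 8-bit polynomials over GF(2), returning the
 * full 16-bit product. Partial products are combined with XOR, not addition. */
static example_poly16_t example_mull_p8(example_poly8_t a, example_poly8_t b) {
  example_poly16_t r = 0;
  for (int i = 0; i < 8; i++) {
    if ((b >> i) & 1) {
      r ^= (example_poly16_t)((example_poly16_t)a << i);
    }
  }
  return r;
}

/* Same product, truncated to the low 8 bits. */
static example_poly8_t example_mul_p8(example_poly8_t a, example_poly8_t b) {
  return (example_poly8_t)(example_mull_p8(a, b) & 0xFF);
}
```

For example, `example_mull_p8(0x03, 0x03)` is `0x05`, because (x + 1)^2 = x^2 + 1 over GF(2), whereas ordinary integer multiplication would give 9. Since the storage is a plain unsigned integer, loads, stores, and bitwise operations can reuse the unsigned code paths, and only the arithmetic needs poly-specific handling.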