Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some questions about Intrinsic #20

Open
Geng-Qi-alibaba opened this issue Nov 25, 2020 · 8 comments
Open

Some questions about Intrinsic #20

Geng-Qi-alibaba opened this issue Nov 25, 2020 · 8 comments

Comments

@Geng-Qi-alibaba
Copy link
Contributor

Geng-Qi-alibaba commented Nov 25, 2020

spec: uintXLEN_t __rv__kabsw(intXLEN_t a)
I would expected “intXLEN_t" as its return type, for the return value was signed extended.

spec: uintXLEN_t __rv__wext(int64_t a, uint32_t b)
I would expected: intXLEN_t __rv__wext(uint64_t a, uint32_t b).

spec: uintXLEN_t __rv__kabsw(intXLEN_t a)
Here "intXLEN_t" is used as a parameter type. Similarly, ksllw, insb, kdmabb, swraw, which actually use only the lower 32 or lower 8 bits of these values. In my opinion, these are the same as kaddw, raddw, kslraw... , but kaddw uses int32_t.
So, what's the difference between them? How should I choose?

spec: uintXLEN_t __rv__bitrev(uintXLEN_t a, uintXLEN_t msb)
I would expected “uint32_t" for the second parameter type. The second argument is a index of the bit, the other similar cases use uint32_t.

spec:
intXLEN_t __rv__kdmabb(intXLEN_t t, uint32_t a, uint32_t b)
intXLEN_t __rv__v_kdmabb(intXLEN_t t, int16x2_t a, int16x2_t b)
intXLEN_t __rv__v_kdmabb(intXLEN_t t, int16x4_t a, int16x4_t b)
Here uint32_t is used as the second and third parameter type. It‘s good for 16x2 but not for 16x4.
So, we may need to replace "uint32_t" with "uintXLEN_t" or replace "int16x4" with "int16x2".

uintXLEN_t __rv__clrs32(uintXLEN_t a)
intXLEN_t __rv__kmmawb(intXLEN_t t, uintXLEN_t a, uintXLEN_t b)
The "uintXLEN_t" here stands for "int32x2_t". I think we should use "intXLEN_t", just like the first argument of kmmawb. Or is there some other resons for this design?

uintXLEN_t __rv__kslra8(uintXLEN_t a, int b)
I would expected "int32_t" for kslra8. The kslraw use "int32_t".
The same problem also occurs with kslra8_u, kslra16 and kslra16_u.

uint64_t __rv__smul8(unsigned int a, unsigned int b)
uint64_t __rv__umul8(unsigned int a, unsigned int b)
The "unsigned int" here stands for "int8x4_t" or "uint8x4_t". May be better to use "uint32_t"? The smul16 use "uint32_t".
The same problem with smulx8, umulx8, umul16 and umulx16.

uint8x4_t __rv__v_uclip8(uint8x4_t a, uint32_t b)
I would expected "int8x4_t", for the description states that it’s a signed value.
The same problem with uclip16, uclip32.

There are a few things that I think are obvious clerical errors.
uint16x4_t __rv__v_scmpeq16(int16x4_t a, int16x2_t b) --> uint16x4_t __rv__v_scmpeq16(int16x4_t a, int16x4_t b)
uint16x4_t __rv__v_ucmpeq16(uint16x4_t a, uint16x2_t b) --> uint16x4_t __rv__v_ucmpeq16(uint16x4_t a, uint16x4_t b)
uintXLEN_t rvkhm6(uintXLEN_t a, uintXLEN_t b) --> uintXLEN_t __rv__khm16(uintXLEN_t a, uintXLEN_t b)
in32x2_t --> int32x2_t for v_kmmawb_u, v_kmmawb2, v_kmmawb2_u, v_kmmawt2, v_kmmawt2_u
rv v_kmmsb_u --> __rv__v_kmmsb_u
_int32_t __rv__v_smaqa_su (int32_t t --> int32_t __rv__v_smaqa_su(int32_t t
int32x2_t __rv__v_smaqa_su (int32x2_t t --> int32x2_t __rv__v_smaqa_su(int32x2_t t
int32x2 --> int32x2_t for v_smds, v_smdrs, v_smxds
int __rv__v_smmwt(int a, int16x2_t b) --> int __rv__v_smmwt(int32_t a, int16x2_t b)
int __rv__v_smmwt_u(int a, int16x2_t b) --> int __rv__v_smmwt_u(int32_t a, int16x2_t b)
Strange line endings for ukmar64, ukmsr64, umar64, usub64, ukadd64, uksub64
Function Omission for mulr64, mulsr64, maddr32, msubr32

Suggestion:
spec: intXLEN_t __rv__v_kdmbb(int16x2_t a, int16x2_t b)
This Intrinsic is for RV32. May be better to use "int32_t" instead of "intXLEN_t" when the XLEN is fixed.

@chuanhua
Copy link
Contributor

For question 6, the general principle of design is that for any operand that represents SIMD data types (a vector of smaller data elements), the operand prototype is simply defined as "uintXLEN_t".

@chuanhua
Copy link
Contributor

For question 6, the problem with kmmawb is that when in RV32, the output operand should be int32_t (a single signed output). But in RV64, the output operand is int32x2_t. Using uintXLEN_t for the output in RV32 will be wrong.

@chuanhua
Copy link
Contributor

For question 3, your opinion makes sense for the following instructions:

  • for kabsw: int32_t __rv__kabsw(int32_t a);
  • for ksllw/kslliw: int32_t __rv__ksllw(int32_t a, uint32_t b);
  • for sraiw.u: int32_t __rv__sraw_u(int32_t a, uint32_t b);
  • for kdmabb: int32_t __rv__kdmabb(int32_t t, uint32_t a, uint32_t b);
  • for kdmabt: int32_t __rv__kdmabt(int32_t t, uint32_t a, uint32_t b);
  • for kdmatt: int32_t __rv__kdmatt(int32_t t, uint32_t a, uint32_t b);

@linsinan1995
Copy link
Contributor

linsinan1995 commented May 1, 2021

@chuanhua
For question 5, instructions like KDMBB, KDMBT, KDMTT, KHMBB, KHMBT, KHMTT in RV64P also have a not consistent type between vector and non-vector mode,
e.g.

KDMBB
Required:
int32_t __rv__kdmbb(uint32_t a, uint32_t b);
Optional (e.g., GCC vector extensions):
RV32:
  int32_t __rv__v_kdmbb(int16x2_t a, int16x2_t b);
RV64:
  int32_t __rv__v_kdmbb(int16x4_t a, int16x4_t b);

So I suppose int16x4_t here should be replaced by int16x2_t?

@Geng-Qi-alibaba
Copy link
Contributor Author

For question 6, the problem with kmmawb is that when in RV32, the output operand should be int32_t (a single signed output). But in RV64, the output operand is int32x2_t. Using uintXLEN_t for the output in RV32 will be wrong.

When in RV32, the second arg of kmmawb should be int32_t, but uintXLEN_t was uint32_t. This doesn't create any bugs but it looks strange. If we used intXLEN_t here, the int32xN_t would have same representation at both input and output. So, it seems a little better, didn't it?

@Geng-Qi-alibaba
Copy link
Contributor Author

Geng-Qi-alibaba commented May 11, 2021

For question 3, your opinion makes sense for the following instructions:

  • for kabsw: int32_t __rv__kabsw(int32_t a);
  • for ksllw/kslliw: int32_t __rv__ksllw(int32_t a, uint32_t b);
  • for sraiw.u: int32_t __rv__sraw_u(int32_t a, uint32_t b);
  • for kdmabb: int32_t __rv__kdmabb(int32_t t, uint32_t a, uint32_t b);
  • for kdmabt: int32_t __rv__kdmabt(int32_t t, uint32_t a, uint32_t b);
  • for kdmatt: int32_t __rv__kdmatt(int32_t t, uint32_t a, uint32_t b);

I mentioned input type in question 6, but not output type.
The modifications about output type may related to my question 11.
The problem is that more changes may be needed to ensure consistency. I have noticed that some others have met this problem. #45

In the same thread, question 4,7 and 8 are also mentioned.

@Geng-Qi-alibaba
Copy link
Contributor Author

Geng-Qi-alibaba commented May 11, 2021

For question 11, most has been fixed at 0.9.4, but missed:
int __rv__v_smmwt(int a, int16x2_t b) --> int32 __rv__v_smmwt(int32_t a, int16x2_t b)

@Geng-Qi-alibaba
Copy link
Contributor Author

Geng-Qi-alibaba commented May 11, 2021

Unsigned type may be better for the third arg of smaqa_su.

RV32:
int32_t __rv__v_smaqa_su(int32_t t, int8x4_t a, int8x4_t b);
RV64:
int32x2_t __rv__v_smaqa_su(int32x2_t t, int8x8_t a, int8x8_t b);

-->

RV32:
int32_t __rv__v_smaqa_su(int32_t t, int8x4_t a, uint8x4_t b);
RV64:
int32x2_t __rv__v_smaqa_su(int32x2_t t, int8x8_t a, uint8x8_t b);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants