Some questions about Intrinsic #20

Geng-Qi-alibaba · 2020-11-25T08:28:44Z

spec: uintXLEN_t __rv__kabsw(intXLEN_t a)
I would expected “intXLEN_t" as its return type, for the return value was signed extended.

spec: uintXLEN_t __rv__wext(int64_t a, uint32_t b)
I would expected: intXLEN_t __rv__wext(uint64_t a, uint32_t b).

spec: uintXLEN_t __rv__kabsw(intXLEN_t a)
Here "intXLEN_t" is used as a parameter type. Similarly, ksllw, insb, kdmabb, swraw, which actually use only the lower 32 or lower 8 bits of these values. In my opinion, these are the same as kaddw, raddw, kslraw... , but kaddw uses int32_t.
So, what's the difference between them? How should I choose?

spec: uintXLEN_t __rv__bitrev(uintXLEN_t a, uintXLEN_t msb)
I would expected “uint32_t" for the second parameter type. The second argument is a index of the bit, the other similar cases use uint32_t.

spec:
intXLEN_t __rv__kdmabb(intXLEN_t t, uint32_t a, uint32_t b)
intXLEN_t __rv__v_kdmabb(intXLEN_t t, int16x2_t a, int16x2_t b)
intXLEN_t __rv__v_kdmabb(intXLEN_t t, int16x4_t a, int16x4_t b)
Here uint32_t is used as the second and third parameter type. It‘s good for 16x2 but not for 16x4.
So, we may need to replace "uint32_t" with "uintXLEN_t" or replace "int16x4" with "int16x2".

uintXLEN_t __rv__clrs32(uintXLEN_t a)
intXLEN_t __rv__kmmawb(intXLEN_t t, uintXLEN_t a, uintXLEN_t b)
The "uintXLEN_t" here stands for "int32x2_t". I think we should use "intXLEN_t", just like the first argument of kmmawb. Or is there some other resons for this design?

uintXLEN_t __rv__kslra8(uintXLEN_t a, int b)
I would expected "int32_t" for kslra8. The kslraw use "int32_t".
The same problem also occurs with kslra8_u, kslra16 and kslra16_u.

uint64_t __rv__smul8(unsigned int a, unsigned int b)
uint64_t __rv__umul8(unsigned int a, unsigned int b)
The "unsigned int" here stands for "int8x4_t" or "uint8x4_t". May be better to use "uint32_t"? The smul16 use "uint32_t".
The same problem with smulx8, umulx8, umul16 and umulx16.

uint8x4_t __rv__v_uclip8(uint8x4_t a, uint32_t b)
I would expected "int8x4_t", for the description states that it’s a signed value.
The same problem with uclip16, uclip32.

There are a few things that I think are obvious clerical errors.
uint16x4_t __rv__v_scmpeq16(int16x4_t a, int16x2_t b) --> uint16x4_t __rv__v_scmpeq16(int16x4_t a, int16x4_t b)
uint16x4_t __rv__v_ucmpeq16(uint16x4_t a, uint16x2_t b) --> uint16x4_t __rv__v_ucmpeq16(uint16x4_t a, uint16x4_t b)
uintXLEN_t rvkhm6(uintXLEN_t a, uintXLEN_t b) --> uintXLEN_t __rv__khm16(uintXLEN_t a, uintXLEN_t b)
in32x2_t --> int32x2_t for v_kmmawb_u, v_kmmawb2, v_kmmawb2_u, v_kmmawt2, v_kmmawt2_u
rv v_kmmsb_u --> __rv__v_kmmsb_u
_int32_t __rv__v_smaqa_su (int32_t t --> int32_t __rv__v_smaqa_su(int32_t t
int32x2_t __rv__v_smaqa_su (int32x2_t t --> int32x2_t __rv__v_smaqa_su(int32x2_t t
int32x2 --> int32x2_t for v_smds, v_smdrs, v_smxds
int __rv__v_smmwt(int a, int16x2_t b) --> int __rv__v_smmwt(int32_t a, int16x2_t b)
int __rv__v_smmwt_u(int a, int16x2_t b) --> int __rv__v_smmwt_u(int32_t a, int16x2_t b)
Strange line endings for ukmar64, ukmsr64, umar64, usub64, ukadd64, uksub64
Function Omission for mulr64, mulsr64, maddr32, msubr32

Suggestion:
spec: intXLEN_t __rv__v_kdmbb(int16x2_t a, int16x2_t b)
This Intrinsic is for RV32. May be better to use "int32_t" instead of "intXLEN_t" when the XLEN is fixed.

chuanhua · 2021-04-23T03:32:00Z

For question 6, the general principle of design is that for any operand that represents SIMD data types (a vector of smaller data elements), the operand prototype is simply defined as "uintXLEN_t".

chuanhua · 2021-04-23T03:40:40Z

For question 6, the problem with kmmawb is that when in RV32, the output operand should be int32_t (a single signed output). But in RV64, the output operand is int32x2_t. Using uintXLEN_t for the output in RV32 will be wrong.

chuanhua · 2021-04-28T10:03:31Z

For question 3, your opinion makes sense for the following instructions:

for kabsw: int32_t __rv__kabsw(int32_t a);
for ksllw/kslliw: int32_t __rv__ksllw(int32_t a, uint32_t b);
for sraiw.u: int32_t __rv__sraw_u(int32_t a, uint32_t b);
for kdmabb: int32_t __rv__kdmabb(int32_t t, uint32_t a, uint32_t b);
for kdmabt: int32_t __rv__kdmabt(int32_t t, uint32_t a, uint32_t b);
for kdmatt: int32_t __rv__kdmatt(int32_t t, uint32_t a, uint32_t b);

linsinan1995 · 2021-05-01T10:13:29Z

@chuanhua
For question 5, instructions like KDMBB, KDMBT, KDMTT, KHMBB, KHMBT, KHMTT in RV64P also have a not consistent type between vector and non-vector mode,
e.g.

KDMBB
Required:
int32_t __rv__kdmbb(uint32_t a, uint32_t b);
Optional (e.g., GCC vector extensions):
RV32:
  int32_t __rv__v_kdmbb(int16x2_t a, int16x2_t b);
RV64:
  int32_t __rv__v_kdmbb(int16x4_t a, int16x4_t b);

So I suppose int16x4_t here should be replaced by int16x2_t?

Geng-Qi-alibaba · 2021-05-11T09:00:48Z

For question 6, the problem with kmmawb is that when in RV32, the output operand should be int32_t (a single signed output). But in RV64, the output operand is int32x2_t. Using uintXLEN_t for the output in RV32 will be wrong.

When in RV32, the second arg of kmmawb should be int32_t, but uintXLEN_t was uint32_t. This doesn't create any bugs but it looks strange. If we used intXLEN_t here, the int32xN_t would have same representation at both input and output. So, it seems a little better, didn't it?

Geng-Qi-alibaba · 2021-05-11T09:18:32Z

For question 3, your opinion makes sense for the following instructions:

for kabsw: int32_t __rv__kabsw(int32_t a);

for ksllw/kslliw: int32_t __rv__ksllw(int32_t a, uint32_t b);

for sraiw.u: int32_t __rv__sraw_u(int32_t a, uint32_t b);

for kdmabb: int32_t __rv__kdmabb(int32_t t, uint32_t a, uint32_t b);

for kdmabt: int32_t __rv__kdmabt(int32_t t, uint32_t a, uint32_t b);

for kdmatt: int32_t __rv__kdmatt(int32_t t, uint32_t a, uint32_t b);

I mentioned input type in question 6, but not output type.
The modifications about output type may related to my question 11.
The problem is that more changes may be needed to ensure consistency. I have noticed that some others have met this problem. #45

In the same thread, question 4，7 and 8 are also mentioned.

Geng-Qi-alibaba · 2021-05-11T09:33:56Z

For question 11, most has been fixed at 0.9.4, but missed:
int __rv__v_smmwt(int a, int16x2_t b) --> int32 __rv__v_smmwt(int32_t a, int16x2_t b)

Geng-Qi-alibaba · 2021-05-11T09:47:08Z

Unsigned type may be better for the third arg of smaqa_su.

RV32:
int32_t __rv__v_smaqa_su(int32_t t, int8x4_t a, int8x4_t b);
RV64:
int32x2_t __rv__v_smaqa_su(int32x2_t t, int8x8_t a, int8x8_t b);

-->

RV32:
int32_t __rv__v_smaqa_su(int32_t t, int8x4_t a, uint8x4_t b);
RV64:
int32x2_t __rv__v_smaqa_su(int32x2_t t, int8x8_t a, uint8x8_t b);

Geng-Qi-alibaba mentioned this issue May 11, 2021

Intrinsic types #45

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some questions about Intrinsic #20

Some questions about Intrinsic #20

Geng-Qi-alibaba commented Nov 25, 2020 •

edited

Loading

chuanhua commented Apr 23, 2021

chuanhua commented Apr 23, 2021

chuanhua commented Apr 28, 2021

linsinan1995 commented May 1, 2021 •

edited

Loading

Geng-Qi-alibaba commented May 11, 2021

Geng-Qi-alibaba commented May 11, 2021 •

edited

Loading

Geng-Qi-alibaba commented May 11, 2021 •

edited

Loading

Geng-Qi-alibaba commented May 11, 2021 •

edited

Loading

Some questions about Intrinsic #20

Some questions about Intrinsic #20

Comments

Geng-Qi-alibaba commented Nov 25, 2020 • edited Loading

chuanhua commented Apr 23, 2021

chuanhua commented Apr 23, 2021

chuanhua commented Apr 28, 2021

linsinan1995 commented May 1, 2021 • edited Loading

Geng-Qi-alibaba commented May 11, 2021

Geng-Qi-alibaba commented May 11, 2021 • edited Loading

Geng-Qi-alibaba commented May 11, 2021 • edited Loading

Geng-Qi-alibaba commented May 11, 2021 • edited Loading

Geng-Qi-alibaba commented Nov 25, 2020 •

edited

Loading

linsinan1995 commented May 1, 2021 •

edited

Loading

Geng-Qi-alibaba commented May 11, 2021 •

edited

Loading

Geng-Qi-alibaba commented May 11, 2021 •

edited

Loading

Geng-Qi-alibaba commented May 11, 2021 •

edited

Loading