Skip to content
This repository has been archived by the owner on May 7, 2024. It is now read-only.

[RVP] add p extension spec 0.94 support #258

Open
wants to merge 44 commits into
base: riscv-gcc-experiment
Choose a base branch
from

Conversation

linsinan1995
Copy link
Contributor

@linsinan1995 linsinan1995 commented Apr 23, 2021

  • Andes-tech and Shao-Chung Wang implement the RISC-V 'P' extension with the spec version 0.51 in GCC 7.6
  • Sinan Lin update the implementation to the spec version 0.93 0.94 with GCC 10.2
  • Add commit 0b7b471 and 4b81528 for fixing configure option parsing error

@kito-cheng
Copy link
Collaborator

kito-cheng commented Apr 26, 2021

  • This patch is too huge, could you split it into several patches? I guess at list you could split into following different parts:
    • MD pattern changes
      • Minimal support for new mode, e.g. move pattern + necessary hook changes in riscv.c
      • standard pattern for those modes.
      • other MD patterns, at least one patch for each sub-extension to make it easier to review.
    • Intrinsic/built-in functions support
    • Auto vectorization support
  • Don't merge other standalone commit to your commit, like 0b7b471 and 4b81528, that will cause author info is gone, and also impossible to distinguish which part is changed by you.

@linsinan1995
Copy link
Contributor Author

Hi @kito-cheng , thanks for the suggestions! I will do that later on. Also, I wonder can we have a new experimental branch for p ext, since it might be still huge even if we split it into small pieces.

@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch from 2bc5611 to 29d62e2 Compare May 13, 2021 10:02
Copy link
Contributor

@yetingk yetingk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For [Hook] Add TARGET_VECTOR_MODE_SUPPORTED_P implementation
Since some instructions like smul16 in rv32p need vector register with 64-bit, I think you should also consider MODE_VECTOR_INT.

@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch from 83e60d0 to 608b5be Compare May 17, 2021 05:29
@linsinan1995
Copy link
Contributor Author

@fakepaper56 Thanks, fixed in new commit.

@kito-cheng
Copy link
Collaborator

Could you rebase and fix the conflict?

@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch from 608b5be to 8ded055 Compare May 17, 2021 09:50
@linsinan1995
Copy link
Contributor Author

No conflict now.

@kito-cheng
Copy link
Collaborator

Oops, I still saw This branch cannot be rebased due to conflicts on github?

@kito-cheng
Copy link
Collaborator

Oh, but it could be merged by merge or squash and merge

Copy link
Collaborator

@kito-cheng kito-cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes are so huge, I guess I need take long time to review that, and I'll submit my review when I review part of patch.

(define_code_attr ph [(plus "") (minus "")])

;; bit count, clz and clrs
(define_code_iterator unop [clrsb clz])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there is no bitc ount here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in the new commit eb5b39a

typedef unsigned int uint32x2_t __attribute__((vector_size(8)));
typedef unsigned int uint32x4_t __attribute__((vector_size(16)));

#define __riscv__kabsw(a) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only saw __rv__kabsw in the spec? it seems redundant ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it only helps intrinsic header more readable, e.g.

#if defined(__riscv_zpsf) && __riscv_xlen == 64
#define __rv__v_smar64 __riscv__v_smar64
#define __rv__v_smsr64 __riscv__v_smsr64
...

instead of several lines for one intrinsic. I can remove them if it is not appropriate.

@@ -212,3 +212,21 @@
{
return riscv_gpr_save_operation_p (op);
})

(define_predicate "register_even_operand"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it necessary even riscv_hard_regno_mode_ok has guarded that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not necessary, and it was removed by commit ecd8198

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you squash your patch together? I mean merge the fixup patch into the original one, since I am review the commit one-by-one, and it would be easier review when you submit patch to upstream.

Copy link
Contributor Author

@linsinan1995 linsinan1995 May 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Kito, I have sorted them out.

"TARGET_ZPN && !TARGET_64BIT"
"ave\t%0, %1, %2"
[(set_attr "type" "dsp")
(set_attr "mode" "SI")])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest merge riscv_ave and ave, riscv_ave seems redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in the new commit eb5b39a

DONE;
})

(define_insn "avedi"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

@@ -56,6 +56,10 @@
UNSPEC_UKSUBW
UNSPEC_UKADDH
UNSPEC_UKSUBH
UNSPEC_BITREV
UNSPEC_CLROV
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to define UNSPEC_CLROV in unspecv rather than unspec if you are use that in unspec_volatile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in the new commit eb5b39a


(define_insn "riscv_<rvp_optab>si2"
[(set (match_operand:SI 0 "register_operand" "=r")
(unop:SI (match_operand:SI 1 "register_operand" "r")))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using standard pattern name[1] for those pattern, I mean without the riscv_ prefix, using standard pattern name can teach GCC to generate that instruction when user using __builtin_clz and __builtin_clrsb[2]

[1] https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#index-clrsbm2-instruction-pattern
[2] https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fclz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your info! Fixed in commit 715c9ca. Also, there are many other patterns that can be renamed (remove riscv_ prefix), such as smin, smax, etc. Maybe I can just remove the riscv_ prefix in all the names?

[(set (match_operand:VQIHI 0 "register_operand" "=r")
(unspec:VQIHI [(eq:VQIHI (match_operand:VQIHI 1 "register_operand" " r")
(match_operand:VQIHI 2 "register_operand" " r"))]
UNSPEC_CMPEQ))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reference vcondu<mode><v_cmp_mixed> @ aarch64/aarch64-simd.md and try to remove the unspec here, I think it should be able to represent without unspec?

gcc/config/riscv/constraints.md Outdated Show resolved Hide resolved
@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch from f149f17 to 82035e7 Compare May 31, 2021 04:56
[(set_attr "type" "dsp64")
(set_attr "mode" "DI")])

;; uk|k add|sub w|h
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you simplify those pattern by define_int_iterator and define_int_attr?


;; KDMBB, KDMBT, KDMTT (32 (*) (xlen, xlen) for vector mode & 32 (*) (32, 32) for non-vector mode)
(define_insn "kdmbb<mode>"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kdm*<mode> seems can be merge with kdm* and v64_kdm* pattern?

(define_insn "ksll"
[(set (match_operand:SI 0 "register_operand" "= r, r")
(ss_ashift:SI (match_operand:SI 1 "register_operand" " r, r")
(match_operand:SI 2 "rimm5u_operand" " Iu05, r")))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u05 is enough here.

(set_attr "mode" "SI")])

;; MULR64, MULSR64
(define_insn "rvp_umulsidi3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umulsidi3 is a standard pattern name, so I would suggest remove rvp_.

[(set_attr "type" "dsp")
(set_attr "mode" "DI")])

(define_insn "rvp_mulsidi3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mulsidi3 is a standard pattern name, so I would suggest remove rvp_ here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions.

The mulsidi3 and umulsidi3 are defined in riscv.md in 32 bit, so I put

if (TARGET_ZPN)
  emit_insn (gen_rvp_<u>mulsidi3 (operands[0], operands[1], operands[2]));
else
  ...

under define_expand "<u>mulsidi3" and define <u>mulsidi3 in rvp.md for 64 bit. Does it sound a good solution?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems right solution :)

@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch 2 times, most recently from 1c4e324 to d3b8d1c Compare June 1, 2021 10:03
@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch 2 times, most recently from 9e4e687 to 57a3b83 Compare June 15, 2021 18:29
@linsinan1995
Copy link
Contributor Author

Hi Kito, the feedback problems above are addressed, except removing the unspec of vector compare pattern, which I will investigate later.

@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch from 57a3b83 to a24b56a Compare June 16, 2021 12:23
(unspec:GPR [(match_operand:GPR 1 "register_operand" "0")
(match_operand:GPR 2 "register_operand" "r")
(match_operand:GPR 3 "register_operand" "r")] UNSPEC_PBSADA))]
""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"TARGET_ZPN"?

@@ -525,6 +525,7 @@ pru-*-*)
;;
riscv*)
cpu_type=riscv
extra_headers="rvp_intrinsic.h"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest rvpintrin.h similar to rvkintrin.h

(match_test "ival < (1 << 6) && ival >= 0")))

(define_constraint "C00"
"Constant value 1"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0?

@@ -68,6 +68,8 @@ riscv_implied_info_t riscv_implied_info[] =
{"zks", "zksh"},
{"zks", "zkg"},
{"zks", "zkb"},
{"p", "zpn"},
{"p", "zpsf"},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zprv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping here means the p flag in --with-arch=rv{xlen}gcp or -march=rv{xlen}gcp can be expanded to rv{xlen}gc_zpn_zpsf.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Still I think that zprv should appear on xlen=64.
When I run riscv64-unknown-elf-gcc -march=rv64imcbp -mabi=lp64 ..... on compiler with this PR, I get

rv64up.c: warning: implicit declaration of function '__rv__kadd32' [-Wimplicit-function-declaration]
    |   if (0x0000000000000000 != __rv__kadd32(0x0000000000000000, 0x0000000000000000)) return 536;
    |                             ^~~~~~~~~~~~

This is so because rv64imcbp expands to rv64imcb_zpn_zpsf whilest it should expand to rv64imcb_zpn_zpsf_zprv because zprv is required on xlen=64 by spec (I simplify a bit: I didn't expand b to its zeds here because it is not the issue).
If I do not treat warnings as errors, it obviously fails on linking with:

objs/rv64imcb@pcsh-bm1_rv64up/rv64up.o: in function `.L0 ': undefined reference to `__rv__kadd32'

I faced a more general problem with this PR: when I compile my code with riscv64-unknown-elf-gcc -march=rv64imcb -mabi=lp64 ..... (i.e. without p-ext) it fails on linking with this error:

objs/rv64imcb@pcsh-bm1_rv64up/rv64up.o: in function `.L0 ': undefined reference to `__builtin_riscv_kadd16'

I have my own p-ext intrinsic file (which is generally

#ifdef __riscv_zpn
static inline __rv__insn(uintXLEN_t rs1, uintXLEN_t rs2) { uintXLEN_t rd; __asm__("insn %0, %1, %2" : "=r"(rd) : "r"(rs1), "r"(rs2)); return rd; }
#else
static inline __rv__insn(uintXLEN_t rs1, uintXLEN_t rs2) { SOME_GENERIC_CODE; return rd; }
#endif

with different variations of types, number of arguments, clobbers and other stuff). I did so because I am not a pro at machine description language and also it seems to be a common way to implement intrinsics (e.g. https://github.com/rvkrypto/rvkrypto-fips/blob/main/rvkintrin.h).

So my intrinsics produce an assembler code with the specified insn if p-ext is enabled and make a call to (or inline -- depends on optimization levels) some generic-code functions otherwise.

When I found this PR, I expected it works the same way and in addition allows to optimize some generic code with emitting of p-ext insns.
As you mentioned in another topic, "generic code" should use very specific types defined in the same file where intrinsics are -- so we barely expect that such insns will appear on some code with common scalar types as b-ext insns do. It is ok for me: I understand that b-ext machine descriptions implement some simple stuff (e.g. bit set) and this PR operates with packed types so this is either very hard or impossible to map a set of scalar type insns to p-ext insn in most cases. Still I don't understand why intrinsics do not work (I mean do not provide generic code) without enabled p-ext and what should be done if I want to compile some file with the compiler which includes this intrinsic file without p-ext insns.

Copy link
Contributor Author

@linsinan1995 linsinan1995 Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @marcfedorow,

This is so because rv64imcbp expands to rv64imcb_zpn_zpsf whilest it should expand to rv64imcb_zpn_zpsf_zprv because zprv is required on xlen=64 by spec (I simplify a bit: I didn't expand b to its zeds here because it is not the issue).

Thank you for pointing it out. I added a fix to support a different expansion on p flag (rv32p will be expanded into rv32_zpn_zpsf, and rv64p will be rv64_zpn_zprv_zpsf).

with different variations of types, number of arguments, clobbers and other stuff). I did so because I am not a pro at machine description language and also it seems to be a common way to implement intrinsics (e.g. https://github.com/rvkrypto/rvkrypto-fips/blob/main/rvkintrin.h).
So my intrinsics produce an assembler code with the specified insn if p-ext is enabled and make a call to (or inline -- depends on optimization levels) some generic-code functions otherwise.

I think we do not have such a plan so far, since it requires a great amount of workload. I think it is a good idea though, so I will forward your message to other PLCT members in the further meetup.

As you mentioned in another topic, "generic code" should use very specific types defined in the same file where intrinsics are -- so we barely expect that such insns will appear on some code with common scalar types as b-ext insns do. It is ok for me: I understand that b-ext machine descriptions implement some simple stuff (e.g. bit set) and this PR operates with packed types so this is either very hard or impossible to map a set of scalar type insns to p-ext insn in most cases. Still I don't understand why intrinsics do not work (I mean do not provide generic code) without enabled p-ext and what should be done if I want to compile some file with the compiler which includes this intrinsic file without p-ext insns.

Vector type int16x4_t doesn't need to be from p-ext header, since it is native support in GCC (https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html). To use intrinsic might be the only solution if you try to use scalar type instead of vector type (e.g. put int16x4_t data onto int64_t field), but p-ext insn is possible to be generated from scalar code through auto-vectorization. e.g.

#include <rvp_intrinsic.h>
#include <stdint.h>

typedef short v4hi __attribute__((vector_size (8)));

v4hi v_sadd16_spn (v4hi ra, v4hi rb)
{
  return ra + rb;
}

int16_t *v_sadd16_arr (int16_t *ra, int16_t *rb, int len)
{
  for (int i = 0; i < len; i++)
	  ra[i] += rb[i];
  return ra;
}

riscv64-unknown-elf-gcc -S -O3 add16.c

	.file	"add16.c"
	.option nopic
	.attribute arch, "rv64i2p0_m2p0_a2p0_f2p0_d2p0_c0p0_p2p0_zpn2p0_zprv2p0_zpsf2p0"
	.attribute unaligned_access, 0
	.attribute stack_align, 16
	.text
	.align	1
	.globl	v_sadd16_spn
	.type	v_sadd16_spn, @function
v_sadd16_spn:
	add16	a0, a0, a1
	ret
	.size	v_sadd16_spn, .-v_sadd16_spn
	.align	1
	.globl	v_sadd16_arr
	.type	v_sadd16_arr, @function
v_sadd16_arr:
	ble	a2,zero,.L4
	addi	a5,a1,2
	addiw	a3,a2,-1
	sub	a5,a0,a5
	sext.w	a4,a3
	sltiu	a5,a5,5
	li	a6,5
	xori	a5,a5,1
	sgtu	a4,a4,a6
	and	a5,a4,a5
	sext.w	t1,a2
	beq	a5,zero,.L5
	or	a5,a1,a0
	andi	a5,a5,7
	bne	a5,zero,.L5
	srliw	a7,t1,2
	slli	a7,a7,3
	mv	a5,a0
	mv	a3,a1
	add	a7,a7,a0
.L6:
	ld	a4,0(a5)
	ld	a6,0(a3)
	addi	a5,a5,8
	addi	a3,a3,8
	add16	a4, a4, a6
	sd	a4,-8(a5)
	bne	a5,a7,.L6
	andi	a5,t1,-4
	mv	a4,a5
	beq	t1,a5,.L4
	slli	a5,a5,32
	srli	a5,a5,31
	add	a7,a0,a5
	add	a3,a1,a5
	lhu	a6,0(a3)
	lhu	t1,0(a7)
	addiw	a3,a4,1
	addw	a6,a6,t1
	sh	a6,0(a7)
	bge	a3,a2,.L4
	addi	a3,a5,2
	add	a6,a0,a3
	add	a3,a1,a3
	lhu	a3,0(a3)
	lhu	a7,0(a6)
	addiw	a4,a4,2
	addw	a3,a3,a7
	sh	a3,0(a6)
	bge	a4,a2,.L4
	addi	a5,a5,4
	add	a4,a0,a5
	add	a1,a1,a5
	lhu	a5,0(a1)
	lhu	a3,0(a4)
	addw	a5,a5,a3
	sh	a5,0(a4)
	ret
.L5:
	slli	a5,a3,32
	srli	a3,a5,31
	addi	a2,a0,2
	mv	a5,a0
	add	a2,a2,a3
.L8:
	lhu	a4,0(a5)
	lhu	a3,0(a1)
	addi	a5,a5,2
	addi	a1,a1,2
	addw	a4,a4,a3
	sh	a4,-2(a5)
	bne	a5,a2,.L8
.L4:
	ret
	.size	v_sadd16_arr, .-v_sadd16_arr
	.ident	"GCC: (GNU) 10.2.0"

I hope I have answered all your questions. Thanks again.

(__builtin_riscv_ukmsr64 ((r), (a), (b)))
#define __rv__maddr32(a, b) \
(__builtin_riscv_maddr32 ((a), (b)))
#define __rv__msubr32(a, b) \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r, a, b

(__builtin_riscv_ukmar64 ((r), (a), (b)))
#define __rv__ukmsr64(r, a, b) \
(__builtin_riscv_ukmsr64 ((r), (a), (b)))
#define __rv__maddr32(a, b) \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r, a, b

@marcfedorow
Copy link

I am unable to emit p-ext insns from generic code.
Here are my examples:

#include <rvp_intrinsic.h>
#include <stdint.h>
#include <stdlib.h>

#define ADD_SUB(u, n, BODY)                                 \
    uint64_t rd = 0;                                        \
    for (int i = 0; i < __riscv_xlen; i += n) {             \
        u##int##n##_t rs1s = rs1 >> i, rs2s = rs2 >> i;     \
        rd |= (uint64_t)((BODY) & UINT##n##_MAX) << i;      \
    }                                                       \
    return rd

static inline uint64_t add16(uint64_t rs1, uint64_t rs2)
{
    ADD_SUB( , 16, rs1s + rs2s );
}

volatile long unsigned int foo(){
  return rand();
}

int main()
{
  uint64_t a = foo(), b = foo();
  return __rv__add16(a, b) != add16(b, a);
}

compiles to

 <main>:
1141                	addi	sp,sp,-16
e406                	sd	ra,8(sp)
e022                	sd	s0,0(sp)
66e000ef          	jal	ra,800015c8 <rand>
842a                	mv	s0,a0
668000ef          	jal	ra,800015c8 <rand>
02055693          	srli	a3,a0,0x20
02045613          	srli	a2,s0,0x20
872a                	mv	a4,a0
4105559b          	sraiw	a1,a0,0x10
4104579b          	sraiw	a5,s0,0x10
0106939b          	slliw	t2,a3,0x10
0106189b          	slliw	a7,a2,0x10
6541                	lui	a0,0x10
fff50293          	addi	t0,a0,-1 # ffff
00b7833b          	addw	t1,a5,a1
4103d81b          	sraiw	a6,t2,0x10
4108de1b          	sraiw	t3,a7,0x10
00537eb3          	and	t4,t1,t0
01c80f3b          	addw	t5,a6,t3
43075f93          	srai	t6,a4,0x30
43045593          	srai	a1,s0,0x30
0104161b          	slliw	a2,s0,0x10
0107139b          	slliw	t2,a4,0x10
005f76b3          	and	a3,t5,t0
01f5853b          	addw	a0,a1,t6
000e879b          	sext.w	a5,t4
4106531b          	sraiw	t1,a2,0x10
4103d81b          	sraiw	a6,t2,0x10
00557eb3          	and	t4,a0,t0
01079893          	slli	a7,a5,0x10
02069e13          	slli	t3,a3,0x20
01030f3b          	addw	t5,t1,a6
01c8efb3          	or	t6,a7,t3
030e9593          	slli	a1,t4,0x30
005f72b3          	and	t0,t5,t0
00bfe7b3          	or	a5,t6,a1
0002869b          	sext.w	a3,t0
00d7e533          	or	a0,a5,a3
40870477          	add16	s0,a4,s0
60a2                	ld	ra,8(sp)
40850733          	sub	a4,a0,s0
6402                	ld	s0,0(sp)
00e03533          	snez	a0,a4
0141                	addi	sp,sp,16
8082                	ret

and

#include <rvp_intrinsic.h>
#include <stdint.h>
#include <stdlib.h>

typedef union {
  uint64_t packed;
  uint16_t arr[4];
} pack;

static inline uint64_t add16(uint64_t rs1, uint64_t rs2)
{
  pack a, b, c;
  a.packed = rs1, b.packed = rs2;
  for (short i = 0; i < 4; ++i){
    c.arr[i] = a.arr[i] + b.arr[i];
  }
  return c.packed;
}

volatile long unsigned int foo(){
  return rand();
}

int main()
{
  uint64_t a = foo(), b = foo();
  return __rv__add16(a, b) != add16(b, a);
}

compiles to

 <main>:
1141                	addi	sp,sp,-16
e406                	sd	ra,8(sp)
e022                	sd	s0,0(sp)
63a000ef          	jal	ra,80001594 <rand>
842a                	mv	s0,a0
634000ef          	jal	ra,80001594 <rand>
01045693          	srli	a3,s0,0x10
01055793          	srli	a5,a0,0x10
00f682bb          	addw	t0,a3,a5
02045713          	srli	a4,s0,0x20
02055613          	srli	a2,a0,0x20
00a4033b          	addw	t1,s0,a0
00c703bb          	addw	t2,a4,a2
03055813          	srli	a6,a0,0x30
03045593          	srli	a1,s0,0x30
0802c8bb          	zext.h	a7,t0
01089e13          	slli	t3,a7,0x10
00b80ebb          	addw	t4,a6,a1
08034f3b          	zext.h	t5,t1
0803cfbb          	zext.h	t6,t2
01cf67b3          	or	a5,t5,t3
020f9293          	slli	t0,t6,0x20
030e9693          	slli	a3,t4,0x30
0057e733          	or	a4,a5,t0
00d76633          	or	a2,a4,a3
40850577          	add16	a0,a0,s0
40c50433          	sub	s0,a0,a2
60a2                	ld	ra,8(sp)
00803533          	snez	a0,s0
6402                	ld	s0,0(sp)
0141                	addi	sp,sp,16
8082                	ret

I also tried without volatile, here is my foo:

long unsigned int foo(){
  static int bar = 12345;
  return ^= 54321;
}

Results are pretty same -- generic code does not emit as an instruction.

Also I tried this:

int main()
{
  uint64_t a = 12345, b = 54321;
  return __rv__add16(a, b) != add16(b, a);
}

It compiled to this:

c0000797          	auipc	a5,0xc0000
8e878793          	addi	a5,a5,-1816 # 40000838 <__TMC_END__+0x10>
c0000297          	auipc	t0,0xc0000
8e828293          	addi	t0,t0,-1816 # 40000840 <__TMC_END__+0x18>
6388                	ld	a0,0(a5)
0002b303          	ld	t1,0(t0)
406503f7          	add16	t2,a0,t1
b9638593          	addi	a1,t2,-1130
00b03533          	snez	a0,a1
8082                	ret

And I tried that:

int main()
{
  uint64_t a = 0x12345, b = 0x54321;
  return __rv__add16(a, b) != add16(b, a);
}

And it was compiled out (li a0,0; ret).
So machine descriptions allow compiler to know that this is similar code in some cases.
What should I do if I want my generic code to be compiled with p-ext insns' emitting?

@linsinan1995 linsinan1995 force-pushed the riscv-gcc-experiment-p-ext branch from 5fc9529 to da54bac Compare November 9, 2021 11:57
(set_attr "mode" "DI")])

;; uk|k add|sub w|h
(define_int_attr karth_insn

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

U?K(sub|add)[HW] could be described similar to U?K(sub|add) + sign extension

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

U?K(sub|add)[HW] could be described similar to U?K(sub|add) + sign extension

Hi @marcfedorow , this is added in the latest update. Thanks for your feedback.

0. update impl according to the change in 0.9.11
1. fix add/sub crash in rv32g
2. use standard name for saturation insn, e.g. kaddw and kaddh
@manasabatchu
Copy link

Assembler messages:
Error: rv64gc_zpn_zpsf: unknown prefixed ISA extension zpn' builtin-rvp64-add16.s:3: Error: rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zpn2p0_zpsf2p0: unknown prefixed ISA extension zpn'
builtin-rvp64-add16.s:21: Error: unrecognized opcode add16 a5,a4,a5' builtin-rvp64-add16.s:41: Error: unrecognized opcode add16 a5,a4,a5'
builtin-rvp64-add16.s:61: Error: unrecognized opcode `add16 a5,a4,a5'
how to solve this problem?

@pz9115
Copy link

pz9115 commented Sep 11, 2023

Assembler messages: Error: rv64gc_zpn_zpsf: unknown prefixed ISA extension zpn' builtin-rvp64-add16.s:3: Error: rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zpn2p0_zpsf2p0: unknown prefixed ISA extension zpn' builtin-rvp64-add16.s:21: Error: unrecognized opcode add16 a5,a4,a5' builtin-rvp64-add16.s:41: Error: unrecognized opcode add16 a5,a4,a5' builtin-rvp64-add16.s:61: Error: unrecognized opcode `add16 a5,a4,a5' how to solve this problem?

I think you need also use Binutils that supported P-ext,

We had update p-ext toolchain support with p-spec v0.9.11, here is the latest repo:

https://github.com/plctlab/riscv-gcc/tree/riscv-gcc-p-ext
https://github.com/plctlab/riscv-binutils-gdb/tree/riscv-binutils-p-ext

You can find some build record refer in this issue:
riscv-collab/riscv-gnu-toolchain#1298 (comment)

@manasabatchu
Copy link

Assembler messages: Error: rv64gc_zpn_zpsf: unknown prefixed ISA extension zpn' builtin-rvp64-add16.s:3: Error: rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zpn2p0_zpsf2p0: unknown prefixed ISA extension zpn' builtin-rvp64-add16.s:21: Error: unrecognized opcode add16 a5,a4,a5' builtin-rvp64-add16.s:41: Error: unrecognized opcode add16 a5,a4,a5' builtin-rvp64-add16.s:61: Error: unrecognized opcode `add16 a5,a4,a5' how to solve this problem?

I think you need also use Binutils that supported P-ext,

We had update p-ext toolchain support with p-spec v0.9.11, here is the latest repo:

https://github.com/plctlab/riscv-gcc/tree/riscv-gcc-p-ext https://github.com/plctlab/riscv-binutils-gdb/tree/riscv-binutils-p-ext

You can find some build record refer in this issue: riscv-collab/riscv-gnu-toolchain#1298 (comment)

Thank you for helping, now it's working fine

@manasabatchu
Copy link

Hi I am getting this error while doing compilation process
./.././gcc/gcc/config/riscv/rvp.md:21:50: error: unknown mode V4QI' ../.././gcc/gcc/config/riscv/rvp.md:21:73: note: following context is (V2HI "!TARGET_64BIT")'
Makefile:2572: recipe for target 's-preds-h' failed
can you help me how to solve this error

@pz9115
Copy link

pz9115 commented Sep 15, 2023

Hi I am getting this error while doing compilation process ./.././gcc/gcc/config/riscv/rvp.md:21:50: error: unknown mode V4QI' ../.././gcc/gcc/config/riscv/rvp.md:21:73: note: following context is (V2HI "!TARGET_64BIT")' Makefile:2572: recipe for target 's-preds-h' failed can you help me how to solve this error

Can you support the source file for me to reproduce this problem,thanks.

@manasabatchu
Copy link

Hi @pz9115, this is the source code file link which I'm consuming in my project. I have already stated the triggered error line in my previous comment, please let me know if you need any other information.

@manasabatchu
Copy link

Hi when I am running the p extension testcase I am getting this error how to solve it?

/home/manasa/PSIMD/riscv-gnu-toolchain/lib/gcc/riscv32-unknown-elf/10.2.0/include/rvp_intrinsic.h:311:1: error: incompatible types when returning type 'int' but 'uint8xN_t' {aka 'uint8x4_t'} was expected
311 | CREATE_RVP_INTRINSIC_VECTOR (uint8xN_t, uadd8, uint8xN_t, uint8xN_t)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Makefile:19: recipe for target 'add_8.riscv' failed
make: *** [add_8.riscv] Error 1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants