This repository has been archived by the owner on May 7, 2024. It is now read-only.

[bitmanip][WiP] [RFC] Add automatic generation of pack* #267

Draft: rdolbeau wants to merge 1 commit into riscv-gcc-10.2.0-rvb from riscv-gcc-10.2.0-rvb-pack

rdolbeau commented on May 28, 2021

This adds automatic generation of pack* instructions (pack, packu, packh) beyond the zero-extension case.
This is implemented via a custom pass that:
a) reorganizes chains of '[ix]or' (ior/xor) operations to expose regular patterns;
b) matches common pack/packu/packh patterns and replaces them with the appropriate instruction.
(As the code is very different, this draft PR replaces #262.)
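As a reference for what the pass targets, here is a minimal Python model of the three instructions on RV32, assuming the semantics of the draft (v0.93-era) Bitmanip specification. The function names `pack`, `packu`, `packh` and the `idiom_*` helpers are illustrative only, not GCC internals. All values are treated as 32-bit unsigned words.

```python
MASK32 = 0xFFFF_FFFF  # model RV32 registers as 32-bit unsigned values


def pack(rs1: int, rs2: int) -> int:
    """RV32 pack: low half of rs1 -> bits 0..15, low half of rs2 -> bits 16..31."""
    return ((rs1 & 0xFFFF) | (rs2 << 16)) & MASK32


def packu(rs1: int, rs2: int) -> int:
    """RV32 packu: upper half of rs1 -> bits 0..15, upper half of rs2 -> bits 16..31."""
    return (((rs1 & MASK32) >> 16) | (rs2 & 0xFFFF_0000)) & MASK32


def packh(rs1: int, rs2: int) -> int:
    """packh: low byte of rs1 -> bits 0..7, low byte of rs2 -> bits 8..15, rest zero."""
    return (rs1 & 0xFF) | ((rs2 & 0xFF) << 8)


# Hypothetical source-level idioms a matcher could recognise.
# The two operands occupy disjoint bit ranges, so '|' and '^' are interchangeable.
def idiom_pack(a: int, b: int) -> int:
    return ((a & 0xFFFF) ^ (b << 16)) & MASK32


def idiom_packu(a: int, b: int) -> int:
    return (((a & MASK32) >> 16) | (b & 0xFFFF_0000)) & MASK32


def idiom_packh(a: int, b: int) -> int:
    return (a & 0xFF) | ((b & 0xFF) << 8)


if __name__ == "__main__":
    a, b = 0x11223344, 0xAABBCCDD
    print(hex(pack(a, b)), hex(packu(a, b)), hex(packh(a, b)))
    # prints: 0xccdd3344 0xaabb1122 0xdd44

    # Zero-extending a halfword is the degenerate case pack(x, 0).
    print(hex(pack(0xDEADBEEF, 0)))  # prints: 0xbeef
```

Because the operands of each idiom never overlap, an xor chain is just as good a candidate as an ior chain, which is why the pass looks at both.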

Not sure if it's entirely suitable, but it should help quantify how useful these instructions are on a given code base.

For instance, in OpenSSL 1.1.1k compiled for rv32gcbk_zbr_zbt, the compiler reorganizes 470 sequences of three [ix]or operations and produces 1125 packh and 682 pack instructions (plus some more that are matched directly). Many (most?) of them seem to come from byte-by-byte loading of 32-bit words (lbu/lbu/lbu/lbu/packh/packh/pack). If the addresses were provably aligned, each such sequence could be replaced by the much more efficient lw/grevi 0x18 (on RV32, grevi 0x18 is a full byte reversal), but that is difficult to achieve in the back-end; the source code should probably be changed to use an explicit load-word/byte-reversal where a 32-bit load is legal.

For comparison, here are the instruction counts collectively contained in the resulting objects (all .o files in the directory). The source was patched to implement AES with scalar K, hence the aes32* instructions:

$ find openssl-1.1.1k -name '*.o' -print -exec riscv32-unknown-elf-objdump -d {} \; | grep '[a-f0-9]:' | awk '{ print $3 }' | sort | uniq -c
  18632 add
  34835 addi
     16 aes32dsi
     32 aes32dsmi
     16 aes32esi
     32 aes32esmi
     56 amoadd.w
     16 amoadd.w.aq
      4 amoswap.w
     20 amoswap.w.aq
   2450 and
   5309 andi
    253 andn
  71903 auipc
   5450 beq
  24221 beqz
    972 bge
    966 bgeu
    247 bgez
    570 bgtz
   1714 blez
   1034 blt
   1402 bltu
    785 bltz
   3477 bne
   8779 bnez
    216 cmix
    151 cmov
      1 ctz
     67 div
     67 divu
      3 ebreak
     12 fadd.d
     92 fcvt.d.w
      2 fcvt.d.wu
      3 fcvt.w.d
      5 fcvt.wu.d
     58 fdiv.d
     91 fence
      2 feq.d
     94 fld
      4 fle.d
     12 flt.d
     39 fmul.d
     47 fmv.d
      1 fneg.d
    176 fsd
      2 fsub.d
    229 grevi
  18397 j
  56873 jalr
   2940 jr
     37 lb
   8619 lbu
      4 lh
    375 lhu
  71075 li
  20339 lui
  97050 lw
     69 max
     29 maxu
     81 min
     79 minu
   1113 mul
    222 mulh
    602 mulhu
 108427 mv
    290 neg
    207 not
   3013 or
    392 ori
     85 orn
    729 pack
   1212 packh
     58 rem
     61 remu
  10130 ret
   1396 rol
    854 rori
   5968 sb
      2 sbclr
     10 sbclri
      6 sbext
    115 sbexti
     18 sbinvi
     64 sbset
     32 sbseti
    408 seqz
    116 sext.b
     97 sgtz
    148 sh
    186 sh1add
   2511 sh2add
    136 sh3add
    122 sll
   3687 slli
     12 slt
     15 slti
    142 sltiu
   2893 sltu
   2671 snez
     86 sra
    955 srai
     81 srl
   6338 srli
   3089 sub
  57113 sw
      1 xnor
   9300 xor
    226 xori
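The byte-by-byte load pattern discussed above can be modelled the same way. This is a minimal sketch, assuming a big-endian load written as a left-to-right xor chain of shifted bytes (the shape of OpenSSL-style `GETU32` macros); all function names are hypothetical. It shows why reassociating the chain into two byte pairs exposes packh/packh/pack, and why an aligned lw followed by a full byte reversal (grevi 0x18 on RV32) yields the same value.

```python
MASK32 = 0xFFFF_FFFF  # model RV32 registers as 32-bit unsigned values


def packh(rs1: int, rs2: int) -> int:
    """packh: low byte of rs1 -> bits 0..7, low byte of rs2 -> bits 8..15."""
    return (rs1 & 0xFF) | ((rs2 & 0xFF) << 8)


def pack(rs1: int, rs2: int) -> int:
    """RV32 pack: low half of rs1 -> bits 0..15, low half of rs2 -> bits 16..31."""
    return ((rs1 & 0xFFFF) | (rs2 << 16)) & MASK32


def load_be32_bytewise(p: bytes) -> int:
    """Source form: four lbu, shifts, and a left-to-right xor chain."""
    return (p[0] << 24) ^ (p[1] << 16) ^ (p[2] << 8) ^ p[3]


def load_be32_packed(p: bytes) -> int:
    """Same value after reassociating the chain into two byte pairs
    (legal because the four terms occupy disjoint bytes):
    (p[3] | p[2] << 8) | ((p[1] | p[0] << 8) << 16) -> packh, packh, pack."""
    lo = packh(p[3], p[2])
    hi = packh(p[1], p[0])
    return pack(lo, hi)


def rev8_32(x: int) -> int:
    """Full byte reversal of a 32-bit word (what grevi 0x18 does on RV32)."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def load_be32_aligned(p: bytes) -> int:
    """Aligned alternative: one little-endian lw, then a byte reversal."""
    word = int.from_bytes(p[:4], "little")  # value lw returns on little-endian RISC-V
    return rev8_32(word)


if __name__ == "__main__":
    buf = bytes([0x12, 0x34, 0x56, 0x78])
    print(hex(load_be32_bytewise(buf)), hex(load_be32_packed(buf)), hex(load_be32_aligned(buf)))
    # prints: 0x12345678 0x12345678 0x12345678
```

The packed form still needs four byte loads plus three pack* instructions, whereas the aligned form needs only two instructions, which is why changing the source to an explicit load-word/byte-reversal looks more attractive than relying on the back-end.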

[edit] Adding a source file showing some of the patterns that match (or, in some cases, don't):
pack-pattern.txt

rdolbeau force-pushed the riscv-gcc-10.2.0-rvb-pack branch 2 times, most recently from 015ed51 to 2e39046 (June 6, 2021 12:11)
rdolbeau force-pushed the riscv-gcc-10.2.0-rvb-pack branch from 2e39046 to 5370038 (June 7, 2021 11:41)
rdolbeau force-pushed the riscv-gcc-10.2.0-rvb-pack branch from 5370038 to 0c74449 (June 8, 2021 09:01)