This is an example hardware project written in SUS. It implements the Bit Serial Matrix Multiplication algorithm for efficient FPGA synthesis of multiplying a vector by a compile-time matrix. The main trick is to stream the vector values through the multipliers one bit at a time: instead of INT_SIZExINT_SIZE-bit multipliers, you only need 1xINT_SIZE multipliers, which, with a constant multiplicand, optimize to simple wires. Matrix sparsity can be readily exploited here for a cheaper implementation.
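To see why 1xINT_SIZE multipliers suffice, it may help to write the product out. The sketch below assumes unsigned vector elements $x_j$ of $B$ = INT_SIZE bits, with $x_j^{(b)}$ denoting bit $b$ of element $j$. The symbols $W_{ij}$, $y_i$ and $B$ are notation introduced here for illustration and do not appear in the project.

```latex
x_j = \sum_{b=0}^{B-1} 2^{b}\, x_j^{(b)}, \qquad x_j^{(b)} \in \{0, 1\}
\quad\Longrightarrow\quad
y_i = \sum_{j} W_{ij}\, x_j
    = \sum_{b=0}^{B-1} 2^{b} \sum_{j} W_{ij}\, x_j^{(b)}
```

Each inner sum only adds the constant weights whose input bit is 1, so no real multiplier is needed, and the factor $2^{b}$ can be realized by shifting an accumulator.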
This example was tested using sus_compiler --version: SUS Compiler 0.1.1
- bitSerialMatrixMultiply.sus: File implementing the algorithm
- bitSerialMatrixMultiply_tb.sv: SystemVerilog testbench
- constraints.xdc: Definition for the clock for Vivado synthesis
- verilog_output: sus_compiler output directory. This contains any files the SUS compiler generates.
The algorithm contains three major components:
- The input vector shift registers
- The Baked-in 1xINT_SIZE row multipliers
- The shifting result accumulators
Above you see the implementation of the Bit Shifter modules. Each of them takes in a vector element in cycle 0. In the subsequent cycles they produce one bit per cycle, starting from the MSB.
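As a rough software model of this behaviour (not the SUS source itself), the generator below emits the bits of one vector element with the most significant bit first. The name `bit_shifter` and the restriction to unsigned values are assumptions made for this sketch.

```python
def bit_shifter(value, int_size):
    """Yield the bits of an unsigned int_size-bit value, MSB first.

    Hypothetical software model: one yielded bit corresponds to one
    clock cycle after the element was loaded in cycle 0.
    """
    if not 0 <= value < (1 << int_size):
        raise ValueError("value does not fit in int_size unsigned bits")
    for i in range(int_size - 1, -1, -1):
        yield (value >> i) & 1


print(list(bit_shifter(0b1011, 4)))  # [1, 0, 1, 1]
```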
These bits are fed to every Row instance, each of which is synthesized with the compile-time matrix weights for that row. Each row implements a dot product of the row weights with the 1-bit input vector elements. An example instance of such a row is shown below.
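A minimal Python sketch of one such row, assuming the weights are plain integers known up front. `make_row` is a hypothetical helper that stands in for compile-time specialization. It is not part of the project.

```python
def make_row(weights):
    """Build a row function specialized for fixed weights.

    Zero weights are dropped when the row is built, mirroring how a
    sparse matrix row needs no hardware for its zero entries.
    """
    taps = [(j, w) for j, w in enumerate(weights) if w != 0]

    def row(bits):
        # A 1-bit "multiplication" is just a conditional add of the weight.
        return sum(w for j, w in taps if bits[j])

    return row


row = make_row([3, 0, -2, 5])
print(row([1, 1, 0, 1]))  # 3 + 5 = 8
```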
Finally, the row sums are added to an accumulator register. To stay in sync with the bit shifter, the accumulator is shifted left by one bit each cycle.
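Putting the three components together, the whole scheme can be modelled in software as below. This is a sketch under stated assumptions (unsigned vector elements, integer weights, one loop iteration per clock cycle), and `bit_serial_matvec` is a name chosen here. It says nothing about the timing or pipelining of the actual SUS design.

```python
def bit_serial_matvec(matrix, vector, int_size):
    """Multiply a constant matrix by a vector, one input bit per step."""
    acc = [0] * len(matrix)
    for i in range(int_size - 1, -1, -1):  # MSB first
        # Bit shifters: bit i of every vector element.
        bits = [(x >> i) & 1 for x in vector]
        for r, weights in enumerate(matrix):
            # Row: conditionally add the constant weights.
            row_sum = sum(w for w, b in zip(weights, bits) if b)
            # Accumulator: shift left, then add this cycle's row sum.
            acc[r] = (acc[r] << 1) + row_sum
    return acc


matrix = [[1, 2, 0], [0, -3, 4]]
vector = [5, 6, 7]
result = bit_serial_matvec(matrix, vector, int_size=4)
print(result)  # [17, 10]
```

The result matches the ordinary product: 1*5 + 2*6 = 17 and -3*6 + 4*7 = 10.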
Using SUS parameterizable modules, each BitSerialRow takes the matrix column, picks out the non-zero elements and adds them up conditionally. The adder is implemented with a TreeAdd module from util.sus.
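For readers unfamiliar with tree adders, the following Python sketch shows the general idea of a pairwise (balanced) reduction. It illustrates the technique only. How TreeAdd in util.sus is actually structured is not shown here, and `tree_add` is a hypothetical name.

```python
def tree_add(values):
    """Sum values by repeatedly adding neighbouring pairs.

    Each pass of the while loop corresponds to one level of the adder
    tree, so n inputs need about log2(n) levels instead of a chain of
    n - 1 sequential additions.
    """
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
    return level[0]


print(tree_add([1, 2, 3, 4, 5]))  # 15
```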
- Clone the repository.
- Set up the SUS Compiler using the instructions provided there.
- Generate the SystemVerilog equivalent of your hardware with the command sus_compiler --standalone BitSerialMatrixMultiplyTinyIO *.sus (the TinyIO variant is only there to reduce pin use so Vivado synthesis wouldn't complain; --standalone means the .sv file contains all dependencies of the chosen module).
- (Optionally) You can do simulation using Icarus Verilog and Surfer. For that, run ./simSUS.sh
If you just want to mess around with SUS, then you can stop here.
If you wish to go deeper, then you can take the files in verilog_output and use them in downstream tools:
These were performed using Vivado on the resulting verilog_output/BitSerialMatrixMultiplyTinyIO_standalone.sv