An FPGA-based hardware Snappy decompressor. This is a new decompressor architecture that can process multiple literal tokens and copy tokens in parallel.
- Simulation: xsim + SNAP (POWER9 + CAPI 2.0 + ADM9V3) at 250 MHz
- Measurement: POWER9 + CAPI 2.0 + ADM9V3 at 250 MHz
- Future: 1) move to the OpenCAPI platform; 2) multiple engines
For users who want to integrate the decompressor into their own system, use the wrapper file (decompressor_wrapper) as the top-level file of the decompressor (an instantiation sketch is shown after the port list below).
The decompressor acts as a box that processes the input stream and outputs the decompressed stream. The only requirement is to set compression_length and decompression_length before the data is streamed in.
All metadata and data communication uses a ready/valid handshake protocol.
The decompressor uses the following interface:
input clk, // the clock signal
input rst_n, // the reset signal
output last, // Whether the data is the last one in a burst
output done, // Whether the decompression is done
input start, // Start the decompressor after compression_length and decompression_length are set
// The user should set it to 1 to start the decompressor and set it back to 0 after one cycle
input[511:0] in_data, //The compressed data
input in_data_valid, //Whether or not the data on the in_data port is valid.
output in_data_ready, //Whether or not the decompressor is ready to receive data on its in_data port
input[34:0] compression_length, //length of the data before decompression (compressed data)
input[31:0] decompression_length, //length of the data after decompression (uncompressed data)
input in_metadata_valid, //Whether or not the data on the compression_length and decompression_length ports is valid.
output in_metadata_ready, //Whether or not the decompressor is ready to receive data on its compression_length and decompression_length ports.
output[511:0] out_data, //The decompressed data
output out_data_valid, //Whether or not the data on the out_data port is valid
output[63:0] out_data_byte_valid, //Which bytes of the output are valid
input out_data_ready //Whether or not the component following the decompressor is ready to receive data.
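For integration, the wrapper can be instantiated directly with the ports above. A minimal sketch follows; it assumes the wrapper module is named decompressor_wrapper (as suggested by the wrapper file), and the instance name and connected signal names are placeholders for the user's own design.

```verilog
// Hypothetical instantiation of the decompressor wrapper inside a user design.
// The right-hand-side signal names are placeholders for the user's own wires.
decompressor_wrapper u_decompressor (
    .clk                  (clk),
    .rst_n                (rst_n),
    .last                 (dec_last),
    .done                 (dec_done),
    .start                (dec_start),
    .in_data              (in_data),
    .in_data_valid        (in_data_valid),
    .in_data_ready        (in_data_ready),
    .compression_length   (compression_length),
    .decompression_length (decompression_length),
    .in_metadata_valid    (in_metadata_valid),
    .in_metadata_ready    (in_metadata_ready),
    .out_data             (out_data),
    .out_data_valid       (out_data_valid),
    .out_data_byte_valid  (out_data_byte_valid),
    .out_data_ready       (out_data_ready)
);
```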
Communication with the decompressor follows these steps (a testbench-style sketch is shown after the list):
(1) Set metadata (compression_length and decompression_length)
(2) Set "start"
(3) stream data in for decompression
(4) After the "done" signal returns, a new decompression can be processed, starting again from Step (1).
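The sequence above can be sketched as a testbench fragment. This is only a hedged illustration built on the port list above: c_len, d_len, num_beats, compressed_mem and the loop variable i are placeholder testbench signals, and the decompressor inputs are assumed to be declared as regs driven on posedge clk.

```verilog
integer i;

initial begin
    in_metadata_valid = 0;
    in_data_valid     = 0;
    start             = 0;
    @(posedge rst_n);                              // wait until reset is released

    // (1) Set the metadata and hold it until the metadata handshake completes.
    compression_length   <= c_len;                 // compressed size in bytes
    decompression_length <= d_len;                 // decompressed size in bytes
    in_metadata_valid    <= 1'b1;
    @(posedge clk);
    while (!in_metadata_ready) @(posedge clk);     // transfer happens when valid && ready
    in_metadata_valid    <= 1'b0;

    // (2) Pulse "start" for exactly one cycle.
    start <= 1'b1;
    @(posedge clk);
    start <= 1'b0;

    // (3) Stream the compressed data in, one 512-bit word per handshake.
    for (i = 0; i < num_beats; i = i + 1) begin
        in_data       <= compressed_mem[i];
        in_data_valid <= 1'b1;
        @(posedge clk);
        while (!in_data_ready) @(posedge clk);     // hold the word until it is accepted
    end
    in_data_valid <= 1'b0;

    // (4) Wait for "done"; a new decompression can then start again from Step (1).
    wait (done);
end
```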
Currently, the decompressor is used on IBM CAPI 2.0 with the SNAP interface. See: https://github.com/open-power/snap
The demo works on this platform: it fetches data from memory, decompresses it, and sends the decompression result back.
Currently, this project uses some IP cores which are generated by the Tcl file (create_action_ip.tcl).
- ip: IP files for the decompressor (Tcl files)
- source: Verilog files for the decompressor
- interface: VHDL file to connect the decompressor to the IBM CAPI platform and run a demo
- sw: software to test the decompressor on the IBM CAPI platform
- Doc: documents for the decompressor

(If you want to use the decompressor on another platform, only the files in user_ip and source are needed.)
If the input is compressed with the original Snappy compression software from Google, the performance of this decompressor may be poor for some special data with extremely high data dependency. In this case, it is recommended to use a modified compression software: https://github.com/ChenJianyunp/snappy-c
In this version, the compression algorithm is slightly changed, but the compression result is still in standard Snappy format. It causes almost no change in the compression ratio, while greatly reducing the data dependency and making parallel decompression more efficient.
Currently, this decompressor passes the build test on the ADM-9V3 FPGA card (FPGA: XCVU3P-2-FFVC1517) at a clock speed of 250 MHz. Please choose the following place and route strategies:
- place strategy: Congestion_SpreadLogic_medium
- route strategy: AlternateCLBRouting

With the default strategy, the timing constraints may fail due to congestion.
- A work-in-progress paper was accepted at CODES+ISSS 2018, see: https://ieeexplore.ieee.org/document/8525953
- A regular paper "Refine and Recycle: A Method to Increase Decompression Parallelism" was accepted at ASAP 2019, see: https://ieeexplore.ieee.org/document/8825015/
- A journal paper extends this work to a multi-engine design: "An efficient high-throughput LZ77-based decompressor in reconfigurable logic", see: https://link.springer.com/article/10.1007/s11265-020-01547-w
If you have questions or recommendations for this project, please contact Jianyu Chen ([email protected]) or Jian Fang ([email protected]).
| Author | Date | Change |
| --- | --- | --- |
| Jianyu Chen | 18-11-2018 | Fix a bug on the length of garbage_cnt |
| Jianyu Chen | 25-11-2018 | Fix a bug of overflow on page_fifo |
| Jianyu Chen | 26-11-2018 | Fix a bug that loses the last slice |
| Jian Fang | 22-01-2019 | Fix a bug in the handshake of the AXI protocol (1. read/write length; 2. FSM for input smaller than 4KB) |
| Jian Fang | 01-02-2019 | Fix a bug on the app_ready signal |
| Jian Fang | 01-02-2019 | Fix a bug on write responses (bresp signal: need to wait until the last bresp comes back for the write data) |
| Jianyu Chen | 04-02-2019 | Fix a bug in the calculation of the token length of a 3-byte literal token |
| Jianyu Chen | 05-02-2019 | Fix a bug when calculating the length of a literal token within a slice |
| Jianyu Chen | 05-02-2019 | Fix a bug in checking whether the decompressor is empty when the file is very small |
| Jian Fang | 05-03-2019 | Fix a bug in the input/output of the 'data_in' and 'wr_en' signals in the parser FIFO (both lit and copy) |
| Jianyu Chen | 28-03-2019 | Fix a bug of the wrong literal size in the parser |
| Jianyu Chen | 14-04-2019 | Fix a bug of passing the wrong NUM_PARSER to the control module |
| Jianyu Chen | 23-04-2019 | Fix a bug: the length of lit_length is too short |
| Jian Fang | 14-05-2019 | Add a decompressor wrapper for reusing this block in other designs |