Add support for zstd-compressed corpora #1781
Labels
enhancement
Improves the status quo
:Track Management
New operations, changes in the track format, track download changes and the like
Rally supports various compression formats such as gz or bzip2. It does not support the zstd format, which performs significantly better in both disk usage and decompression speed in my experiments. I've compressed a 183GB corpus with `pbzip2` and `pzstd`, both with the maximum compression level that is supported by the respective tool.

Decompression speed is also vastly superior (times measured with `time`; the table contains the output of `real`, i.e. wall clock time):

Therefore I propose to add support for zstd compression to Rally, similar to the existing bzip2 support: the fast option would require `pzstd` to be on `PATH`, and a fallback can be based on the Python zstd implementation.

For reference:
`pzstd -19 corpus.json -o corpus.json.zstd` (`19` denotes the maximum compression level)
`pzstd -d corpus.json.zstd -o corpus.json`
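
To illustrate the proposal, here is a minimal sketch of how the fast path and the fallback could fit together. It is not Rally's actual code: the function names are hypothetical, and the fallback assumes the third-party `zstandard` package as the "Python zstd implementation" (any equivalent binding could take its place). The `pzstd` invocation is the decompression command quoted above.

```python
import shutil
import subprocess


def pzstd_command(archive, target):
    """Build the pzstd decompression call quoted in this issue."""
    return ["pzstd", "-d", archive, "-o", target]


def pick_strategy(which=shutil.which):
    """Return "pzstd" if the binary is on PATH, otherwise "python"."""
    return "pzstd" if which("pzstd") else "python"


def decompress(archive, target, which=shutil.which):
    """Decompress archive to target, preferring the external pzstd tool."""
    if pick_strategy(which) == "pzstd":
        # Fast path: delegate to the parallel command line tool.
        subprocess.run(pzstd_command(archive, target), check=True)
    else:
        # Fallback (assumption): the third-party "zstandard" package,
        # imported lazily so the fast path has no extra dependency.
        import zstandard

        with open(archive, "rb") as src, open(target, "wb") as dst:
            zstandard.ZstdDecompressor().copy_stream(src, dst)
```

Calling `decompress("corpus.json.zstd", "corpus.json")` would then use `pzstd` when it is available and stream through the Python library otherwise. The `which` parameter exists only so the selection logic can be exercised without touching the real `PATH`.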