Add support for zstd-compressed corpora #1781

danielmitterdorfer · 2023-09-21T12:47:56Z

Rally supports various compression formats such as gz or bzip. It does not support the zstd format, which performed significantly better in both disk usage and decompression speed in my experiments. I compressed a 183GB corpus with pbzip2 and pzstd, both at the maximum compression level supported by the respective tool.

Format	Size on disk [bytes]	Size on disk [GB]	Relative size [%]
bzip	18613471805	18	100
zstd	11215205385	11	60
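The derived columns can be reproduced from the raw byte counts. The snippet below is a minimal sketch, assuming the GB column is decimal gigabytes truncated to a whole number (which matches the values in the table); `derived_columns` is a hypothetical helper name, not part of Rally.

```python
# Raw sizes in bytes, copied from the table above.
sizes = {"bzip": 18613471805, "zstd": 11215205385}


def derived_columns(sizes, baseline="bzip"):
    """Return {format: (whole decimal GB, size in percent of the baseline)}."""
    base = sizes[baseline]
    return {
        # Assumption: GB means 10**9 bytes, truncated rather than rounded.
        name: (size // 10**9, round(100 * size / base))
        for name, size in sizes.items()
    }


print(derived_columns(sizes))
```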

Decompression is also vastly faster (measured with the time command; the table shows the real value, i.e. wall-clock time):

Format	Time to decompress [s]	Relative time [%]
bzip	388	100
zstd	144	37

Therefore I propose to add support for zstd compression to Rally, similar to the existing bzip support: the fast option would require pzstd to be on PATH, and a fallback can be based on a Python zstd implementation.
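The proposed "fast path with fallback" could look roughly like the following. This is only a sketch under stated assumptions, not Rally's actual implementation: the function names `pick_decompressor` and `decompress_zstd` are hypothetical, and the fallback assumes the third-party `zstandard` package is installed.

```python
import shutil
import subprocess


def pick_decompressor(which=shutil.which):
    """Return "pzstd" if the binary is on PATH, otherwise "python"."""
    return "pzstd" if which("pzstd") else "python"


def decompress_zstd(src, dst, which=shutil.which):
    """Decompress src to dst, preferring the external pzstd binary."""
    if pick_decompressor(which) == "pzstd":
        # Fast path: same invocation as listed in this issue.
        subprocess.run(["pzstd", "-d", src, "-o", dst], check=True)
    else:
        # Fallback (assumption): the third-party "zstandard" package.
        import zstandard

        with open(src, "rb") as fin, open(dst, "wb") as fout:
            # Streams the data so the corpus is never fully held in memory.
            zstandard.ZstdDecompressor().copy_stream(fin, fout)
```

The `which` parameter exists only so the selection logic can be exercised without pzstd being installed; a call such as `decompress_zstd("corpus.json.zstd", "corpus.json")` would use the default PATH lookup.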

For reference:

Compress data: pzstd -19 corpus.json -o corpus.json.zstd (19 is the highest compression level available without --ultra)
Decompress data: pzstd -d corpus.json.zstd -o corpus.json

With this commit we add support for zstd compressed corpora. Compared to bzip, the zstd format produces compressed files that are roughly 40% smaller and took around a third of the time to decompress in our tests. Closes elastic#1781

danielmitterdorfer added the enhancement and :Track Management labels Sep 21, 2023

danielmitterdorfer mentioned this issue Sep 27, 2023

Add support for zstd-compression #1786

Merged

danielmitterdorfer closed this as completed in #1786 Sep 27, 2023
