Zstandard v1.4.7
Note: this version features a minor bug, which can be present on systems other than x64 and arm64. Updating to v1.4.8 is recommended for all other platforms.
v1.4.7 unleashes several months of improvements across many axes, from performance to various fixes to new capabilities, a few of which are highlighted below. It's a recommended upgrade.
(Note: if you ever wondered what happened to v1.4.6, it's an internal release number reserved for synchronization with the Linux kernel.)
Improved --long mode
--long mode makes it possible to analyze vast quantities of data within a reasonable time and memory budget. The --long mode algorithm runs on top of the regular match finder, and both contribute to the final compressed output.
However, because these two stages worked independently, minor discrepancies could appear at the highest compression levels, where the cost of each decision must be carefully monitored. As a consequence, when the input was not a good fit for --long mode (no large repetitions at long distance), enabling it could slightly reduce compression ratio compared to leaving it off (at high compression levels). This situation made it more difficult to "just always enable" --long mode by default.
This is fixed in this version. For compression levels 16 and up, using --long will now never regress compared to compression without --long. This property made it possible to ramp up the --long mode contribution to the compression mix, improving its effectiveness.
The compression ratio improvements are most notable when --long mode is actually useful. In particular, --patch-from (which implicitly relies on --long) shows excellent gains from these improvements. We present some brief results here (tested on a MacBook Pro 16", i9).
Since --long mode is now always beneficial at high compression levels, it's automatically enabled for any window size of 128 MB and up.
Faster decompression of small blocks
This release includes optimizations that significantly speed up decompression of small blocks and small data. The decompression speed gains will vary based on the block size according to the table below:
Block Size | Decompression Speed Improvement |
---|---|
1 KB | ~+30% |
2 KB | ~+30% |
4 KB | ~+25% |
8 KB | ~+15% |
16 KB | ~+10% |
32 KB | ~+5% |
These optimizations come from improving the process of reading the block header and building the Huffman and FSE decoding tables. zstd's default block size is 128 KB; at that block size, the time spent decompressing the data dominates the time spent reading the block header and building the decoding tables. But as blocks become smaller, the cost of reading the block header and building the decoding tables becomes more prominent.
CLI improvements
The CLI received several noticeable upgrades with this version.
To begin with, zstd can accept a new parameter through an environment variable, ZSTD_NBTHREADS. It's useful when zstd is invoked from another application (tar, or a Python script, for example). Also, users who prefer multithreaded compression by default can now set a desired number of threads in their environment. This setting can still be overridden on demand via the command line.
A new command, --output-dir-mirror, makes it possible to compress a directory containing subdirectories (typically with the -r option), producing one compressed file per source file, and to reproduce the directory structure in a selected destination directory.
There are various other improvements, such as more accurate warning and error messages, full equivalence between the --long-command=FILE and --long-command FILE conventions, fixed risks of confusion between stdin and the user prompt, or between console output and status messages, as well as a new short execution summary when processing multiple files, all cumulatively contributing to a nicer command-line experience.
New experimental features
Shared Thread Pool
By default, each compression context can be set to use a maximum number of threads.
In complex scenarios, there might be multiple compression contexts working in parallel, each using some number of threads. In such cases, it might be desirable to control the total number of threads used by all these compression contexts altogether.
This is now possible: all these compression contexts can share the same thread pool. This capability is exposed through a new advanced function, ZSTD_CCtx_refThreadPool(), contributed by @marxin. See its documentation for more details.
Faster Dictionary Compression
This release introduces a new experimental dictionary compression algorithm, applicable to mid-range compression levels, employing strategies such as ZSTD_greedy
, ZSTD_lazy
, and ZSTD_lazy2
. This new algorithm can be triggered by selecting the compression parameter ZSTD_c_enableDedicatedDictSearch
during ZSTD_CDict
creation (experimental section).
Benchmarks show the new algorithm providing significant compression speed gains:
Level | Hot Dict | Cold Dict |
---|---|---|
5 | ~+17% | ~+30% |
6 | ~+12% | ~+45% |
7 | ~+13% | ~+40% |
8 | ~+16% | ~+50% |
9 | ~+19% | ~+65% |
10 | ~+24% | ~+70% |
We hope it will help make mid-level compression more attractive for dictionary scenarios. See the documentation for more details. Feedback is welcome!
New Sequence Ingestion API
We introduce a new entry point, ZSTD_compressSequences(), which makes it possible for users to define their own sequences by whatever mechanism they prefer, and present them to this new entry point, which will generate a single zstd-compressed frame based on the provided sequences.
For example, users can now feed the function an array of externally generated ZSTD_Sequence:
[(offset: 5, matchLength: 4, litLength: 10), (offset: 7, matchLength: 6, litLength: 3), ...]
and the function will output a zstd-compressed frame based on these sequences.
This experimental API currently has several limitations (and its relevant parameters live in the "experimental" section). Notably, it currently ignores any repeat offsets provided, instead always recalculating them on the fly. Additionally, there is no way to forcibly specify the use of certain zstd features, such as RLE or raw blocks.
If you are interested in this new entry point, please refer to zstd.h for more detailed usage instructions.
Changelog
There are many other features and improvements in this release, and since we can’t highlight them all, they are listed below:
- perf: stronger --long mode at high compression levels, by @senhuang42
- perf: stronger --patch-from at high compression levels, thanks to --long improvements
- perf: faster decompression speed for small blocks, by @terrelln
- perf: faster dictionary compression at medium compression levels, by @felixhandte
- perf: small speed & memory usage improvements for ZSTD_compress2(), by @terrelln
- perf: minor generic decompression speed improvements, by @helloguo
- perf: improved fast compression speeds with Visual Studio, by @animalize
- cli : set nb of threads with environment variable ZSTD_NBTHREADS, by @senhuang42
- cli : new --output-dir-mirror DIR command, by @xxie24 (#2219)
- cli : accept decompressing files with *.zstd suffix
- cli : --patch-from can compress stdin when used with --stream-size, by @bimbashrestha (#2206)
- cli : provide a condensed summary by default when processing multiple files
- cli : fix : stdin input can no longer be confused with user prompt
- cli : fix : console output no longer mixes stdout and status messages
- cli : improve accuracy of several error messages
- api : new sequence ingestion API, by @senhuang42
- api : shared thread pool: control total nb of threads used by multiple compression jobs, by @marxin
- api : new ZSTD_getDictID_fromCDict(), by @LuAPi
- api : zlibWrapper only uses public API, and is compatible with dynamic library, by @terrelln
- api : fix : multithreaded compression has predictable output even in special cases (see #2327) (issue not present on cli)
- api : fix : dictionary compression correctly respects dictionary compression level (see #2303) (issue not present on cli)
- api : fix : return dstSize_tooSmall error whenever appropriate
- api : fix : ZSTD_initCStream_advanced() with static allocation and no dictionary
- build: fix cmake script when employing path including spaces, by @terrelln
- build: new ZSTD_NO_INTRINSICS macro to avoid explicit intrinsics
- build: new STATIC_BMI2 macro for compile-time detection of BMI2 on MSVC, by @Niadb (#2258)
- build: improved compile-time detection of aarch64/neon platforms, by @bsdimp
- build: fix building on AIX 5.1, by @likema
- build: compile paramgrill with cmake on Windows, requested by @mirh
- build: install pkg-config file with CMake and MinGW, by @tonytheodore (#2183)
- build: install DLL with CMake on Windows, by @BioDataAnalysis (#2221)
- build: fix : cli compilation with uclibc
- misc: improve single file library and include dictBuilder, by @cwoffenden
- misc: fix single file library compilation with Emscripten, by @yoshihitoh (#2227)
- misc: add freestanding translation script in contrib/freestanding_lib, by @terrelln
- doc : clarify repcode updates in format specification, by @felixhandte