XZ compression support #406
Comments
Really? Can you please provide some references for that last statement? bzip2, lzma, and lzo (very fast, light compression/decompression, even of already-compressed data) are implemented in zfs-fuse. They have already assigned identifiers for these 3 compression schemes (actually more, since they also have things like bzip2-9, bzip2-8, ..., bzip2-1, lzma-1, lzma-2, lzma-3, etc.), giving about 25 compression methods (compression attribute values) but still only 3 on-disk formats, since they are all readable by the same code regardless of compression level (the necessary information is carried in the compressed stream/data itself). LZO is very useful for general usage (it is faster than gzip and mostly produces smaller files, or only slightly bigger than gzip at its standard level), and bzip2 is useful for long-term archiving and backups. I think lzo, lzma, xz, and bzip2 could all be implemented quite easily in zfsonlinux, as all of them already have kernel implementations. Right? |
AFAIK, XZ needs substantial amounts of memory when compressing/decompressing. For this reason, I'm not convinced that using it inside ZFS is a very good idea. |
Notwithstanding whether XZ is worthwhile, there are two reasons why it is unlikely that it will be added to ZoL:
This request would be better sent to Oracle. Get them to approve it, and then get them to release the latest ZFS source code. |
I was wondering if this could be worth it since this issue is still open, so I did a bit of research and will try to provide an opinion. LZMA/XZ gives 50% more compression than LZ4 on a Linux kernel 3.3 tarball, at the expense of 24x more compression time and 18x more decompression time, but with less memory required for compression (1.4x) and decompression (13x). The kernel tarball gets a 2.81x compression ratio with LZ4. Tests on data blocks would be needed to know whether those figures still hold for a ZFS implementation. It would be worthwhile to take a few pools with data, check the LZ4 compression ratio, calculate the space saved, and multiply that by 2. But on most of my pools I have compression ratios of 1.00 (Media) to 1.42 (rpool). In the article on ServeTheHome, they get 1.93x compression on a VM; that means 14.3GB compressed out of a 32GB uncompressed VM. If we were to use XZ, we could estimate the best-case scenario at twice the compression: the dataset would drop to 7.15GB. Now the big questions: … L2ARC latency impacts also need to be taken into consideration. Usefulness: archiving compressible data (root, VMs, logs), rarely used data, or latency-irrelevant data. Priority: after thinking about it, I find that there are more urgent features to be added, and LZ4 does a good job; perhaps it could be better, but the effort needed doesn't seem to be worth it for now (in my view). Sources: … |
ZFS would benefit greatly from the addition of the LZMA/LZMA2 compression format (however, it doesn't make much sense to drag along the XZ container format too if that is easily avoided). This is something that should be done upstream across all OpenZFS projects, not in the ZFSonLinux project alone; we should not break compatibility with the other OpenZFS implementations. I disagree that this is a feature Oracle needs to have. OpenZFS pool version 5000 already diverges greatly from Oracle ZFS, and everyone who requires cross-compatibility already assumes they must use pool version 28 or older. There is no reason I can see why this couldn't be added as a feature flag for use by those who want it. |
@rmarder I think the Oracle suggestion was made in a different time, when people were still hoping the source faucet would reopen. As it stands, the incompatibility ship has long since sailed, with feature flags on one side and Oracle incrementing their on-disk format on the other. As for LZMA/LZMA2 support, I think you'd be welcome to submit it, but currently #3908, #5927, and #6247, and maybe #7560, seem to be where people are spending effort, since they're in varying states of mergeability. (Also, #5846 means gzip isn't necessarily going to peg your CPU, for people who have that use case.) |
Considering we just merged ZSTD, which has about the same ratios but is significantly faster... |
xz still seems to compress smaller than zstd, so IMO there would still be benefit to having it. |
@gordan-bobic At the same speed it's the same or slightly worse in nearly all benchmarks. |
@Ornias1993 I think the point being made here is that, for infrequently accessed archival data where the maximum possible compression ratio is desirable and compression/decompression speed is effectively immaterial, the highest end of xz still beats what you can get from the highest end of zstd. So there are use cases where "worse at the same speed" isn't really meaningful, the higher attainable compression ratio is meaningful, and xz still provides utility in those situations. (Asterisk: this is based on generalized corpus benchmark data, and I don't know for certain that the top-end ratios of xz still beat those of zstd when we're talking about relatively small, recordsize-limited chunks.) Too niche a use case to bother caring about? Maybe. I dunno. But there is a use case there nevertheless. |
Maximum levels? Any idea how many levels of ZSTD are possible? :P At the same speed XZ is at the very least comparable to ZSTD. |
Even so, considering we spent years trying to even get a thorough review of zstd, and no one with any know-how in the field is even slightly interested in XZ support, I think I'll end this with: it isn't going to be developed anyway, given the effort required and the general disinterest of the ZFS maintainers in compression algorithms, all for a niche use case that might or might not exist... |
@Ornias1993 Hey now, I was clear and upfront about it likely being a niche use case and so therefore likely not justifying development. So I don't think we're in disagreement on that. I do still think that the technical point of xz having higher best-case ratios holds; yeah the difference may not be giant, but it's there. And so it is potentially of utility, as a general statement, even if overall it's ultimately not worthwhile from a development work point of view. 🤷♂️ (Incidentally, the tables I'd referenced for ratio comparisons are the same ones you linked!) |
According to those charts, at the high compression end, xz is both faster and compresses smaller. |
The first charts show xz levels 4 through 9 beating zstd 19's compression ratio, with only xz level 4 beating zstd's compression speed, but the margins of victory are fairly small. The second charts show xz levels 5 through 9 beating zstd 19's compression ratio, with only xz level 5 beating its speed, again with small margins of victory.
When I first saw this, I was concerned about memory usage. Since we now support zstd, I suppose this should be reconsidered: https://github.com/facebook/zstd/blob/ff6350c098300ee3050dc6a6cdc0f48032755e84/lib/compress/zstd_compress.c#L4081 At zstd -19, the window log is 8MB, the chain log is 16MB, and the hash log is 4MB, for a total of 28MB used for compression. Decompression uses less (although I do not know how much less offhand). On the other hand, xz's man page says that xz level 4 uses 48MB of RAM while xz level 9 uses 674MB of RAM, and that is just for compression.
My feeling is that this is excessive, even on more modern machines. ZFS should be able to work on low-memory systems (e.g. 256MB of RAM), but supporting the highest xz levels would cause us to run out of memory and hang on such systems. My opinion is that the memory requirements of XZ are too high for use in a filesystem. If we were to limit it to xz levels 1 through 3 to keep it within the realm of zstd's memory usage, then its memory usage would be okay, but then xz loses its advantage over zstd, which defeats the purpose of implementing it. |
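For reference, zstd's advanced API can report each level's parameters and the library's own estimate of the compression context size, so the memory figures above can be checked directly against the library. A minimal sketch (this assumes libzstd built with its `ZSTD_STATIC_LINKING_ONLY` interface available, and is not ZFS code):

```c
/* Sketch: print the zstd level-19 parameters and the library's own
 * estimate of the compression context size.
 * Build: cc probe_zstd.c -lzstd  (needs libzstd development headers) */
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <stdio.h>

int
main(void)
{
	/* Parameters chosen for level 19 with an unknown ("0") source size. */
	ZSTD_compressionParameters cp = ZSTD_getCParams(19, 0, 0);

	printf("windowLog=%u (%zu-byte window)\n",
	    cp.windowLog, (size_t)1 << cp.windowLog);
	printf("chainLog=%u hashLog=%u searchLog=%u\n",
	    cp.chainLog, cp.hashLog, cp.searchLog);

	/* Estimated working memory for one level-19 compression context. */
	printf("estimated cctx size: %zu bytes\n", ZSTD_estimateCCtxSize(19));
	return (0);
}
```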
@rmarder briefly touched on this when he mentioned that the xz container format is unnecessary, but to expand on that: if we were to implement this, we would not want the lzma2 container format either. I feel that needs to be said, since the xz container format encapsulates the lzma2 container format; just dropping the xz container format is therefore not enough. Another thing that occurs to me is that the additional container formats waste space not just through multiple layers of headers, but also through the included padding and checksums. Compression and decompression are likely also somewhat slowed down by the checksums, which are unnecessary in a hypothetical ZFS implementation since ZFS has its own checksums. The lzip developer demonstrated the expense of the checksums when he addressed why busybox unxz is faster than his implementation: https://www.nongnu.org/lzip/lzip_benchmark.html#busybox That said, those checksums probably should be disabled for a "fair" evaluation of the merits of lzma against zstd. I still suspect that it would not be able to perform well enough to justify its inclusion if its memory requirements were restricted to roughly what zstd uses. |
I don't believe we should consider compression memory usage to be a problem. There are already other optional feature flags in ZFS (ex: dedup) that suffer from a similar problem of heavy memory usage when enabled. Furthermore, there are multiple ways to ensure there is sufficient memory on the system for the chosen compression level, if that is wanted. A lazy approach would be to simply skip compression if the system resources to do it aren't available (ZFS already silently skips compression in certain conditions). Now, decompression memory usage is a serious concern. If there isn't enough system memory for decompression, there is very little we can do to work around that problem. |
We already have instances where high (9 and up) levels of zstd break down the complete ZFS system due to excessive memory consumption. When developing zstd support, everything above 9 was considered not feasible and nothing more than a tech demo. One also needs to take into account that small gains in a compression test will not translate the same way to ZFS, the same way the ratios and speeds of stock zstd are not the same as zstd-on-zfs. So there might not even be any gain at all from these algorithms, and it's not even worth discussing until someone proves(!) with a PoC that they can actually reach better speeds or ratios when integrated into the ZFS stack. For zstd we could guess this, because even 50% of the performance would outperform the other compression algorithms in ZFS. But with these margins, this needs a PoC. So yes: I basically call bullshit on the performance-gain guesses. |
The difference between the memory usage of deduplication and the memory usage of compression is that deduplication just becomes slower when the ARC cannot provide enough memory, while compression will literally hang the system if it does not have enough memory. That is why we have been so conservative about adding new high-compression-ratio algorithms. zstd was only added in part because it was just so good that others did not feel justified in saying no, but as @Ornias1993 pointed out, it was accepted in a way that allows it to deadlock certain system configurations through excessive memory use. Had I been active at the time, I probably would have requested that the higher levels remain unimplemented out of concern that they would cause deadlocks on low-memory systems. The reports that zstd's higher-memory configurations have caused deadlocks hurt the prospects of lzma, since those deadlocks are no longer merely a theoretical concern and lzma wants even more memory than zstd at the levels at which it has a slight edge. Also, if we were to push zstd to those memory levels by tweaking the configuration rather than using the presets that the authors chose, I suspect that it would outperform lzma.
I had suspected this myself, but I was not involved with the development at the time, so I had assumed others had already considered it. On the bright side, a quick look suggests that this will not happen on most low-memory systems, since they also have low core counts, which limits the number of simultaneous threads; a good example would be the Raspberry Pi.
We probably could fix the deadlocks by limiting the number of IO threads that may simultaneously perform compression when system memory is too low. The way it would work is to keep track of how much memory each compression operation may use and set an upper limit, then maintain a "memory available for compression" variable from which IO threads grab chunks. If there is not enough memory available, the IO thread would have to cv_wait() until more becomes available. An additional variable could track the number of threads that currently hold allocations; if it is 0 at the time of the allocation, we could allow compression to proceed anyway, to avoid deadlocks from someone accidentally setting the limit too low. A third variable could make this tunable, to allow greater numbers of threads (at the user's risk). A fourth variable, calculated at module initialization time, could disable this behavior entirely on large-memory systems.
One downside of this mechanism is that, if a system enters a deadlock state, it does not provide a way for a system administrator to unstick the deadlock (unless we have some way to have a kernel module parameter change do a …).
Is there an open issue for this?
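A rough sketch of the throttle described above, written with the SPL-style primitives ZFS already uses elsewhere; every identifier here is hypothetical, and the budget and per-operation cost calculations are deliberately omitted:

```c
/*
 * Hypothetical sketch of a compression memory throttle; all names are
 * made up for illustration, and initialization of the budget is omitted.
 */
#include <sys/mutex.h>
#include <sys/condvar.h>

static kmutex_t zcomp_mem_lock;
static kcondvar_t zcomp_mem_cv;
static int64_t zcomp_mem_avail;		/* memory budget for compression */
static uint64_t zcomp_threads_active;	/* threads currently holding a grant */

/* Tunables: user-adjustable budget, plus a large-memory bypass set at init. */
uint64_t zfs_compress_mem_limit = 0;	/* 0 => computed at module init */
int zfs_compress_throttle_disable = 0;

static void
zcomp_mem_acquire(uint64_t need)
{
	if (zfs_compress_throttle_disable)
		return;
	mutex_enter(&zcomp_mem_lock);
	/*
	 * If no other thread holds a grant, proceed even when the budget
	 * is too small; otherwise a too-low tunable could stall all I/O.
	 */
	while (zcomp_mem_avail < (int64_t)need && zcomp_threads_active > 0)
		cv_wait(&zcomp_mem_cv, &zcomp_mem_lock);
	zcomp_mem_avail -= (int64_t)need; /* may go negative for a lone grant */
	zcomp_threads_active++;
	mutex_exit(&zcomp_mem_lock);
}

static void
zcomp_mem_release(uint64_t need)
{
	if (zfs_compress_throttle_disable)
		return;
	mutex_enter(&zcomp_mem_lock);
	zcomp_mem_avail += (int64_t)need;
	zcomp_threads_active--;
	cv_broadcast(&zcomp_mem_cv);
	mutex_exit(&zcomp_mem_lock);
}
```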
A good comparison for the sake of ZFS would involve doing compression in recordsize-sized blocks to evaluate compression performance.
Agreed.
I am a little more optimistic than you, but my conclusion is that even if it does perform slightly better in both compression time and compression ratio, it is not enough to matter. The memory issue is just too big where it does better. Furthermore, decompression performance tends to matter more than compression performance and the decompression performance of lzma is terrible. That said, I do not mean to berate lzma (as I have long been a fan of it), but I just feel that it is not suitable for use in a filesystem for the reasons I have stated. |
This is an excellent point. It turns out that both zstd and xz support doing compression in blocks. It is intended to be used in conjunction with multithreading, since breaking the input stream into blocks that are independently compressed is very amenable to multithreading. Coincidentally, doing multithreading across multiple blocks is similar to what happens inside ZFS due to the IO threads. The main dissimilarity is that ZFS pads the compressed blocks to multiples of 4K (for ashift=12), while this test does not do that. Anyway, I decided to use the Linux kernel source to get some quick data on my Ryzen 7 5800X. I ran these commands to get some comparison data for our default recordsize:
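(The exact commands are not shown above. As a purely illustrative sketch of this kind of block-wise comparison, a small C program using libzstd and liblzma might look like the following; the 128K chunk size, corresponding to ZFS's default recordsize, and the levels, zstd 19 and xz preset 6, are assumptions rather than what produced the numbers below.)

```c
/* Hypothetical block-wise comparison: compress a file in independent
 * 128K chunks with zstd and xz (liblzma) and report total output sizes.
 * Build: cc blockcomp.c -lzstd -llzma */
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
#include <lzma.h>

#define CHUNK (128 * 1024)	/* assumed recordsize */

int
main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return (1);
	}
	FILE *f = fopen(argv[1], "rb");
	if (f == NULL) {
		perror("fopen");
		return (1);
	}

	unsigned char in[CHUNK];
	size_t zbound = ZSTD_compressBound(CHUNK);
	size_t xbound = lzma_stream_buffer_bound(CHUNK);
	unsigned char *zout = malloc(zbound);
	unsigned char *xout = malloc(xbound);
	unsigned long long in_total = 0, zstd_total = 0, xz_total = 0;
	size_t n;

	if (zout == NULL || xout == NULL)
		return (1);

	while ((n = fread(in, 1, CHUNK, f)) > 0) {
		in_total += n;

		/* zstd level 19, one independent block per chunk. */
		size_t zn = ZSTD_compress(zout, zbound, in, n, 19);
		if (ZSTD_isError(zn)) {
			fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(zn));
			return (1);
		}
		zstd_total += zn;

		/* xz preset 6 with CRC64, the xz(1) defaults. */
		size_t xpos = 0;
		if (lzma_easy_buffer_encode(6, LZMA_CHECK_CRC64, NULL,
		    in, n, xout, &xpos, xbound) != LZMA_OK) {
			fprintf(stderr, "xz: encode failed\n");
			return (1);
		}
		xz_total += xpos;
	}
	fclose(f);

	printf("input: %llu  zstd-19: %llu  xz-6: %llu\n",
	    in_total, zstd_total, xz_total);
	return (0);
}
```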
In summary, zstd outperformed xz in ways that public benchmark data would not predict. Here is the data showing compressed file size and real time:
To summarize that data, …
To be fairer to xz, I decided to retest with a 1M block size:
This gave:
xz did better here, but we have …
Since we support a maximum recordsize of 16M (which nobody likely uses), I decided to re-run the tests against that:
Interestingly, zstd becomes faster here while xz becomes slower. At the same time, compression ratios have improved for both, but much more for xz than for zstd. This is not the Silesia corpus, but this data does not show xz as favorably as the public benchmark data does. Also, I noticed that my previous remark turned out to be wrong:
When the recordsize is infinite, nothing I could do in terms of giving more memory to …
Also, these tests have shown me that I should consider using higher levels of zstd compression at home. I might repeat these tests on the Silesia corpus later to get a fairer comparison, but I do not expect much to change in terms of the conclusions. xz is better than zstd at finding opportunities to compress within large records, but those records just are not used in ZFS, and at the recordsizes that are used in ZFS, zstd is overwhelmingly better (although the ratio of zstd's default compression level is not as good as xz's). |
Keep in mind, ZFS is shipping zstd 1.4.5, and 1.5.1 and up changed the settings for various recordsizes and compression levels in ways that can significantly affect the performance, so you might see very different outcomes on ZFS versus the CLI. |
That is a good point. That might partially explain why my estimate of the zstd-19 memory usage differs so much from what Allan Jude reported the ZFS implementation uses: https://openzfs.org/w/images/b/b3/03-OpenZFS_2017_-_ZStandard_in_ZFS.pdf It is possible to look up the 1.4.5 settings and configure zstd 1.5.1+ to use them for a fairer comparison, but I do not expect things to become better for LZMA. It is also probably worth noting that xz's multiple headers give it a disadvantage at smaller record sizes in the comparison I did, although I do not expect that disadvantage to be big enough to bridge the gap with zstd. That said, unless LZMA's memory usage can be lowered to zstd levels while being non-negligibly better in at least some common use case that applies to ZFS users, I do not think any revision to the testing methodology would make it compare favorably enough to zstd to merit inclusion (or even the effort to do a proof of concept). |
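If someone wanted to do that comparison, libzstd's stable advanced API lets a newer library be forced to use explicit per-level parameters. A minimal sketch follows; the numeric values are placeholders only, not the actual 1.4.5 level-19 settings, which would have to be taken from that release's source:

```c
/* Sketch: pin explicit compression parameters on a ZSTD_CCtx so a newer
 * libzstd can replicate an older release's per-level settings.
 * The parameter values below are placeholders. */
#include <zstd.h>

static ZSTD_CCtx *
make_pinned_cctx(void)
{
	ZSTD_CCtx *cctx = ZSTD_createCCtx();
	if (cctx == NULL)
		return (NULL);

	ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
	/* Placeholder logs; substitute the values from the 1.4.5 source. */
	ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 23);
	ZSTD_CCtx_setParameter(cctx, ZSTD_c_chainLog, 24);
	ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, 22);
	return (cctx);
}
```

Compression would then go through ZSTD_compress2() with that context instead of the one-shot ZSTD_compress(), so the pinned parameters take effect.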
This is a feature request to add XZ compression to zfsonlinux.
(I believe this would unfortunately require on-disk format change, so I am not sure it's acceptable)
XZ is much superior to bzip2 in every aspect, and even mostly obsoletes gzip.
XZ at the "-1" (very fast) compression preset is barely 30% slower than gzip at its default compression level, but produces files as small as or even smaller than bzip2 at its default compression level (at which bzip2 is immensely slower).