secondarycachecompress #1379
Is there any plan to merge secondarycachecompress into zfsonlinux?

Comments
It will likely be merged soon after someone finds time to port the patches.
I had a try at porting the l2arc compression patches a while ago. I have been running with this for a couple of months and haven't had any issues - so I guess I didn't make too many mistakes porting it...
@jimmyH Nice, thanks for taking a crack at this. However, since these changes haven't even been merged to Illumos in their final form I'd prefer not to rush ahead. Let's let it soak for a while. We could create a pull request with the patches applied for those who want to give this functionality some additional testing.
3137 L2ARC compression
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
  illumos/illumos-gate@aad0257
  https://www.illumos.org/issues/3137
  http://wiki.illumos.org/display/illumos/L2ARC+Compression
Notes for Linux port: A l2arc_nocompress module option was added to prevent the compression of l2arc buffers regardless of how a dataset's compression property is set. This allows the legacy behavior to be preserved.
Ported by: James H <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1379
@jimmyH I've refreshed the l2arc compression patch against the latest master using what was committed to Illumos. They appear to have dropped the secondarycachecompress property in the final version. I did add a l2arc_nocompress module option so the legacy behavior can still be preserved. @skiselkov Any chance you can shed some light on why the property was dropped?
@behlendorf Hi Brian. The reason we dropped the property is because ultimately we couldn't come up with a reason why you would want to avoid compressing to L2ARC while compressing to the main pool. There's almost no performance penalty in decompression and given that most deployments never ever touch the defaults (especially not the more esoteric ones, like those dealing with caching), we figured it is better to keep it simple. Thanks for pointing out the docs discrepancy, I'll go fix it right away.
On Thu, 1 Aug 2013, skiselkov wrote:
Metadata. I don't use compression on any of my pools, because I know that my data doesn't compress well. But we've read here many times before that the metadata compresses very well, so compressing metadata in the L2ARC would still be a win.

Tim Connors
I think it makes sense to always compress metadata. This is already the case for the primary pool and we should carry that behavior over for the l2arc where it makes even more sense. We can leverage the existing zfs_mdcomp_disable module option to disable this if needed. But I have a similar use case where the data is incompressible but I'd love to cache as much metadata as possible in the l2arc.
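Pulling the pieces mentioned so far together (the l2arc_nocompress module option from the Linux port, always compressing metadata, and zfs_mdcomp_disable as the opt-out), the intended policy is roughly the following. This is only a sketch with invented identifiers, not the actual patch:

    /* Illustrative only -- invented names, not the actual OpenZFS code.
     * Never compress when the legacy "nocompress" option is set, always
     * compress metadata unless metadata compression is disabled, and
     * otherwise follow the dataset's own compression setting. */
    #include <stdbool.h>

    enum comp { COMP_OFF, COMP_LZ4 };

    static enum comp
    l2arc_pick_compress(bool is_metadata, enum comp dataset_comp,
                        bool nocompress, bool mdcomp_disable)
    {
        if (nocompress)
            return COMP_OFF;                  /* preserve the legacy behaviour */
        if (is_metadata && !mdcomp_disable)
            return COMP_LZ4;                  /* metadata is always worth compressing */
        return dataset_comp != COMP_OFF ? COMP_LZ4 : COMP_OFF;
    }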
One issue I have noticed with this is that you do not get as good a compression ratio as when using zram.
On 8/2/13 2:18 AM, Brian Behlendorf wrote:
True that, I'll add a modification which will always compress metadata.
On 8/2/13 8:10 AM, jimmyH wrote:
This would seem in line with general LZ4 performance.
I don't understand how you can get any more L2ARC compression when using zram.

Cheers,
Saso
Saso,

On machines without SSDs, I find it best to restrict the arc cache and use the free memory for a zram disk which uses LZO compression. I then use this as L2ARC for metadata only.

    zram0 orig=1000751104 compr=170572156
    zram1 orig=991719424  compr=170423437
    zram2 orig=982605824  compr=167444387
    zram3 orig=996671488  compr=171963690

                   capacity     operations    bandwidth
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    zfs1        1.17T   525G     41     68  1.55M   915K
      vdb1      1.17T   525G     41     68  1.55M   915K
    cache           -      -      -      -      -      -
      zram0     1.99G     8M      3      0  40.7K  21.8K
      zram1     1.99G     8M      3      0  36.6K  22.1K
      zram2     1.99G     8M      3      0  47.6K  17.9K
      zram3     1.99G     8M      2      0  34.3K  17.8K
    ----------  -----  -----  -----  -----  -----  -----

    l2_size     4    4694123008

This is an extreme case, where zram is reporting that the L2ARC disks are really only 50% full before compression, and we are getting ~6x compression on that data.

On a different machine this is what I see from just using secondarycachecompress:

    -----------------------------------  -----  -----  -----  -----  -----  -----
    zfs2                                 1.79T   709G     43      7  1.13M   318K
      dm-name-grandadibu-zfs2            1.79T   709G     43      7  1.13M   318K
    cache                                    -      -      -      -      -      -
      nbd5                               3.89G     8M      8      0  26.3K  6.86K
      nbd6                               3.89G     8M      8      0  24.9K  9.49K
    -----------------------------------  -----  -----  -----  -----  -----  -----

    l2_size      4    13368855040
    l2_asize     4    8122558976
    l2_hdr_size  4    330429216

Perhaps linux is misreporting the zram usage (I will run some tests to see if this is true). One test I will run is to take a snapshot of the cache device whilst it is running with secondarycachecompress and then compress it using LZO. It would be worthwhile for someone to run the same test on illumos.
I took one of the 4GB L2ARC partitions where secondarycachecompress gives a ~1.65x compression ratio and observed:

    | lz4 (zfs filesystem with compression=lz4) | 4GB   | 1x   |
    | lz4c (default settings)                   | 611MB | 6.7x |
    | lzo (default settings)                    | 596MB | 6.9x |
    | gzip -1                                   | 516MB | 7.9x |

So I guess this is an issue with the settings we are using for lz4.
On 8/2/13 1:23 PM, jimmyH wrote:
There can hardly be, there are almost no settings to tweak for LZ4, the [...]

Cheers,
Saso
On 8/2/13 1:23 PM, jimmyH wrote:
Btw: what was your l2_size/l2_asize for this test? I'm guessing you're [...]

Cheers,
Saso
Sorry. A moment of stupidity - this was within an ext4 disk image, which is obviously not aware of the zfs compression.

    | lz4 (zfs filesystem with compression=lz4) | 632MB | 6.5x |

So despite the l2arc using secondarycachecompress it can still be compressed further. I wonder if the arc fragmentation issues we get in the linux port are also affecting the l2arc?
I had 2x4GB L2ARC partitions with: [...]
On 8/2/13 1:54 PM, jimmyH wrote:
This doesn't make any sense to me. According to l2_asize you have [...]

Cheers,
Saso
To clarify: the question is why we can get the additional 6.5x compression ratio on data which should already be compressed.
On 8/2/13 2:18 AM, Brian Behlendorf wrote:
Brian, just wanted to let you know that your suggestion has been implemented.
I think I've understood the reason for the poor compression ratio: compressed buffers have to be written at the device's alignment, so small buffers gain little or nothing once they are padded back out. If we did not align the data in the L2ARC we would get a much better compression ratio for small buffers, at the expense of slower reads/writes.

I did a quick hack to modify l2arc_write_buffers() and zio_compress() so that small buffers get compressed. This gave me a very good l2_size/l2_asize compression ratio. I am not sure what would need to change in the vdev code to allow you to write unaligned blocks (zio_write_phys() returns an error if you try this).
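To make the alignment point concrete, here is a small standalone illustration; the sector size and compressed size are hypothetical numbers, not values from this thread. Whatever LZ4 saves on a small buffer is given back when the write is padded out to the device's alignment, which is why l2_asize barely moves.

    /* Hypothetical numbers: a 4K buffer that LZ4 shrinks to 700 bytes
     * still occupies a full 4K sector on an ashift=12 cache device. */
    #include <stdio.h>

    static size_t round_up(size_t n, size_t align)
    {
        return (n + align - 1) / align * align;
    }

    int main(void)
    {
        size_t sector = 4096;       /* device alignment (e.g. ashift=12) */
        size_t logical = 4096;      /* a small metadata buffer */
        size_t compressed = 700;    /* assumed LZ4 output size */

        /* The write is padded back out to a full sector, so the
         * on-device usage does not shrink at all in this case. */
        printf("stored: %zu bytes (was %zu)\n",
               round_up(compressed, sector), logical);
        return 0;
    }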
On 8/4/13 1:29 PM, jimmyH wrote:
How exactly did you modify l2arc_write_buffers()? Unaligned writes to block devices are not possible. I'm somewhat [...]

Cheers,
Saso
On 04/08/2013 13:38, skiselkov wrote:
The l2arc code could pack the buffers when writing them to the vdev.

thanks,
Jimmy
On 8/4/13 3:38 PM, jimmyH wrote:
Are you running the latest upstream where buf_compress_minsz is [...]?

The modification of zio_compress_data() is not good. Alignment matters.
While it might improve the compression ratio for these corner cases, it [...]
@skiselkov Thanks for the quick turnaround on the metadata compression change. @jimmyH beat me to it, but I was going to say the improved zram compression must be due to the vdev alignment constraints. Metadata is quite likely to compress significantly smaller than the device sector size.
Alignment absolutely matters; reverting to read-modify-write for an ssd will hurt performance badly. But keep in mind you're assuming certain performance characteristics for that block device. The backing device could be a normal ram disk, a compressed ram disk like zram, a low latency network block device, or something slightly more exotic like phase change memory. All of these things exist today and could be used to back an l2arc device; we should make sure the l2arc can be tuned to take advantage of them.
This becomes important when we're talking about the metadata-only use case. Here the corner case may easily become the common case, and it would be nice to handle that cleanly. Packing the buffers is one solution which I could see working pretty well. I'm a little less concerned about the extra overhead here since I think this could be implemented efficiently, but it's certainly something we'd need to keep in mind. For me the bigger question is whether this is worth doing at all. And I think the answer really depends on what you're storing in the l2arc and what technology you're using to back it. For ssds, which are now large, relatively cheap, but slow to write, it doesn't make sense. If you're using something like zram then what you probably want is to disable all l2arc compression. The zram device, which doesn't have the alignment concerns, will be able to do a much better job compressing the data and packing it. For the new types of non-volatile storage arriving in the next few years, which will likely be small, expensive, and fast, I suspect packing may be worthwhile.
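A minimal sketch of what such packing could look like, with invented identifiers and no claim to match the actual patch: small compressed buffers are appended to one sector-aligned staging buffer and written out together, and each buffer's offset within the pack has to be remembered so reads can find it.

    /* Rough sketch, invented names: append one compressed buffer to a
     * shared staging area; returns its offset within the pack, or
     * (size_t)-1 when the pack is full and must be flushed to the
     * cache device first. */
    #include <stddef.h>
    #include <string.h>

    static size_t
    pack_append(char *staging, size_t capacity, size_t *used,
                const void *cdata, size_t clen)
    {
        if (*used + clen > capacity)
            return (size_t)-1;
        memcpy(staging + *used, cdata, clen);
        size_t off = *used;
        *used += clen;
        return off;    /* an L2ARC header would have to remember (offset, length) */
    }

The price is exactly the extra per-buffer bookkeeping (offset and length) that the header-size discussion below is about.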
On 8/4/13 7:00 PM, Brian Behlendorf wrote:
You hit the nail on the head. Of course one can always devise complex [...]

More to the specific case here, using L2ARC on ram disks appears to me [...]

Don't get me wrong, I'm not against innovation in the ARC (there is [...]
@skiselkov Totally agree with your comments. I hacked a really basic patch to pack small l2arc buffers - jimmyH@243d508 - which I'll leave soaking on some real servers to see if it is a stupid idea...

One comment with regards to the compression patch: is storing b_tmp_cdata in the l2arc_buf_hdr structure sensible? I'd have thought we'd want to keep these headers as small as possible.
Can you do an analysis on your workload of how much of a performance/efficiency gain you get in running this code vs running the upstream source? I'm personally not convinced that it's really worth the hassle, especially using ramdisk-backed L2ARC devices instead of just implementing more sensible ARC partitioning. (To clarify here, I mean a real-world use case that would hit this, not just synthetics.)

The reason b_tmp_cdata is in the header there is, as you have no doubt discovered writing that patch, that we need to keep the pointer around until after we're done writing it, and the l2arc_hdr_t structure was just the most convenient place to put it. Now I could do it by allocating a separate "freelist" and stuffing it into the callback, but that would detach the compressed data portion of a buffer from the buffer header itself, which would make it harder to track where it is and to make sure it is released after a write; plus the overhead cost is rather small (an arc_hdr_t is around 200 bytes anyway). If we were to integrate the packing method you propose here, it'd probably make sense to start doing something smarter here.
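For readers following along, this is approximately the shape of the structure being discussed; it is a simplified view, not the exact upstream definition.

    #include <stdint.h>

    struct l2arc_dev;                   /* per cache device state, details omitted */

    /* Simplified sketch of the per-buffer L2ARC header. b_tmp_cdata keeps
     * the compressed copy reachable only while the write to the cache
     * device is in flight; it is released once the write completes. */
    typedef struct l2arc_buf_hdr {
        struct l2arc_dev *b_dev;        /* which cache device holds the buffer */
        uint64_t          b_daddr;      /* byte offset of the buffer on that device */
        int               b_compress;   /* compression algorithm applied, if any */
        int               b_asize;      /* allocated (on-device) size after compression */
        void             *b_tmp_cdata;  /* in-flight compressed data */
    } l2arc_buf_hdr_t;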
Workload: backup server running rsync, where rsync spends most of its time stat'ing millions of files before realising they are all up to date. So backups are mostly limited by random reads of metadata. This server has an old ssd for l2arc; the more metadata we can fit in l2arc the faster the backups.
Now I get an 8.1x compression ratio, but l2_hdr_size is approximately 1/3 of l2_asize. On its own I'm also not sure that this is 'worth the hassle' while we have such a large header overhead. I understand why the original zfs developers keep an arc_hdr_t (264 bytes on linux x86_64) for every block in the l2arc, but it looks like a design flaw to me.
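As a rough illustration of that ratio (the average on-device buffer size below is an assumption, not a measured value): with roughly 264 bytes of header per cached block, the headers reach about a third of l2_asize as soon as the average packed, compressed buffer occupies only ~800 bytes on the device.

    /* Back-of-the-envelope check; avg_asize is an assumed figure. */
    #include <stdio.h>

    int main(void)
    {
        double hdr_bytes = 264.0;   /* arc_hdr_t per cached block on linux x86_64 */
        double avg_asize = 800.0;   /* assumed average on-device size of a packed buffer */
        printf("header overhead: %.0f%% of l2_asize\n",
               100.0 * hdr_bytes / avg_asize);    /* prints ~33% */
        return 0;
    }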
@skiselkov You might want to read the following thread on the lz4 mailing list, where one guy managed to compress a 3.5GB file into a 750KB file by compressing it multiple times: https://groups.google.com/forum/#!msg/lz4c/DcN5SgFywwk/AVMOPri0O3gJ

Yann Collet provided a good explanation for the observed behavior in his response.
Well, a curious case for sure, but I'm not convinced it's really applicable to filesystem compression. The cited scale in the second pass there, going from 3.5GB to 9.2MB, is already much greater (368x) than the difference between SPA_MAXBLOCKSIZE and SPA_MINBLOCKSIZE (256x). Nevertheless, there probably is a case to be made for implementing compress="lz4-2" or indeed lz4-n with n={1..9} in a fashion similar to gzip. Even at two compression passes, lz4 would still be almost as fast as lzjb, but would probably produce much smaller outputs for certain workloads. But I fear giving too many knobs into the hands of users here: gun, shoot, foot.
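For the curious, a small standalone demonstration of the effect described in that thread, using plain liblz4 (link with -llz4); this is not ZFS code and the input is deliberately artificial. The first pass's output on extremely redundant data is itself highly repetitive, so a second pass still shrinks it substantially.

    #include <lz4.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        int src_len = 1 << 20;                    /* 1 MiB of a single repeated byte */
        char *src = malloc(src_len);
        memset(src, 'A', src_len);

        int bound1 = LZ4_compressBound(src_len);
        char *pass1 = malloc(bound1);
        int len1 = LZ4_compress_default(src, pass1, src_len, bound1);

        int bound2 = LZ4_compressBound(len1);     /* compress the compressed output again */
        char *pass2 = malloc(bound2);
        int len2 = LZ4_compress_default(pass1, pass2, len1, bound2);

        printf("original %d, first pass %d, second pass %d\n", src_len, len1, len2);
        free(src); free(pass1); free(pass2);
        return 0;
    }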
L2ARC compression from Illumos has been merged. Thanks @skiselkov for developing this; we can always pursue further improvements, but this syncs us back up with Illumos.

4e59f47 Illumos #3964 L2ARC should always compress metadata buffers
3137 L2ARC compression
Reviewed by: George Wilson <[email protected]>
Reviewed by: Matthew Ahrens <[email protected]>
Approved by: Dan McDonald <[email protected]>
References:
  illumos/illumos-gate@aad0257
  https://www.illumos.org/issues/3137
  http://wiki.illumos.org/display/illumos/L2ARC+Compression
Notes for Linux port: A l2arc_nocompress module option was added to prevent the compression of l2arc buffers regardless of how a dataset's compression property is set. This allows the legacy behavior to be preserved.
Ported by: James H <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1379
3964 L2ARC should always compress metadata buffers
Reviewed by: Matthew Ahrens <[email protected]>
Reviewed by: Brian Behlendorf <[email protected]>
References: https://www.illumos.org/issues/3964
Ported-by: Brian Behlendorf <[email protected]>
Closes openzfs#1379