
zstd early abort #13244

Merged (1 commit), May 24, 2022

Conversation

@rincebrain (Contributor) commented Mar 22, 2022

Motivation and Context

LZ4 is really fast, especially at giving up on incompressible data.

ZSTD is pretty great at compressing things well, but has no such heuristic.

What if...it did?

Well...

Raspberry Pi 4:
[Graph: zstd abort Pi WIP]

Ryzen 5900X:
[Graph: zstd go fast 5900X]

< 0.3% more space used to go from 2 hours to 15 minutes? Sold.

Description

As the comment block says:

        /*
         * A zstd early abort heuristic.
         *
         * - Zeroth, if this is <= zstd-3, or <= zstd_abort_size (currently 128k),
         *   don't try any of this, just go.
         *   (because experimentally that was a reasonable cutoff for a perf win
         *   with tiny ratio change)
         * - First, we try LZ4 compression, and if it doesn't early abort, we
         *   jump directly to whatever compression level we intended to try.
         * - Second, we try zstd-1 - if that errors out (usually, but not
         *   exclusively, if it would overflow), we give up early.
         *
         *   If it works, instead we go on and compress anyway.
         *
         * Why two passes? LZ4 alone gets you a lot of the way, but on highly
         * compressible data, it was losing up to 8.5% of the compressed
         * savings versus no early abort, and all the zstd-fast levels are
         * worse indications on their own than LZ4, and don't improve the LZ4
         * pass noticeably if stacked like this.
         */
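
In C terms, the control flow is roughly the sketch below. This is a minimal illustration of the two passes as the comment describes them, not the patch itself: lz4_compress() and zstd_compress_level() are hypothetical stand-ins for the real entry points, assumed to return the compressed length or 0 when the data won't fit in d_len, and the boundary comparisons mirror the comment as written (see the review discussion further down about the exact <= vs. <).

        #include <stddef.h>

        extern size_t lz4_compress(void *dst, const void *src,
            size_t d_len, size_t s_len);
        extern size_t zstd_compress_level(void *dst, const void *src,
            size_t d_len, size_t s_len, int level);

        static size_t zstd_abort_size = 128 * 1024;

        static size_t
        zstd_early_abort_compress(void *dst, const void *src, size_t d_len,
            size_t s_len, int level)
        {
                /* Zeroth: low levels and small records skip the heuristic. */
                if (level <= 3 || s_len <= zstd_abort_size)
                        return (zstd_compress_level(dst, src, d_len, s_len,
                            level));

                /* First: if LZ4 finds savings, go straight to zstd-N. */
                if (lz4_compress(dst, src, d_len, s_len) != 0)
                        return (zstd_compress_level(dst, src, d_len, s_len,
                            level));

                /* Second: LZ4 gave up; if zstd-1 also fails, abort early. */
                if (zstd_compress_level(dst, src, d_len, s_len, 1) == 0)
                        return (s_len); /* report the block as incompressible */

                /* zstd-1 succeeded where LZ4 failed: compress for real. */
                return (zstd_compress_level(dst, src, d_len, s_len, level));
        }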

A graphic of how much space it takes up versus stock zstd-3 if you use one or two passes of this form:
[Graph: dang it 2]

I have...so many spreadsheets of data trying different permutations of this on different hardware.

What's left to do?

I'm opening the PR to get people's opinions before I try to seriously pitch getting it included: is this worth including at all, and should the on/off switch be, say, a dataset property, a module parameter (as I have here), or something else?

Actually landing it would obviously require cleanup: reordering the commits so the code changes aren't combined with moving lz4 into zcommon, removing the debug tunables, refactoring the zstd compression calls so they aren't copy-pasted...you get the idea.

Footnote: This trick probably works fine for gzip too.

How Has This Been Tested?

I've run this through its paces on my Pi 4, my Ryzen 5900X, and my 8700k. (I wanted to try it on Linux/x86, but it turns out OpenZFS is too broken for that to work with unmodified git, let alone any changes.)

I've primarily tested with two datasets: a low-compression dataset (~45 GB uncompressed, baseline compression ratio around 1.05) made up of ZIP files of firmware images, copied to two separate datasets with recordsize=128k and recordsize=1M configured; and a high-compression dataset (~55 GB uncompressed, ratio around 2.25) that's a snapshot of my maildir, again in 128k and 1M variants.

I ran b3sum against all the files on the different platforms and compared to the reference, and have so far found no discrepancies, so I don't appear to be secretly mangling bytes.

Types of changes

  • Performance enhancement (non-breaking change which improves efficiency)

@gmelikov (Member)

Sorry for the off-topic question - could someone point me at the "lz4 early abort mechanism"? All I've found is ZFS's 12.5% threshold, which applies to any compression algorithm.
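
(For context: that threshold comes from ZFS sizing the destination buffer at 87.5% of the source, so any algorithm that can't save at least 12.5% overflows the buffer and the block is stored uncompressed. A minimal sketch of the idea, with illustrative names rather than the actual zio_compress_data() internals:)

        #include <stddef.h>

        /*
         * Sketch of the 12.5% rule: hand the compressor a destination
         * buffer 1/8 smaller than the source; if the output doesn't fit,
         * treat the block as incompressible. compress_fn is any
         * (dst, src, d_len, s_len) -> compressed-size-or-0 routine.
         */
        static size_t
        compress_with_threshold(void *dst, const void *src, size_t s_len,
            size_t (*compress_fn)(void *, const void *, size_t, size_t))
        {
                size_t d_len = s_len - (s_len >> 3);    /* 87.5% of source */
                size_t c_len = compress_fn(dst, src, d_len, s_len);

                if (c_len == 0 || c_len > d_len)
                        return (s_len);         /* store uncompressed */
                return (c_len);
        }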

@rincebrain (Contributor, Author) commented Mar 23, 2022

Not really offtopic, IMO.

Technically, what I'm proposing is not what LZ4 does. AFAIK, the only mechanism LZ4 has, other than the normal "oops, you hit the buffer size we passed in, give up", is skipping ahead faster through a hunk of data when it hasn't found anything compressible in it - it doesn't give up on the whole block (at least, I don't think the constants are large enough for that in most cases...).

(I did find someone's LZ4 implementation that really did have such a mechanism implemented, but it seemed to be hard commented out.)

In our code, look for references to "SKIPSTRENGTH" - e.g.

int findMatchAttempts = (1U << skipStrength) + 3;

Or the more informative prior definition:

zfs/module/zfs/lz4_zfs.c, lines 182-191 (at 6b444cb):

/*
* NOTCOMPRESSIBLE_CONFIRMATION: Decreasing this value will make the
* algorithm skip faster data segments considered "incompressible".
* This may decrease compression ratio dramatically, but will be
* faster on incompressible data. Increasing this value will make
* the algorithm search more before declaring a segment "incompressible".
* This could improve compression a bit, but will be slower on
* incompressible data. The default value (6) is recommended.
*/
#define NOTCOMPRESSIBLE_CONFIRMATION 6

In modern LZ4, the definition to look for references to is "LZ4_skipTrigger", albeit much less informative:

https://github.com/lz4/lz4/blob/90d68e37093d815e7ea06b0ee3c168cccffc84b8/lib/lz4.c#L647

(I did a couple of experiments changing that value. I don't recommend it - the changes in perf/compratio are absolutely not worth it.)
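
To illustrate what that constant controls, here is a rough sketch (not the real lz4.c loop; has_match_at() is a hypothetical stand-in for the hash-table probe) of the accelerating search: every failed match attempt bumps a counter, and the forward step is that counter shifted down by the trigger, so the stride stays 1 for the first 64 misses and then grows, letting long incompressible runs be skimmed rather than scanned byte by byte.

        #include <stddef.h>

        #define LZ4_skipTrigger 6       /* same default as modern lz4.c */

        /* Hypothetical stand-in for the hash-table match probe. */
        extern int has_match_at(const unsigned char *p);

        static const unsigned char *
        find_next_match(const unsigned char *ip, const unsigned char *iend)
        {
                unsigned int searchMatchNb = 1U << LZ4_skipTrigger;

                while (ip < iend) {
                        if (has_match_at(ip))
                                return (ip);
                        /* step = 1 for the first 64 misses, then 2, 3, ... */
                        ip += searchMatchNb++ >> LZ4_skipTrigger;
                }
                return (NULL);  /* nothing found: hunk is incompressible */
        }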

@behlendorf added the "Status: Design Review Needed" (architecture or design is under discussion) label Mar 23, 2022
@adamdmoss (Contributor)

Looks great. I can't easily think of a downside. Even the code changes are pretty minimal.

@beren12 (Contributor) commented Mar 26, 2022

This looks great btw.

@PrivatePuffin (Contributor)

Going over the test data, I approve of this.
Currently the go-to standard (default) is either ZSTD-1 or ZSTD-3 (a very close tie); with these changes, users can safely go to ZSTD-5 without much negative effect on low-compression workloads.

It's basically the no-compression equivalent of how the current ZSTD implementation already handles can't-allocate cases, so it fits the existing design.

@adamdmoss (Contributor)

I've been using this for about a month now with no ill effects, FWIW. Seems to work great.

@adamdmoss (Contributor)

Mind rebasing? Doesn't apply to current trunk.

@rincebrain (Contributor, Author) commented Apr 23, 2022

There, now with many of the knobs removed.

I'll probably remove the rest shortly, except the "on/off", since I was told that making it a dataset property was unreasonable.

I have various thoughts on additional refinements, but I'm not particularly interested in burning weeks of benchmarking time to see if they marginally improve things. I'll probably post a long follow-up with more data and un-WIP this tomorrow.

@rincebrain (Contributor, Author)

Or the rebase could cause memory corruption on older kernels despite not modifying the allocation codepaths (or even directly calling them outside of the existing function implementations). That's neat. I wonder what on earth is going on.

Time to break out the KASAN I guess.

@rincebrain (Contributor, Author)

It turns out the handling for disabling compressed ARC while still compressing is broken: we sometimes end up calling zio_compress_data here with the input buffer claimed to be 128k long when it's actually 512b of data, and we only allocated 512b.

I tried to guess how it was intended to work, and my fix at least passes the test and doesn't complain on my KASAN kernel, but I am not remotely confident I know what is necessary in all cases. So unless someone debugs the remaining cases, I may just go back to my first solution of "not if you have compressed ARC enabled", if it's not fixed everywhere.

@rincebrain (Contributor, Author)

Anyway, here's the addendum I was planning to post today, before the debugging excitement above; hopefully it covers the remaining concerns people voiced.

So, at the last leadership meeting, it was requested that I show off how much more expensive this could be in the worst case - if we ran through all these preprocessing passes and still decided to do the compression.

So, I did it, on the various test datasets I was using before. (“alwayson” is the mode in which it unconditionally runs all passes on anything sufficiently large, which is still configured to 128k minimum as before.)

(I'm sorry that one or two of the graphs still label the "run all compression passes unconditionally" mode "hardmoed" - that's what I called it in the code and used initially, and I only noticed now, while posting this, that I apparently missed updating it in one or two graph templates. It's the same thing as "alwayson".)

The breakdown of compressibility stats goes:

  • passignored means the record was excluded from any EA processing, either because it was too small or because the requested compression level was too low
  • lz4pass_allowed means "LZ4 compressed it successfully, so we stopped further EA processing and went straight to the heavy compression"
  • zstdpass_allowed means, similarly, "LZ4 rejected it, but zstd-1 compressed it, so we go on to compress it" - this is the most expensive path, since we run all the compression steps
  • zstdpass_rejected means "LZ4 and zstd-1 both refused to compress it, so we're not even gonna try"

The sum of zstdpass_allowed and zstdpass_rejected is the number of records LZ4 rejected.

An interesting quirk: if you vary the recordsize, the absolute number of passignored records remains relatively constant in my test datasets, even while the number of records large enough to qualify varies drastically.
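
(For reference, tracking these outcomes only needs a handful of counters bumped in the compression wrapper. A minimal sketch, using plain GCC atomics and illustrative names rather than the kstat plumbing the actual module would use:)

        /* Hypothetical per-outcome counters for the early-abort heuristic. */
        enum ea_outcome {
                EA_PASSIGNORED,         /* too small, or level too low */
                EA_LZ4PASS_ALLOWED,     /* LZ4 compressed it; go heavy */
                EA_ZSTDPASS_ALLOWED,    /* LZ4 no, zstd-1 yes; go heavy */
                EA_ZSTDPASS_REJECTED,   /* both refused; skip compression */
                EA_NUM_OUTCOMES
        };

        static unsigned long ea_counts[EA_NUM_OUTCOMES];

        static void
        ea_count(enum ea_outcome o)
        {
                __atomic_add_fetch(&ea_counts[o], 1, __ATOMIC_RELAXED);
        }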

5900X, low compression, 128k records:
[Graph: 128k lowcomp 5900X]
5900X, low comp, 1M records:
[Graph: 1M lowcomp 5900X]
5900X, high comp, 128k records:
[Graph: 128k highcomp 5900X]
5900X, high comp, 1M records:
[Graph: 1M highcomp 5900X]

If you'd like to see something a little more interesting: I made an evil dataset that fails to compress at zstd-1 but succeeds at zstd-2 and above, and tested it with baseline, with a zstd-1 early abort pass, with a zstd-2 early abort pass, and with modes that unconditionally ran all steps for each. (The unconditional modes don't differ much, because - surprise - every data block in this dataset is like this: I found a 7 MB region of my low-compressibility dataset that differed between the EA and non-EA runs, discovered it compressed only at zstd-2 and above, and made a 42 GB dataset out of copies of that 7 MB.)

[Graph: 1M evil 5900X]

Sadly, you can see that using zstd-2 as the early-abort pass avoids most of the pathological loss of compression savings, but it also nukes nearly all of the performance benefit. (Not pictured above: I tried constructing custom zstd level settings to have our cake and eat it too, but unfortunately, for at least this case, the same setting - the maximum window it would look back over - both killed the speed and resulted in concluding the data was compressible.)

But what about something with less unlimited CPU power to burn?

M1 (hw accelerated) VM, low compression, 128k:
[Graph: 128k lowcomp M1VM]
M1 VM, low compression, 1M:
[Graph: 1M lowcomp M1VM]
And, not very interesting, M1 VM, high compression, 1M:
[Graph: 1M highcomp M1VM]

So, the worst case is visible, but not earth-shattering. (To be concrete: the largest deltas relative to no preprocessing were 7.9 seconds for EA and 63 seconds for alwayson, at zstd-13/zstd-15 respectively, against total baseline times of 602/846 seconds.)

Or even less CPU?

Pi 4, low compression, 1M recordsize:
[Graph: 1M lowcomp Pi4 WIP2]

That is, for reference, the difference between 835 seconds and 2321 seconds at zstd-15. Obviously, zstd levels above 3 aren't really a gain on this particular dataset, but it's interesting both to see how large the delta becomes and how huge a percentage of records only get filtered out by the zstd pass here.

@rincebrain changed the title from "RFC: zstd early abort" to "zstd early abort" Apr 24, 2022
@rincebrain (Contributor, Author)

There, I've dropped most of the knobs from it, cleaned it up, corrected the style complaints, squashed it to one commit, and am dropping the draft marker.

(It has #13375 baked in because crashing on a known bug doesn't do anyone any good for evaluating this.)

andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
It turns out that "do LZ4 and zstd-1 both fail" is a great heuristic
for "don't even bother trying higher zstd tiers".

By way of illustration:
$ cat /incompress | mbuffer | zfs recv -o compression=zstd-12 evenfaster/lowcomp_1M_zstd12_normal
summary: 39.8 GiByte in  3min 40.2sec - average of  185 MiB/s
$ echo 3 | sudo tee /sys/module/zzstd/parameters/zstd_lz4_pass
3
$ cat /incompress | mbuffer -m 4G | zfs recv -o compression=zstd-12 evenfaster/lowcomp_1M_zstd12_patched
summary: 39.8 GiByte in 48.6sec - average of  839 MiB/s
$ sudo zfs list -p -o name,used,lused,ratio evenfaster/lowcomp_1M_zstd12_normal evenfaster/lowcomp_1M_zstd12_patched
NAME                                         USED        LUSED  RATIO
evenfaster/lowcomp_1M_zstd12_normal   39549931520  42721221632   1.08
evenfaster/lowcomp_1M_zstd12_patched  39626399744  42721217536   1.07
$ python3 -c "print(39626399744 - 39549931520)"
76468224
$

I'll take 76 MB out of 42 GB for > 4x speedup.

Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Kjeld Schouten <[email protected]>
Reviewed-by: Ahelenia Ziemiańska <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#13244
bsdjhb pushed a commit to CTSRD-CHERI/zfs that referenced this pull request Jul 13, 2023
This is not associated with a specific upstream commit but apparently
a local diff applied as part of:

commit e3aa18ad71782a73d3dd9dd3d526bbd2b607ca16
Merge: 645886d028c8 b9d9845
Author: Martin Matuska <[email protected]>
Date:   Fri Jun 3 17:58:39 2022 +0200

    zfs: merge openzfs/zfs@b9d98453f

    Notable upstream pull request merges:
      openzfs#12321 Fix inflated quiesce time caused by lwb_tx during zil_commit()
      openzfs#13244 zstd early abort
      openzfs#13360 Verify BPs as part of spa_load_verify_cb()
      openzfs#13452 More speculative prefetcher improvements
      openzfs#13466 Expose zpool guids through kstats
      openzfs#13476 Refactor Log Size Limit
      openzfs#13484 FreeBSD: libspl: Add locking around statfs globals
      openzfs#13498 Cancel in-progress rebuilds when we finish removal
      openzfs#13499 zed: Take no action on scrub/resilver checksum errors
      openzfs#13513 Remove wrong assertion in log spacemap

    Obtained from:  OpenZFS
    OpenZFS commit: b9d9845
behlendorf added a commit that referenced this pull request Oct 13, 2023
New Features
- Block cloning (#13392)
- Linux container support (#14070, #14097, #12263)
- Scrub error log (#12812, #12355)
- BLAKE3 checksums (#12918)
- Corrective "zfs receive"
- Vdev and zpool user properties

Performance
- Fully adaptive ARC (#14359)
- SHA2 checksums (#13741)
- Edon-R checksums (#13618)
- Zstd early abort (#13244)
- Prefetch improvements (#14603, #14516, #14402, #14243, #13452)
- General optimization (#14121, #14123, #14039, #13680, #13613,
  #13606, #13576, #13553, #12789, #14925, #14948)

Signed-off-by: Brian Behlendorf <[email protected]>

A review comment on the merged code:

The man page correctly notes that the early-abort heuristic runs at zstd levels >= 3 by default, but the code comment for zfs_zstd_compress_wrap instead says it does not run at level 3, which is confusing. The <= should be <:

* - Zeroth, if this is <= zstd-3, or < zstd_abort_size (currently 128k), don't try any of this, just go. (because experimentally that was a reasonable cutoff for a perf win with tiny ratio change)

@rincebrain (Contributor, Author) replied:

Yup, just something I missed because I was dithering over which version to use. Feel free to open a tiny PR changing it, or I'll try to remember when I get a chance later.

Another contributor replied:

Yep, the in-code documentation is wrong; the code itself starts the heuristic at level >= 3.

Labels: Status: Accepted (Ready to integrate: reviewed, tested)