-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support zstd compression (port of Allan Judes patch from FreeBSD) #8044
Conversation
@gmelikov looks like the remaining test fails i have seen are common and unrelated to the patch. correct? i have seen the same results in other push requests. so i want to keep the current version as is if no other change requests are required |
@BrainSlayer sometimes there are non-always failing tests, you can check this page for them http://build.zfsonlinux.org/known-issues.html |
@gmelikov : i just reviewed every assertion or fail in the logs and did not found any related bugs in this patch. its just about if my work is done for now or if there is still a task open i have to fullfill. personally i was running alot of tests with copying millions of smaller and bigger files to a storage and back including content verification with zstd enabled |
For reference, this PR is based on https://reviews.freebsd.org/D11124 from FreeBSD. @BrainSlayer I should have been clearer. The requirement isn't strictly to keep the change to one commit, you should absolutely add follow up commits to address bugs or review feedback. But since the CI will build each commit in the branch, and test the top most, we'd prefer not to overwhelm the bots by avoided branches with dozens or hundreds of small commits. Once reviewed and approved we'll squash all the commits before final integration. Based solely on the CI results it does look like this PR is ready for a first round of review feedback. To help the reviewers, can you provide some additional information.
|
your assumption is correct. i picked the last version i found as reference for my port which can be found at https://reviews.freebsd.org/D11124?id=45344. the code is not much different from the original freebsd patch, but i needed to adjust the patch to fit into zfsonlinux which seem to use a very different codebase for zfs. for instance the function arc_cksum_is_equal does not exist in zfsonlinux. so i adjusted basicly all occurences of zio_compress_data in zfs to fit the new api which was introduced with this patch in addition i fixed issues i found while reviewing and running the zfs test suite. ztest may write bogus compression algorithm values which triggers assertions in zfs using this patch. so changed the code a little bit to handle such bogus values. these changes still can be see in the incremental patch set, but i was requested to merge them to a single patch (btw. it wasnt you who requested this explicit but another guy did. but you just helped me to handle it) outstanding bugs: future work may happen if i see potential for enhancements like newer zstd versions with even better performance or better compression and of course i will track the progress at freebsd side. testing. yes i did not add any test suites, but there was a reason for. all other algorithms have also just basic or no tests. but i can adapt the some of the original tests to use zstd explicit instead of lz4 . the perf test looks promising here. i can do this till tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update get_compstring() in dbufstat.py
@richardelling done. thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've given this a first look and included some feedback and questions.
095b810
to
cd01b13
Compare
since github had technical issues tonight, most of the tests are broken since the test scripts failed to download |
@behlendorf are there any further change requests? |
@BrainSlayer I'm not going to have time to give this a careful review until next week. But what would be very helpful is if you could include some performance/compression results comparing zstd, to at least lz4. |
@behlendorf i will do a realistic benchmark after i got my new testsetup installed. waiting for some hdds. but consider that allen already posted such a comparisation in his original bsd patch |
@behlendorf i was running several tests today and repeated them multiple times since the result was strange and too good to be true but now i'm sure its correct. zstd does indeed outperform lz4 in performance. i copied a single uncompressed tar file with 12 gb size from ram cache to a zvolume ssd drive which took 52 seconds for lz4, but just 46 seconds for zstd. this was always reproduceable (zlib took 1 minute and 48 seconds). i assume the lz4 compression in zfsonlinux has a disadvantage since i force to compile the zstd lib codes with -O3 unlike the standard lz4 implementation used by zfs which uses the kernel compile optimization so in any way. the result is really impressive. i also copied the content of the tar file which has about 750k files in it. same result. zstd was faster. so with current implementation zstd does outperform even lz4 in performance and compression ratio. it may be that zstd scales better on high performance cpus for some reason which could also be the cause of the result. since this tar file contains alot of c source files lz4 was also very effective with compressing. it has a ratio of 2.07. zstd reaches a ratio of 2.61 i also tested zstd with deduplication to verify your comment that there might be a problem with it. but there is not. deduplication works with zstd without any issue here the dump of one of these test runs zpool destroy lz4test real 0m45.093s zpool destroy lz4test real 0m53.159s zfs get all|grep compres ah any by the way. the result without compression real 1m3.586s and if i store incompressible files (gziped tar file) there is no measureable difference between lz4 and zstd. performance is identical |
@BrainSlayer by any chance did you measure the time for copying files out of the lz4 and zstd volumes? one of the big value adds for lz4 is that the decompression is so fast, just wondering how zstd measured up. I know the standalone version can use as many CPU cores as directed, which really speeds things up over lz4 |
@greg-hydrogen Splitting up over CPU cores happens automatically in ZFS, since it compresses each record (128kb by default) separately, and cores-1 of them concurrently. Here are my slides from the ZFS User Conference: https://docs.google.com/presentation/d/1yDhE2CaTfx6i1fqol_YbjHWcJa6xKhVUo-R3RrLVUI4/edit?usp=sharing Generally, zstd decompresses are around 800-900 MB/sec/core, whereas LZ4 is ~ 2500MB/sec/core |
For some of these results, it can just be the fact that higher compression ratios mean you write less data, so it takes less time. zstd tends to out-compress lz4 as low as level negative 3. It almost always comes at a higher cost, but if you have spare CPU cycles, that cost is likely acceptable. On the good side, even at high compression levels like +15, where zstd drops to a speed of 10MB/sec/core, decompression speed stays at nearly 1 GB/sec/core (on a 4ghz processor) zstd is always a better option than gzip. With the new negative levels, it can even compete with LZ4 for speed. |
@allanjude @behlendorf the reason why it outperforms lz4 is mainly the limited write speed on the physical layer in my case. even if its a pcie ssd i used it seems that this device is still limited to about 250 mb/s writing speed for some reason. but i configured 2 of these cards to a zfs stripe and even then zstd was still faster. it was a 3 ghz 4 core xeon E3-1220 by the way. here are the results for reading. in that configuration lz4 is slightly faster. but the difference is very thin. lz4: zstd: real 1m13.753s |
now done. used your fork and fixed the fork and squashed the changed i
had todo on top.
something was incomplete in it. maybe you missed some commits. (looks
like 2 for me)
Am 16.06.2019 um 14:53 schrieb Michael Niewöhner:
…
the current version here is state of the art. wat?! state of
the art? you must be kidding.....
yes. its in sync with master. so state of art per definition
No. "State of the art" is not "in sync with master" but "rebased onto
master" ;)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8044>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2WNE4IIGZVXWHKJQESG43P2YZVJANCNFSM4F567TBQ>.
|
anyway. its now fixed and at the same codelevel
now we need to find a solution for the vmem patch. i cannot take it out
of the PR unless its merged to master.
Am 16.06.2019 um 15:09 schrieb Michael Niewöhner:
…
Oops... seems I skipped some patch...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8044>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2WNE3K3MHS27UQRRD7KQTP2Y3P3ANCNFSM4F567TBQ>.
|
You need to rebase the vmem PR, too. So focus on that now until @allanjude updated his branches |
... and you should squash your commits here and find some appropriate description for one or a few commits |
all done up to the vmem patch. so we are at 13 commits now
Am 16.06.2019 um 15:17 schrieb Michael Niewöhner:
…
... and you should squash your commits here and find some appropriate
description for one or a few commits
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8044>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2WNE4L4QKIX6OJCFNGYLTP2Y4O3ANCNFSM4F567TBQ>.
|
done
Am 16.06.2019 um 15:14 schrieb Michael Niewöhner:
…
i cannot take it out of the PR unless its merged to master.
You need to rebase the vmem PR, too. So focus on that now until
@allanjude <https://github.com/allanjude> updated his branches
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8044>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2WNE4URDPYWNV2UGVMXQTP2Y4DLANCNFSM4F567TBQ>.
|
I just realized you have many commits where the commit message does not fit the content... Maybe it would be better to squash everything to exactly one commit... @behlendorf @allanjude what do you say? |
due the heavy upstream changes recently. i merged now all to a single new commit and hopefully i resolved all new conflicts in a correct new |
cmd/dbufstat/dbufstat
Outdated
@@ -343,7 +343,7 @@ def get_compstring(c): | |||
"ZIO_COMPRESS_GZIP_6", "ZIO_COMPRESS_GZIP_7", | |||
"ZIO_COMPRESS_GZIP_8", "ZIO_COMPRESS_GZIP_9", | |||
"ZIO_COMPRESS_ZLE", "ZIO_COMPRESS_LZ4", | |||
"ZIO_COMPRESS_FUNCTION"] | |||
"ZIO_COMPRESS_ZSTD", "ZIO_COMPRESS_FUNCTION"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't new stuff get added at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. ZIO_COMPRESS_FUNCTION is the end definition of the array and must stay at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops. my fault....
include/sys/zio.h
Outdated
|
||
#define ZIO_COMPRESS_DEFAULT ZIO_COMPRESS_OFF | ||
|
||
#define BOOTFS_COMPRESS_VALID(compress) \ | ||
((compress) == ZIO_COMPRESS_LZJB || \ | ||
(compress) == ZIO_COMPRESS_LZ4 || \ | ||
(compress) == ZIO_COMPRESS_ZSTD || \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, move new stuff to the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hat least before ON and OFF. this has been just merged from alan's code 1:1. will change it
include/sys/zio_compress.h
Outdated
@@ -51,15 +51,75 @@ enum zio_compress { | |||
ZIO_COMPRESS_GZIP_9, | |||
ZIO_COMPRESS_ZLE, | |||
ZIO_COMPRESS_LZ4, | |||
ZIO_COMPRESS_ZSTD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once again, move zstd to the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. ZIO_COMPRESS_FUNCTIONS is the terminator for the enum array. eveything after the last entry will be ignored. so after LZ4 is correct
include/sys/zio_impl.h
Outdated
* ZFS supports three different flavors of compression -- gzip, lzjb, and | ||
* zle. Compression occurs as part of the write pipeline and is performed | ||
* in the ZIO_STAGE_WRITE_BP_INIT stage. | ||
* ZFS supports five different flavors of compression -- gzip, lzjb, lz4, zle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe such fixes should go to a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? its just documentation. and here zstd was added. so its part of the patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it's a fix which is not related to zstd. let's see what the others say. maybe this is really over-particular ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??? it just adds the line for zstd compression support. which is of course part of the patch. by the way. this documentation update was also taken from alans original bsd code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ZFS supports five different flavors of compression -- gzip, lzjb, lz4, zle,
- and zstd. Compression occurs as part of the write pipeline and is
- performed in the ZIO_STAGE_WRITE_BP_INIT stage.
so mentioning the zstd compression support is part of it
lib/libzpool/Makefile.am
Outdated
@@ -12,11 +17,16 @@ AM_CFLAGS += $(NO_UNUSED_BUT_SET_VARIABLE) | |||
# Includes kernel code generate warnings for large stack frames | |||
AM_CFLAGS += $(FRAME_LARGER_THAN) | |||
|
|||
AM_CFLAGS += -DLIB_ZPOOL_BUILD | |||
AM_CFLAGS += -DLIB_ZPOOL_BUILD -Wframe-larger-than=32768 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will conflict with line 13/18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be 2048. this is a requirement for the zstd code. usually it was messed up to by your clean up procedure. in the early versions is had 32768, but later i changed it to 2048 which was a safe value for all architecures i tested. i can remove it, but i may raise up warnings depending on the archticture and compiler version. default seem to be 4096 now. according to the configure script. so usually its obsolete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, what I meant was that you will have frame-larger-than be set twice and thus overwritten regardless of $(FRAME_LARGER_THAN)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have seen it. i removed this line in latest code now
b18b51c
to
ecbfd14
Compare
now github fucked up everything and closed the PR. man that kills me |
Try |
You pushed without your commit... that's why gh closed this PR |
reflog shows a mess of changes. are you able to help here? otherwise i have to reopen the request with a new one |
my fork still has all changed correct. so i dont know to fix this here |
I mentioned |
This Patch adds zstd compression support zo ZFS
Note:
this is a rework of the original pull request to fullfill the requirement of only offering a single patch. unfortunatly it seems that i have to spend alot of work every day to maintain this pull requests since it gets in conflict with the upstream master quickly so it will evolve quickly and will change usually every day.
Signed-off-by: Sebastian Gottschall [email protected]
Motivation and Context
Description
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by
.