Encoding errors when using a dictionary using zstdmt #944
We would need a reproduction case to investigate.
I'd like to, but the files I tried it on are from my personal mailbox, so you can imagine that I do not want to share them. I could not reproduce it on maildirs from public mailing lists such as debian-devel. However, I did notice that it is always the largest files that are corrupted. Out of ~1500 files, 1000 of them are between 1 and 10 kilobytes in size, 400 of them are between 10 and 100 kilobytes in size, and the rest go up to 25 megabytes in size. The debian-devel mailing list doesn't have such a distribution; the largest file there is only 90 kilobytes.
I've been testing this setup again with an extended test set containing large files (> 10 MB). Correction: I was testing under I would recommend testing the In the meantime, I will also put a warning notice on
I've managed to create a testcase I feel comfortable sharing, which reproduces the issue 100% of the time on my machine. Since it's too large to attach to a GitHub issue, I've temporarily made it available here: http://tinc-vpn.org/temp/zstd-testcase.tar.xz Note, I get this issue on a CPU with 6 cores, 12 threads.
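The testcase archive itself wasn't preserved in this capture. As a rough sketch, a reproduction along these lines exercises the failing combination (dictionary plus multithreaded compression); the filenames here are hypothetical placeholders, not the actual testcase contents:

```shell
# Hypothetical reproduction sketch; "maildir" and "mail.dict" are placeholders.
zstd --train maildir/* -o mail.dict    # train a dictionary on the corpus
zstdmt -D mail.dict maildir/*          # compress with multiple threads + dictionary
for f in maildir/*.zst; do
    zstd -t "$f" || echo "corrupt: $f" # integrity-check each compressed file
done
```

Consistent with the reporter's observation, any corruption would be expected to show up on the largest files, where compression spans multiple worker chunks.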
Thanks for the reproduction case @gsliepen. In the meantime, if you urgently need a quick fix, you could select one of these options:
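The maintainer's list of options was lost in this capture. Plausible workarounds consistent with the diagnosis (the bug needs both multithreading and a dictionary to trigger) might look like the following; this is my guess, not the original list:

```shell
# Hypothetical workarounds until the fix lands — avoid combining
# multithreading with a dictionary:
zstd --single-thread -D dictionary file   # fully single-threaded mode
zstd -T1 -D dictionary file               # single worker thread
zstd -T0 file                             # keep multithreading, drop the dictionary
```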
Thanks for testing @mestia! We believe we figured out the issue, but hadn't posted here yet. Commit fc8d293 merely hides the issue when compressing a single file; it remains when compressing multiple files. When compressing with multiple threads and with a dictionary, we can confuse the window size and use a larger window size for the second chunk than for the first. Since the window size of the first chunk is what is written in the frame header, the second chunk can end up with offsets that are too large for the declared window. If you have corrupted data that you need to recover, you can interpret the frame header using the zstd format, change the window size to be larger, and then you should be able to decompress the data successfully. You can verify you got the header right with
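The recovery trick described above can be sketched in code. This is not an official zstd tool, just a minimal illustration of patching the `Window_Descriptor` byte of a frame header per the zstd format specification (RFC 8878): the window log is `10 + exponent`, where the exponent lives in the top 5 bits of the byte following the `Frame_Header_Descriptor` (when the single-segment flag is clear).

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # little-endian magic number opening every zstd frame


def patch_window_log(frame: bytes, new_window_log: int) -> bytes:
    """Return a copy of `frame` whose Window_Descriptor declares a window of
    2**new_window_log bytes (mantissa 0). Recovery sketch only, not an
    official zstd utility."""
    magic, = struct.unpack_from("<I", frame, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = frame[4]
    if descriptor & 0x20:  # Single_Segment_Flag set: no Window_Descriptor byte
        raise ValueError("single-segment frame has no Window_Descriptor")
    if not 10 <= new_window_log <= 41:
        raise ValueError("window log outside the representable range")
    old_window_log = 10 + (frame[5] >> 3)
    if new_window_log < old_window_log:
        raise ValueError("refusing to shrink the declared window")
    patched = bytearray(frame)
    patched[5] = (new_window_log - 10) << 3  # exponent in top 5 bits, mantissa 0
    return bytes(patched)
```

After patching, decompressing the file with `zstd -d` should succeed once the declared window is at least as large as the one the encoder actually used; `zstd -l` on the patched file is one way to double-check the header now reports the intended window size.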
We are considering creating a new release of
I can confirm that that fixed the issue. Thanks!
--
Met vriendelijke groet / with kind regards,
Guus Sliepen <[email protected]>
Dear zstd team,
There seems to be a problem with zstd decoding, as described in this bug:
#883816
I see the same problem on my system.
Thank you!