Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding errors when using a dictionary using zstdmt #944

Closed
mestia opened this issue Dec 8, 2017 · 10 comments
Closed

Encoding errors when using a dictionary using zstdmt #944

mestia opened this issue Dec 8, 2017 · 10 comments
Labels

Comments

@mestia
Copy link

mestia commented Dec 8, 2017

Dear zstd team,
There seem to be a problem with zstd decoding as described in this bug:
#883816
I see the same problem on my system.

When using a pre-made dictionary, zstdmt will generate corrupted files:

[guus@haplo]/dev/shm/corpus>zstd --rm -D ../dictionary *
[guus@haplo]/dev/shm/corpus>zstd --rm -D ../dictionary -d *
[guus@haplo]/dev/shm/corpus>zstdmt --rm -D ../dictionary *
[guus@haplo]/dev/shm/corpus>zstdmt --rm -D ../dictionary -d *
ar,S=18008914:2,.zst : Decoding error (36) : Corrupted block detected
ar,S=6386609:2,S.zst : Decoding error (36) : Corrupted block detected
ar,S=6382007:2,S.zst : Decoding error (36) : Corrupted block detected
[...]
[guus@haplo]/dev/shm/corpus>zstd --rm -D ../dictionary -d *.zst
ar,S=18008914:2,.zst : Decoding error (36) : Corrupted block detected
ar,S=6386609:2,S.zst : Decoding error (36) : Corrupted block detected
ar,S=6382007:2,S.zst : Decoding error (36) : Corrupted block detected
[...]

Not all files are corrupt, only 1% has problems.

thank you!

@Cyan4973
Copy link
Contributor

Cyan4973 commented Dec 8, 2017

We would need a reproduction case to investigate.
I tried this test setup using multiple test files, but none of them failed so far.
It likely requires specific files to show up.

@gsliepen
Copy link

gsliepen commented Dec 9, 2017

I'd like to, but the files I tried it on is my personal mailbox, so you can imagine that I do not want to share that. I could not reproduce it on maildirs from public mailing lists such as debian-devel. However, I did notice that it is always the largest files that are corrupted. Out of ~1500 files, 1000 of them are between 1 and 10 kilobyte in size, then 400 of them are between 10 and 100 kilobyte in size, and then there's the rest which goes up to 25 megabytes in size. The debian-devel mailing list doesn't have such a distribution, the largest file there is only 90 kilobytes.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Dec 11, 2017

I've been testing again this setup with an extended test set containing large files (> 10 MB),
but none of them failed so far.
I'm afraid I'll need a reproduction case to be able to investigate.

Correction : I was testing under dev branch.
Switching to v1.3.2, like the debian test, I can observe the bug, specifically when using zstdmt compression with dictionary on large files.
It seems the bug is already fixed in dev branch, as I cannot reproduce it using same files and dictionary. It's fairly hard to determine which change exactly helps, but I would typically look in this direction.

I would recommend testing the dev branch on the test set where you observed the issue.

In the meantime, I will also put a warning notice on v1.3.2.

@gsliepen
Copy link

I've managed to create a testcase I feel comfortable in sharing, that reproduces the issue 100% of the time on my machine. SInce it's too large to attach to a GitHub issue, I've temporarily made it available here:

http://tinc-vpn.org/temp/zstd-testcase.tar.xz

Note, I get this issue on a CPU with 6 cores, 12 threads.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Dec 11, 2017

Thanks for the reproduction case @gsliepen.
I could confirm the diagnosis : v1.3.2 fails on a few files (about a dozen, all > 4 MB),
while dev branch version works just fine for all files.
Therefore, a spurious fix seems already integrated, and will be part of next release.

In the meantime, if you urgently need a quick fix, you could select one of these options :

  • if using cli : compress files one by one, since the issue only shows up when multiple files are compressed in the same command.
  • switch to the dev branch, since the bug is no longer triggered there
  • disable multi-threading, since it's only useful for large files (> several MB)
  • disable dictionary, since it's only effective for small files (< several KB), while its impact is negligible for large files
  • route messages to use either dictionary (when they are small) or multi-threading (when they are large), but not both at the same time.

@terrelln
Copy link
Contributor

git bisect with this script says that it is fixed by commit fc8d293 from PR #891.

@Cyan4973 Cyan4973 added the bug label Dec 11, 2017
@mestia
Copy link
Author

mestia commented Dec 12, 2017

Thank you for looking on it and for the test case. As far as I see the fix from commit fc8d293 doesn't solve the issue. I still get "Corrupted block detected" messages with the test case provided by @gsliepen.

@terrelln
Copy link
Contributor

Thanks for testing @mestia! We believe we figured out the issue, but hadn't posted here yet. Commit fc8d293 merely hides the issue when compressing a single file, but it remains when compressing multiple files.

When compressing with multiple threads and with a dictionary, we can confuse the window size, and use a larger window size for the second chunk than the first. Since the window size of the first chunk is what is written in the frame, the second chunk can have too-large offsets.

The dev branch has hidden this issue, but the root cause isn't solved yet. @Cyan4973 is looking into it.

If you have corrupted data that you need to recover you can interpret the frame header using the zstd format, change the window size to be larger, then you should be able to decompress the data successfully. You can verify you got the header right with zstd -lv file.zst. The compressed frame contains a checksum by default so you can be confident it is correct (you can check with zstd -l file.zst).

@Cyan4973
Copy link
Contributor

Cyan4973 commented Dec 12, 2017

We are considering creating a new release of zstd later in the week, once the deeper root cause is fully investigated and fixed. This bug requires several conditions to be assembled in order to show up, and indeed, the current dev branch masks the issue by making it impossible to generate all these conditions together from the cli. But it doesn't mean the root cause is properly tackled.
More to come after investigation.

Cyan4973 added a commit that referenced this issue Dec 12, 2017
Cyan4973 added a commit that referenced this issue Dec 12, 2017
Cyan4973 added a commit that referenced this issue Dec 13, 2017
this test fails on v1.3.2
@Cyan4973 Cyan4973 mentioned this issue Dec 13, 2017
Cyan4973 added a commit that referenced this issue Dec 14, 2017
@gsliepen
Copy link

gsliepen commented Dec 14, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants