Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix zstdmt stability issues and clean up the zstdmt code #2339

Merged
merged 7 commits into from
Oct 28, 2020

Conversation

terrelln
Copy link
Contributor

@terrelln terrelln commented Oct 2, 2020

  • Remove singleBlockingThread mode because it is already handled in zstd_compress.c.
  • Remove the single-pass shortcut fixing Issue Zstd multithreaded output can depend on number of threads #2327.
  • Rip out all unused functions from the zstdmt API. Delete the zstreamtest --mt test. Rewrite unit tests to use the zstd API.
  • Rewrite the zstreamtest fuzzer loop to check for compression determinism. It had to be rewritten to always make maximal forward progress to ensure determinism in multithreading mode.

Bug fixes (the last 2 commits: see commit messages):

  • Zstdmt would create empty compression jobs when called with ZSTD_e_end and when it can't get an input buffer. It would also incorrectly set mtctx->frameEnded = 1, which could cause other issues, like an unterminated frame. The fix is to move the switch to ZSTD_e_flush outside of the if() block.
  • Zstdmt with rsyncable enabled could be non-deterministic because it would sometimes "skip" a synchronization point when the job table was full.

@terrelln
Copy link
Contributor Author

terrelln commented Oct 2, 2020

After ~400K iterations I saw some non-determinism. But it isn't reproducible. I have to figure out if it is a problem with the test or zstdmt.

@terrelln
Copy link
Contributor Author

terrelln commented Oct 6, 2020

I believe I've found all the sources of non-determinism. Running 3 x 400K tests now.

@terrelln
Copy link
Contributor Author

terrelln commented Oct 6, 2020

There is some non-determinism left. It is pretty rare. I would be okay merging the test as-is, though it may occasionally fail.

I'm running the tests with tracing enabled now. Hopefully it will eventually fail and I can get a trace of what is happening to narrow it down and fix it.

* This is a blocking function : it will only give back control to caller after finishing its compression job.
*/
static size_t
ZSTDMT_compress_advanced_internal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early in the life of zstdmt, one of the benefits of this function
is that it was able to compress directly from src into dst
provided there is enough dstCapacity (<= ZSTD_compressBound(srcSize)),
thus skipping the need to use intermediate buffers,
thus saving memory, and some unnecessary memcpy().

With this function removed,
is the replacement able to retain this property ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the replacement able to retain this property ?

No. We already weren't saving memory because when using the streaming API we have already allocated all of our buffers at this point. With this change, we are doing one extra memcpy for the source buffers.

if ((input->pos < input->size) && (endOp == ZSTD_e_end)) {
/* Can't end yet because the input is not fully consumed.
* We are in one of these cases:
* - Input buffer is NULL & empty
Copy link
Contributor

@Cyan4973 Cyan4973 Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note : if input buffer is NULL,
we should also have :
input->pos == 0 == input->size,
therefore, this scenario shouldn't pass the if() test (strict < size comparison).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should clarify the comment. I mean the mtctx->inBuff.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 9, 2020

This is a pretty good simplification.

I mostly have a question regarding the memory usage of the replacement function for one-shot compression.
Maybe a good case to employ stableSrc & stableDst directives,
though I'm not sure if they are actually available for zstdmt.

@terrelln
Copy link
Contributor Author

I mostly have a question regarding the memory usage of the replacement function for one-shot compression.
Maybe a good case to employ stableSrc & stableDst directives,
though I'm not sure if they are actually available for zstdmt.

The memory usage isn't impacted, since there isn't a way to avoid allocating all the buffers using the zstd API.

I think there is opportunity for stableSrc and stableDst directives, especially with ZSTD_compress2(). Both for zstdmt and single-thread zstd. With stableDst we could, for example, always take the single-threaded shortcut, even when the output is less than ZSTD_compressBound(), since we know the user will never provide more output size. With zstdmt, we could lazily allocate, and save memory when not needed, since we can't handle static allocation anyways.

I think that should be a follow up PR though, since this doesn't regress the current state of the world (except for 1 more memcpy() of the source).

@terrelln
Copy link
Contributor Author

I've opened issue #2353 to handle the memory saving optimizations.

This is already handled by zstd, so this logic is never used.
Simplifies the code and removes blocking from zstdmt.

At this point we could completely delete
`ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because
I think we want to do that in the zstd-1.5.0 release, in case anyone is
still using the ZSTDMT API, even though it is not installed by default.

Fixes facebook#2327.
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
* Run compression twice and check the compressed data is byte-identical.
  The compression loop had to be rewritten to ensure deteriminism. It is
  guaranteed by always making maximal forward progress.
* When nbWorkers > 0, change the number of workers 1/8 of the time.
* Run in single-pass mode 1/4 of the time.

I've run a few hundred thousand iterations of zstreamtest and have seen
no deteriminism issues so far. Before the zstdmt fix that skips the
single-pass shortcut non-determinism showed up in a few hundred
iterations.
When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.

The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
The problem occurs in this scenario:
1. We find a synchronization point.
2. We attmept to create the job.
3. We fail because the job table is full: `mtctx->nextJobID > mtctx->doneJobID + mtctx->jobIDMask`.
4. We call `ZSTDMT_compressStream_generic` again.
5. We forget that we're at a sync point already, and we continue looking
   for the next sync point.

This fix is to detect if we're currently paused at a sync point, and if
we are then don't load any more input.

Caught by zstreamtest. I modified it to make the bug occur more often
(~1/100K -> ~1/200) and verified that it is fixed after. I then ran a
few hundred thousand unmodified zstreamtest iterations to verify.
@terrelln terrelln merged commit 599ff58 into facebook:dev Oct 28, 2020
terrelln added a commit to terrelln/zstd that referenced this pull request Oct 30, 2020
facebook#2339 removes the single-pass zstdmt API.
This changes the compressed size, because we no longer take the # of threads into
account when deciding the job size.
@ghost ghost mentioned this pull request Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants