Replace ZSTD_findDecompressedSize with ZSTD_getFrameContentSize #62
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
##            master     #62    +/-   ##
- Coverage  36.96%  36.73%  -0.23%
  Files          5       5
  Lines        560     558      -2
- Hits         207     205      -2
  Misses       353     353
What they mean by "static linking" is the case where the Zstd library being called is not tightly version controlled. In our case, however, we do have fairly tight control, since we package the library via JLLs and then link via JIT compilation. The main issue arises when someone overrides the shared libraries provided by the JLLs, or tries to link to old system or conda binaries.
If you want to replace this function, implement the following loop in Julia: https://github.com/facebook/zstd/blob/dev/lib%2Fdecompress%2Fzstd_decompress.c#L636-L679 This involves calling
This is not a correct implementation of the former functionality. You are not accounting for the possibility of multiple frames.
- ret = find_decompressed_size(input.ptr, input.size)
+ ret = LibZstd.ZSTD_getFrameContentSize(input.ptr, input.size)
This is not a proper replacement. getFrameContentSize only reads the size of a single frame.
Note 5: ZSTD_findDecompressedSize handles multiple frames, and so it must traverse the input to read each contained frame header. This is fast as most of the data is skipped, however it does mean that all frame data must be present and valid.
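To make the multi-frame point in Note 5 concrete, here is a minimal standalone sketch in Python of what "traversing the input" involves. It is not the zstd implementation and not the code from this PR or #63. It parses frame and block headers by hand, following the Zstandard frame format (RFC 8878), and the names `frame_info` and `total_decompressed_size` are hypothetical. The payload bytes are never decoded, only skipped, and any frame without a stored content size makes the total unknown.

```python
ZSTD_MAGIC = 0xFD2FB528       # magic number of a regular Zstandard frame
SKIPPABLE_MAGIC = 0x184D2A50  # skippable frames use 0x184D2A50..0x184D2A5F


def _read(buf, pos, n):
    """Read an n-byte little-endian integer, failing on truncated input."""
    if pos + n > len(buf):
        raise ValueError("truncated input")
    return int.from_bytes(buf[pos:pos + n], "little")


def frame_info(buf, pos):
    """Return (frame_length, content_size) for the frame starting at pos.

    content_size is None when the frame header does not store it.
    (Hypothetical helper, for illustration only.)
    """
    magic = _read(buf, pos, 4)
    if magic & 0xFFFFFFF0 == SKIPPABLE_MAGIC:
        # Skippable frame: 4-byte size, then opaque user data; decodes to nothing.
        return 8 + _read(buf, pos + 4, 4), 0
    if magic != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    desc = _read(buf, pos + 4, 1)             # Frame_Header_Descriptor
    fcs_flag = desc >> 6
    single_segment = (desc >> 5) & 1
    has_checksum = (desc >> 2) & 1
    dict_id_size = (0, 1, 2, 4)[desc & 3]
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]

    # Skip the window descriptor (absent in single-segment mode) and dictionary ID.
    p = pos + 5 + (0 if single_segment else 1) + dict_id_size
    content_size = None
    if fcs_size:
        content_size = _read(buf, p, fcs_size)
        if fcs_size == 2:
            content_size += 256               # 2-byte field is stored with an offset
    p += fcs_size

    # Walk the 3-byte block headers, skipping block payloads without decoding them.
    while True:
        header = _read(buf, p, 3)
        last, block_type, block_size = header & 1, (header >> 1) & 3, header >> 3
        if block_type == 3:
            raise ValueError("reserved block type")
        # An RLE block (type 1) stores one byte regardless of block_size.
        p += 3 + (1 if block_type == 1 else block_size)
        if p > len(buf):
            raise ValueError("truncated input")
        if last:
            break
    if has_checksum:
        p += 4
        if p > len(buf):
            raise ValueError("truncated input")
    return p - pos, content_size


def total_decompressed_size(buf):
    """Sum the content sizes of all concatenated frames; None if any is unknown."""
    total, pos = 0, 0
    while pos < len(buf):
        length, size = frame_info(buf, pos)
        if size is None:
            return None
        total += size
        pos += length
    return total
```

Reading only the first frame header corresponds to `frame_info(buf, 0)[1]`, which under-reports the total whenever `buf` holds more than one frame. The loop also shows why Note 5 requires all frame data to be present: the start of each frame can only be located by walking the block headers of the frame before it.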
I don't think that is needed for the
Are there any realistic benchmarks where this PR slows things down?
I removed the static API in 813632f for #63. We should check the registered dependencies to see if any of them are using those functions somehow. We may want to expose them in a separate package if someone needs them. #63 also implements an equivalent to
I'm not sure about slowness, but I do not see the need to do multiple allocations if we can get a better estimate of the decompressed size upfront by looping through frames.
I can't find any other package using
I also did some benchmarks of this PR and #63, and I didn't find any performance differences.
Since #63 is just as fast but avoids some potentially large allocations, it seems like the better option.
Fixes #50
@mkitti Do you know how to remove all the ZSTDLIB_STATIC_API functions from LibZstd_clang.jl? IIUC Julia is not statically linking zstd, so these functions should not be used if we want to avoid future releases of zstd breaking things.