Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ZSTD_findDecompressedSize with ZSTD_getFrameContentSize #62

Closed
wants to merge 1 commit into from

Conversation

nhz2
Copy link
Member

@nhz2 nhz2 commented Sep 3, 2024

Fixes #50

@mkitti Do you know how to remove all the ZSTDLIB_STATIC_API functions from LibZstd_clang.jl? IIUC Julia is not statically linking zstd, so these functions should not be used if we want to avoid future releases of zstd breaking things.

Copy link

codecov bot commented Sep 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 36.73%. Comparing base (31cf742) to head (02a18de).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #62      +/-   ##
==========================================
- Coverage   36.96%   36.73%   -0.23%     
==========================================
  Files           5        5              
  Lines         560      558       -2     
==========================================
- Hits          207      205       -2     
  Misses        353      353              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nhz2 nhz2 requested a review from mkitti September 3, 2024 23:53
@mkitti
Copy link
Member

mkitti commented Sep 4, 2024

What they mean by "static linking" is when the the Zstd library being called is not being tightly version controlled. However, in our case, we actually do have fairly tight control since we are also packaging the library via JLLs and then linking via JIT compilation.

The main issue is when someone tries to override the shared libraries via the JLLs or someone tries to link to some old system or conda binaries.

@mkitti
Copy link
Member

mkitti commented Sep 4, 2024

If you want to replace this function, implement the following loop in Julia:

https://github.com/facebook/zstd/blob/dev/lib%2Fdecompress%2Fzstd_decompress.c#L636-L679

This involves calling getFrameContentSize and then advancing by findCompressedFrameSize until there are no more frames.

Copy link
Member

@mkitti mkitti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a correct implementation of the former functionality. You are not accounting for the possibility of multiple frames.

ret = find_decompressed_size(input.ptr, input.size)
ret = LibZstd.ZSTD_getFrameContentSize(input.ptr, input.size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a proper replacement. getFrameContentSize only reads the size of a single frame.

Note 5: ZSTD_findDecompressedSize handles multiple frames, and so it must traverse the input to read each contained frame header. This is fast as most of the data is skipped, however it does mean that all frame data must be present and valid.

@nhz2
Copy link
Member Author

nhz2 commented Sep 4, 2024

I don't think that is needed for the expectedsize function.

expectedsize is just an estimate of the encoded size for initially allocating output space. More output space will get allocated automatically when the next frames get read.

Are there any realistic benchmarks where this PR slows things down?

@mkitti
Copy link
Member

mkitti commented Sep 4, 2024

I removed the static API in 813632f for #63.

We should check the registered dependencies to see if any of them are using those functions somehow. We may want to expose them in a separate package in someone needs them.

#63 also implements an equivalent to ZSTD_findDecompressedSize in Julia using the public API.

I'm not sure about slowness, but I do not see the need to do multiple allocations if we can get a better estimate of the decompressed size upfront by looping through frames.

@nhz2
Copy link
Member Author

nhz2 commented Sep 6, 2024

I can't find any other package using find_decompressed_size on JuliaHub.

I also did some benchmarks of this PR and #63, and I didn't find any performance differences.

@nhz2
Copy link
Member Author

nhz2 commented Sep 6, 2024

Since #63 is just as fast but avoids some potentially large allocations it seems like a better option.

@nhz2 nhz2 closed this Sep 7, 2024
@nhz2 nhz2 deleted the nz/avoid-experimental-zstd-API branch September 7, 2024 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ZSTD_findDecompressedSize is ZSTDLIB_STATIC_API
2 participants