Replace `ZSTD_findDecompressedSize` with `ZSTD_getFrameContentSize` #62

nhz2 · 2024-09-03T23:46:42Z

Fixes #50

@mkitti Do you know how to remove all the ZSTDLIB_STATIC_API functions from LibZstd_clang.jl? IIUC Julia is not statically linking zstd, so these functions should not be used if we want to avoid future releases of zstd breaking things.

codecov · 2024-09-03T23:48:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 36.73%. Comparing base (31cf742) to head (02a18de).
Report is 1 commit behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #62      +/-   ##
==========================================
- Coverage   36.96%   36.73%   -0.23%     
==========================================
  Files           5        5              
  Lines         560      558       -2     
==========================================
- Hits          207      205       -2     
  Misses        353      353


mkitti · 2024-09-04T00:02:36Z

What they are guarding against with "static linking" is calling a Zstd library that is not tightly version controlled. However, in our case, we actually do have fairly tight control, since we also package the library via JLLs and then link to it via JIT compilation.

The main issue is when someone tries to override the shared libraries via the JLLs or someone tries to link to some old system or conda binaries.

mkitti · 2024-09-04T00:11:36Z

If you want to replace this function, implement the following loop in Julia:

https://github.com/facebook/zstd/blob/dev/lib%2Fdecompress%2Fzstd_decompress.c#L636-L679

This involves calling getFrameContentSize and then advancing by findFrameCompressedSize until there are no more frames.
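A minimal Julia sketch of that loop, under stated assumptions: the two zstd calls are passed in as functions so the traversal logic can be exercised without libzstd. In real code they would presumably be `LibZstd.ZSTD_getFrameContentSize` and a small wrapper around `LibZstd.ZSTD_findFrameCompressedSize` that checks `ZSTD_isError`. The name `total_decompressed_size` and the `nothing`-on-error convention are hypothetical. Unlike the C original, this sketch does not special-case skippable frames or enforce a minimum header length; it leaves those details to the two primitives.

```julia
# Sentinels from zstd.h: ZSTD_CONTENTSIZE_UNKNOWN is (0ULL - 1) and
# ZSTD_CONTENTSIZE_ERROR is (0ULL - 2).
const CONTENTSIZE_UNKNOWN = typemax(UInt64)
const CONTENTSIZE_ERROR = typemax(UInt64) - 1

# Sum the decompressed sizes of all frames in `buf`, mirroring the loop in
# ZSTD_findDecompressedSize. `content_size(p, n)` should behave like
# ZSTD_getFrameContentSize; `frame_length(p, n)` should behave like
# ZSTD_findFrameCompressedSize but return `nothing` on error (hypothetical
# convention standing in for a ZSTD_isError check).
function total_decompressed_size(content_size, frame_length, buf::Vector{UInt8})
    total = UInt64(0)
    offset = 0
    while offset < length(buf)
        remaining = length(buf) - offset
        # Keep `buf` rooted while a raw pointer into it is in use.
        fcs, len = GC.@preserve buf begin
            p = pointer(buf, offset + 1)
            (UInt64(content_size(p, remaining)), frame_length(p, remaining))
        end
        # UNKNOWN and ERROR are the two largest UInt64 values; pass either through.
        fcs >= CONTENTSIZE_ERROR && return fcs
        # Reject a running total that would overflow UInt64.
        total + fcs < total && return CONTENTSIZE_ERROR
        total += fcs
        # Advance past this frame to the next frame header.
        (len === nothing || len <= 0 || len > remaining) && return CONTENTSIZE_ERROR
        offset += len
    end
    return total
end
```

Passing the primitives in keeps the frame-walking logic separate from the ccall wrappers, which makes it easy to check the multi-frame case with fake frames.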

mkitti

This is not a correct implementation of the former functionality. You are not accounting for the possibility of multiple frames.

mkitti · 2024-09-04T00:06:30Z

src/decompression.jl

-    ret = find_decompressed_size(input.ptr, input.size)
+    ret = LibZstd.ZSTD_getFrameContentSize(input.ptr, input.size)


This is not a proper replacement. getFrameContentSize only reads the size of a single frame.

Note 5: ZSTD_findDecompressedSize handles multiple frames, and so it must traverse the input to read each contained frame header. This is fast as most of the data is skipped, however it does mean that all frame data must be present and valid.

nhz2 · 2024-09-04T00:22:56Z

I don't think that is needed for the expectedsize function.

expectedsize is just an estimate of the decoded size, used for the initial allocation of output space. More output space is allocated automatically when subsequent frames are read.

Are there any realistic benchmarks where this PR slows things down?

mkitti · 2024-09-04T12:47:36Z

I removed the static API in 813632f for #63.

We should check the registered dependents to see if any of them are using those functions somehow. We may want to expose them in a separate package if someone needs them.

#63 also implements an equivalent to ZSTD_findDecompressedSize in Julia using the public API.

I'm not sure about slowness, but I do not see the need to do multiple allocations if we can get a better estimate of the decompressed size upfront by looping through frames.
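To make the allocation trade-off concrete, here is a hypothetical sketch (not the TranscodingStreams.jl or CodecZstd.jl implementation) of filling an output buffer that starts at a size hint and grows when it runs out. The `frames` argument stands in for already-decoded frame payloads, and `decode_with_hint` is an illustrative name. A hint equal to only the first frame's size forces a resize, while a hint equal to the total from a full frame traversal needs none.

```julia
# Hypothetical illustration: copy decoded frame payloads into an output buffer
# that starts at `hint` bytes and grows when it runs out of space.
# Returns the filled buffer and the number of times it had to be resized.
function decode_with_hint(frames::Vector{Vector{UInt8}}, hint::Integer)
    out = Vector{UInt8}(undef, hint)
    used = 0
    resizes = 0
    for f in frames
        need = used + length(f)
        if need > length(out)
            # Grow to the larger of double the current capacity and what is needed.
            resize!(out, max(need, 2 * length(out)))
            resizes += 1
        end
        copyto!(out, used + 1, f, 1, length(f))
        used = need
    end
    resize!(out, used)  # trim any unused capacity
    return out, resizes
end
```

For example, with two decoded frames of 100 and 200 bytes, a hint of 100 triggers one resize and a hint of 300 triggers none.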

nhz2 · 2024-09-06T00:28:11Z

I can't find any other package using find_decompressed_size on JuliaHub.

I also did some benchmarks of this PR and #63, and I didn't find any performance differences.

nhz2 · 2024-09-06T00:30:28Z

Since #63 is just as fast but avoids some potentially large allocations, it seems like the better option.

replace ZSTD_findDecompressedSize with ZSTD_getFrameContentSize

02a18de

nhz2 requested a review from mkitti September 3, 2024 23:53

mkitti requested changes Sep 4, 2024


mkitti mentioned this pull request Sep 4, 2024

Reimplement find_decompressed_size without using static-only API #63

Merged

nhz2 closed this Sep 7, 2024

nhz2 deleted the nz/avoid-experimental-zstd-API branch September 7, 2024 23:44
