Use Zstd_jll to provide zstd, update to 1.4.4 #20

Merged
merged 2 commits into from
Mar 16, 2020

@ararslan ararslan commented Jan 31, 2020

This rips out the BinaryProvider machinery in favor of the JLL produced automatically by Yggdrasil. See this blog post for more information about JLLs. In doing this, it updates the version of zstd used to 1.4.4, as that's the latest version provided by Yggdrasil.
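As a rough illustration of what using a JLL looks like from the package side, here is a minimal sketch, not the code in this PR. It assumes `Zstd_jll` is installed and relies on the `libzstd` library handle that the JLL exports; the helper name `zstd_version` is hypothetical. `ZSTD_versionNumber` is the C function from zstd's public header that encodes the version as `MAJOR*10000 + MINOR*100 + PATCH`.

```julia
using Zstd_jll  # exports `libzstd`, the handle to the shared library shipped by the JLL

# Hypothetical helper: ask the loaded library which zstd version it is.
function zstd_version()
    n = ccall((:ZSTD_versionNumber, libzstd), Cuint, ())
    # Decode MAJOR*10000 + MINOR*100 + PATCH.
    major, rest = divrem(n, 10_000)
    minor, patch = divrem(rest, 100)
    return VersionNumber(major, minor, patch)
end

println("libzstd version: ", zstd_version())
```

A check like this can help confirm which zstd binary a given environment actually loads before comparing timings across package versions.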

It seems that #19 and perhaps #17 as well were caused by an upstream performance regression. I haven't identified the exact upstream commit, but there was at least one known regression in zstd 1.4.2 that has since been fixed.

For a 3.4 MB zstd-compressed file, these are the timings I get. Setup code:

julia> using CodecZstd, TranscodingStreams

julia> bytes = read("file.zst");

julia> @time transcode(ZstdDecompressor, bytes);

Output on CodecZstd v0.6.0:

  0.217029 seconds (594.58 k allocations: 34.930 MiB, 2.86% gc time)

CodecZstd v0.6.1:

  0.291829 seconds (594.58 k allocations: 34.930 MiB, 2.06% gc time)

This PR:

  0.218454 seconds (594.54 k allocations: 34.918 MiB)
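For anyone who wants to try a comparable measurement without the original file, the following is a self-contained sketch under stated assumptions: the payload is synthetic (a hypothetical repeated text line), so the absolute numbers will not match the ones above. It also makes one untimed warm-up call, because `@time` on the first call of a method includes compilation time and allocations.

```julia
using CodecZstd, TranscodingStreams

# Synthetic, highly compressible payload (an assumption; not the file used above).
data = repeat(Vector{UInt8}("hypothetical sample payload\n"), 100_000)
compressed = transcode(ZstdCompressor, data)

# Warm-up call so the timed run below excludes JIT compilation.
transcode(ZstdDecompressor, compressed)

# Timed decompression, followed by a round-trip sanity check.
@time decompressed = transcode(ZstdDecompressor, compressed)
@assert decompressed == data
```

Running the same script in environments pinned to different CodecZstd versions gives a like-for-like comparison on one machine.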

Fixes #17, fixes #19 (would be good to have @jrevels and @Ankur-deDev confirm)

@ararslan
Member Author

Oh right, I forgot that JLLs only support Julia 1.3 and later... @bicycle1885, what are your thoughts on setting the minimum Julia version here to 1.3?

This is required for using JLLs.
@ararslan
Member Author

I pushed a separate commit that bumps the package version and the minimum Julia version, since that's effectively required for this PR.

@jrevels

jrevels commented Jan 31, 2020

Can confirm this substantially helps for #19! On this branch, I get:

julia> @time transcode(ZstdDecompressor, bytes);
  3.925222 seconds (13 allocations: 1.830 GiB, 0.03% gc time)

But still seems to be slower than v0.6.0 by a tad (not horrible, but non-negligible):

julia> @time transcode(ZstdDecompressor, bytes);
  3.496933 seconds (13 allocations: 1.830 GiB, 0.60% gc time)

I guess this remaining chunk is probably another upstream perf issue?

@ararslan
Member Author

> I guess this remaining chunk is probably another upstream perf issue?

Absent testing directly with the C library, that would be my guess, since there haven't been any changes to the code here.

@Ankur-deDev

Ankur-deDev commented Feb 3, 2020

Thanks for taking the time to fix this! My numbers are similar to ararslan's.

CodecZstd v0.6.0:
2.716708 seconds (17.69 M allocations: 5.577 GiB, 35.01% gc time)

CodecZstd v0.6.1:
3.957340 seconds (17.69 M allocations: 5.577 GiB, 25.83% gc time)

PR#20:
2.755108 seconds (17.69 M allocations: 5.577 GiB, 35.80% gc time)

@bicycle1885
Member

Thank you very much for doing it.

> Oh right, I forgot that JLLs only support Julia 1.3 and later... @bicycle1885, what are your thoughts on setting the minimum Julia version here to 1.3?

I'm perfectly fine with it. People should always use the latest release of Julia 😄

I quickly benchmarked the performance using 80 MB compressed text. On my laptop (macOS), this pull request was significantly faster than v0.6.0 and v0.6.1. On a Linux machine, however, it was somewhat slower than both. Strangely, I couldn't reproduce the reported performance degradation in v0.6.1 (I ran Pkg.resolve and Pkg.build before each benchmark, so I believe I used the correct target binaries). I'll spend more time diagnosing this.

macOS

v0.6.0:

  minimum time:     797.550 ms (0.03% GC)
  median time:      817.193 ms (3.74% GC)
  mean time:        822.142 ms (4.90% GC)
  maximum time:     866.895 ms (9.58% GC)

v0.6.1 (master):

  minimum time:     757.091 ms (0.03% GC)
  median time:      807.879 ms (3.43% GC)
  mean time:        811.865 ms (4.83% GC)
  maximum time:     870.128 ms (9.86% GC)

PR#20:

  minimum time:     692.454 ms (3.96% GC)
  median time:      717.373 ms (3.86% GC)
  mean time:        723.640 ms (5.46% GC)
  maximum time:     774.395 ms (10.65% GC)

Linux (x86-64)

v0.6.0:

  minimum time:     494.814 ms (0.36% GC)
  median time:      501.083 ms (0.36% GC)
  mean time:        516.164 ms (3.79% GC)
  maximum time:     578.022 ms (14.15% GC)

v0.6.1 (master):

  minimum time:     499.878 ms (0.07% GC)
  median time:      502.071 ms (0.37% GC)
  mean time:        517.356 ms (3.38% GC)
  maximum time:     581.141 ms (13.92% GC)

PR#20:

  minimum time:     554.353 ms (0.05% GC)
  median time:      558.730 ms (0.32% GC)
  mean time:        571.599 ms (2.74% GC)
  maximum time:     630.729 ms (11.83% GC)
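The minimum/median/mean/maximum summaries above have the shape of BenchmarkTools output. A minimal sketch of how such numbers could be collected follows; it assumes BenchmarkTools is installed and uses a synthetic payload (hypothetical, not the 80 MB text used above), so it illustrates the method rather than reproducing the figures.

```julia
using BenchmarkTools, CodecZstd, TranscodingStreams
using Statistics  # for `median`

# Synthetic input (an assumption); substitute real compressed data for a meaningful comparison.
data = repeat(Vector{UInt8}("hypothetical benchmark text\n"), 200_000)
compressed = transcode(ZstdCompressor, data)

# `$compressed` interpolates the global so the benchmark does not measure
# untyped global-variable access.
trial = @benchmark transcode(ZstdDecompressor, $compressed)

# `time` on an estimate returns nanoseconds; convert to milliseconds.
println("minimum time: ", time(minimum(trial)) / 1e6, " ms")
println("median time:  ", time(median(trial)) / 1e6, " ms")
```

Because `@benchmark` takes many samples, it is less sensitive to one-off noise than a single `@time` call, which makes small cross-version differences easier to judge.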

@ararslan
Member Author

Interesting. The timings I reported in the PR body, where the performance ordering was 0.6.1 < this PR < 0.6.0, were from x64 Linux as well.

@jrevels

jrevels commented Mar 16, 2020

Would love it if this were merged/tagged soon :) Seems like a straight win to at least upgrade to the new Artifact system for the underlying zstd binary (even if zstd itself seems to have some perf fluctuations).

@ararslan ararslan requested a review from bicycle1885 March 16, 2020 16:17
@ararslan
Member Author

As Jarrett said, I think this PR provides a strict improvement in terms of quality of life for recent Julia versions. Given that there seem to be performance fluctuations in zstd itself and this PR is otherwise a non-functional change, I think we should just go ahead and merge it.

@ararslan ararslan requested a review from jrevels March 16, 2020 20:34
@ararslan ararslan merged commit 39470d1 into JuliaIO:master Mar 16, 2020
@ararslan ararslan deleted the aa/jll branch March 16, 2020 23:14
ararslan added a commit to beacon-biosignals/Onda.jl that referenced this pull request Mar 16, 2020
CodecZstd 0.7.0 requires Julia 1.3 and accesses zstd via a JLL. See
JuliaIO/CodecZstd.jl#20 for more information.

In the interest of releasing a version of Onda that supports the new
version of CodecZstd, this also bumps the patch version.
jrevels pushed a commit to beacon-biosignals/Onda.jl that referenced this pull request Mar 17, 2020
Successfully merging this pull request may close these issues.

slower than expected decompression, round 2
Performance CodecZstd vs CodecZlib