Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PkgServer] use Tar.jl to extract tarballs #34

Closed
wants to merge 2 commits into from
Closed

[PkgServer] use Tar.jl to extract tarballs #34

wants to merge 2 commits into from

Conversation

johnnychen94
Copy link
Contributor

cont. #29

src/resource.jl Outdated
@@ -21,6 +21,9 @@ const resource_re = Regex("""
""", "x")
const hash_part_re = Regex("/($hash_re)\$")

compress(io::IO) = TranscodingStream(GzipCompressor(level=9), io)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused.

src/resource.jl Outdated
@@ -21,6 +21,9 @@ const resource_re = Regex("""
""", "x")
const hash_part_re = Regex("/($hash_re)\$")

compress(io::IO) = TranscodingStream(GzipCompressor(level=9), io)
decompress(io::IO) = TranscodingStream(GzipDecompressor(), io)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to involve TranscodingStreams. GzipDecompressorStream can be used directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of how that would be done? I've only used it via TranscodingStreams so far...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/GunnarFarneback/LocalPackageServer.jl/blob/734490f1db4e0e1597fc262566c897931781b1c0/src/resource.jl#L115.

Without checking I'm fairly certain it's exactly the same under the hood, you can just avoid a direct dependency on TranscodingStreams.

@johnnychen94
Copy link
Contributor Author

Similar to https://github.com/JuliaPackaging/PkgServer.jl/pull/29/files#r417042956, there seems also a significant performance issue here.

function extract_tar_julia(tarball, outdir)
    open(tarball) do io
        Tar.extract(decompress(io), outdir)
    end
end

function extract_tar(tarball, outdir)
    mkpath(outdir)
    run(`tar -C $outdir -zxf $tarball`)
end

# tarball is a file of size 3M
@time extract_tar_julia(tarball, "tmp") # 0.06s
@time extract_tar(tarball, "tmp") # 0.03s

If the performance regression is accepted during building a storage server, that during serving contents via pkg server would typically harm user experiences. Perhaps this PR is not wanted at present.

@StefanKarpinski
Copy link
Collaborator

This only happens on the first hit for a given tarball, so I'm not too worried about speed—the first person to get this is going to have a slow experience anyway. All the ones after that will be fast. We can, in principle, verify tarballs without unpacking them (they do need to be uncompressed), but that would need to be implemented.

@GunnarFarneback
Copy link
Contributor

but that would need to be implemented.

JuliaIO/Tar.jl#35

@staticfloat
Copy link
Member

As a part of 3aa8ea3 I actually ended up using the within-tarball tree hash calculation that Stefan implemented in Tar.jl, so that tarballs never need to be extracted on-disk. We also start calculating the tree hash while the tarball is still being downloaded, so this gives a small but important latency decrease.

Thanks for the work here @johnnychen94!

@johnnychen94 johnnychen94 deleted the jc/tar branch June 12, 2020 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants