archive: use pigz|zstd if available #1964
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: giuseppe. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
LGTM
Taking the reported speedup on faith, sure, why not.
Tests are unhappy, I didn’t check the details of the failure.
pkg/archive/filter.go (outdated)
go func() {
	defer w.Close()
	f.runErr = cmd.Run()
	pool.Put(input)
I don’t think this function should be in the business of managing pool; let the caller worry about that, probably using NewReadCloserWrapper like on all the other paths.
(Uh… is the existing code in the caller safe? Shouldn’t it trigger on close of the input to the decompression, rather than the output? It’s not obvious to me that they always happen in the expected order. If it is not safe, that’s also a reason for the caller to worry about that… and at least a separate commit.)
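For context, a minimal sketch of the caller-side pattern being suggested, assuming the NewReadCloserWrapper helper from pkg/ioutils mentioned above; the pool, buffer size, and function names here are illustrative, not the actual c/storage code:

```go
// Illustrative sketch: the caller, not the filter goroutine, returns the
// pooled buffer, by wrapping the decompressed stream so the buffer is only
// released once the consumer closes the stream.
package archive

import (
	"bufio"
	"io"
	"sync"

	"github.com/containers/storage/pkg/ioutils"
)

// Hypothetical pool of 32 KiB buffered readers.
var bufioReaderPool = &sync.Pool{
	New: func() interface{} { return bufio.NewReaderSize(nil, 32*1024) },
}

func wrapWithPool(decompressed io.ReadCloser, buf *bufio.Reader) io.ReadCloser {
	return ioutils.NewReadCloserWrapper(decompressed, func() error {
		err := decompressed.Close()
		// Hand the buffer back only after nothing can read through it anymore.
		buf.Reset(nil)
		bufioReaderPool.Put(buf)
		return err
	})
}
```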
pkg/archive/filter.go (outdated)
f := &filterReader{reader: r}
go func() {
	defer w.Close()
	f.runErr = cmd.Run()
- This is a data race vs. the consumer goroutine.
- I don’t immediately see that this filterReader is necessary: isn’t it enough to do

var err = errors.New("internal error: should never be seen")
defer func() { w.CloseWithError(err) }() // CloseWithError(nil) == Close()
err = cmd.Run()
Under what conditions will err ever be errors.New("internal error: should never be seen")?
Panic before Run returns.
… and in this case that should be an uncaught panic and terminate the whole process, so it doesn’t matter very much what err is (I guess? I’m not sure whether processing a panic is concurrent with the pipe consumer goroutine running). It might actually matter if there were a recover somewhere on the call stack.
For me, this is now just a pattern: try very hard to ensure that a pipe is always closed, so that the consumer will never be left hanging without a visible reason.
… and it’s a belt-and-suspenders thing: this makes sure to avoid a hypothetical CloseWithError(nil) on the panic path. Whether or not panic is relevant, the point is that the claim “the pipe is successfully closed only on success” can be easily verified.
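To make the pattern concrete, here is a compact, hypothetical sketch of a filter goroutine written this way, assuming an io.Pipe feeding the consumer; this is not the code in the PR. Note that the defer needs a closure so the final value of err, rather than the placeholder, is what reaches CloseWithError:

```go
// Hypothetical sketch of the "always close the pipe" pattern: the write end
// is closed on every exit path, and it is closed with a nil error only when
// cmd.Run() actually succeeded, so the reading side never hangs silently.
package archive

import (
	"errors"
	"io"
	"os/exec"
)

func runFilter(cmd *exec.Cmd, input io.Reader) io.ReadCloser {
	r, w := io.Pipe()
	cmd.Stdin = input
	cmd.Stdout = w
	go func() {
		// If a panic unwinds this goroutine before err is reassigned, the
		// consumer still observes a non-nil error instead of a clean EOF.
		err := errors.New("internal error: should never be seen")
		defer func() { w.CloseWithError(err) }() // CloseWithError(nil) == Close()
		err = cmd.Run()
	}()
	return r
}
```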
One side effect of using …
Thanks for the information, I didn't think of that problem; it's something to consider if we plan to use pigz also for the compressor. The current PR changes only the decompression side.
Force-pushed from 2fc8de5 to 78cf545.
Force-pushed from 78cf545 to ebc2123.
I think the pull side is much more important than the push side. You usually push once and pull many times. Stick to only the decompression.
Force-pushed from ebc2123 to 1fbb9e2.
Ready for review, CI is green.
Force-pushed from 1fbb9e2 to 6a8d126.
LGTM
FWIW we never promised the compressed blobs to be reproducible (and we can’t; compression implementations make no such promises, and e.g. https://github.com/klauspost/compress changes too often for me to think that the output never changes). And yes, from time to time we see users who say “it’s been working fine for us for a year”, and it’s a struggle to convince them that it might very well break the next week and that they need to change their design. But it does break from time to time, and yes, they do need to change their design; preventing any single output change would not help avoid that need to change the design.
ACK to implementation.
Force-pushed from 6a8d126 to c283dd9.
I've added a new patch to the PR. I've realized the existing implementation was leaking the buffer on errors.
Signed-off-by: Giuseppe Scrivano <[email protected]>
use the pigz command line tool when available, as it is much faster at decompressing a gzip stream. On my machine I've seen a 50% pull time reduction when pulling some big images. Signed-off-by: Giuseppe Scrivano <[email protected]>
the performance improvement is not as clear as with pigz, but it is still a measurable difference. Signed-off-by: Giuseppe Scrivano <[email protected]>
Force-pushed from c283dd9 to ae8836f.
/lgtm
(Purely personal aesthetic preference: differentiate the err and Err variables more. Also, named return values are a bit dangerous, so emphasizing that it is a return value might help. But that’s fully up to the maintainers of c/storage.)
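As a side note on why named return values can bite, here is a tiny, hypothetical example, unrelated to the actual c/storage code: the := inside the if shadows the named err, so both the deferred logging and the caller see a nil error even though the read failed:

```go
// Hypothetical illustration of a named-return-value pitfall.
package example

import (
	"log"
	"os"
)

func fetch(path string) (data []byte, err error) {
	defer func() {
		if err != nil {
			log.Printf("fetch %s failed: %v", path, err)
		}
	}()
	if data, err := os.ReadFile(path); err == nil { // := shadows both named returns
		return data, nil
	}
	return nil, err // bug: the outer err was never assigned, so this is nil
}
```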
@edsantiago, did you see performance improvements after vendoring this change into Podman CI? Nice work, @giuseppe!
@vrothberg I'm not ignoring your question. (1) this is not yet vendored into podman, and (2) our PR merge rate is so low these days that it's hard to get performance numbers. I am keeping this on my TODO list and will report back once there's something to learn.
use the command line tools when available as they are much faster than the Go libraries. Especially for gzip, I've measured a 50% improvement.

use zstd if available
the performance improvement is not as clear as with pigz, but it is still a measurable difference.
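To illustrate the overall approach the commits describe, here is a rough, hypothetical sketch of a gzip decompressor that shells out to pigz when it is on PATH and falls back to the Go library otherwise; the function name and the wrapper details are illustrative, not the exact implementation in this PR:

```go
// Hypothetical sketch: prefer the external pigz binary for decompression,
// fall back to compress/gzip when it is not installed.
package archive

import (
	"compress/gzip"
	"io"
	"os/exec"
)

func gzDecompress(r io.Reader) (io.ReadCloser, error) {
	pigz, err := exec.LookPath("pigz")
	if err != nil {
		// pigz is not available: use the in-process Go decoder.
		gz, err := gzip.NewReader(r)
		if err != nil {
			return nil, err
		}
		return gz, nil
	}

	cmd := exec.Command(pigz, "-d", "-c") // decompress to stdout
	cmd.Stdin = r
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	// A real implementation would wrap stdout so that Close() also calls
	// cmd.Wait() and reports the process exit status; omitted here.
	return stdout, nil
}
```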