Feature idea: support expansion to stdout #27
Comments
Thanks for the suggestion. I want to keep I would be open to accepting a pull request to add this feature, but with a pretty high bar. I'd need direct numeric evidence that this is a substantial time saving for some folks over using standard unzipping tools. The only possible time saving here is from doing the fetch & unzip in parallel, so the best theoretical speedup is 2x. That's a lot less than the 30x+ speedup which is possible when unzipping lots of files from the same zip file. But, 2x could still be useful for some folks - I'd want to hear from them and to be convinced it was worth the slight deviation from
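The 2x bound above can be spelled out under a simple model. This is a sketch, not a measurement: $T_f$ (time to fetch the archive) and $T_u$ (time to unzip it) are symbols introduced here for illustration, and the fetch and unzip are assumed to overlap perfectly.

$$
\text{speedup} = \frac{T_f + T_u}{\max(T_f,\, T_u)} \le 2
$$

Equality holds only when $T_f = T_u$. If either stage dominates, the gain from overlapping the two shrinks towards 1x.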
I had the impression that mere decompression of one stream is multi-threaded too, so it can use multiple cores (à la pigz or pixz). If that's not the case, it's probably not worth it.
FWIW I partially withdraw my earlier comment - now that we fetch from URIs, sometimes ripunzip is useful when unzipping just a single file, so a stdout option would be useful.
To me, an obvious advantage would be the ability to start using the output in a pipe while the download is still happening, without having to save a massive decompressed output. I was quite surprised
For streaming dataflow pipelines, supporting expansion to stdout could be useful.
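As a rough illustration of what expansion to stdout means for a pipeline, here is a minimal Python sketch using only the standard library `zipfile` module. It is not ripunzip's implementation and does not overlap the fetch with the unzip; `stream_member` and `build_demo_archive` are hypothetical names made up for this example. It only shows one archive member being decompressed chunk by chunk into a writable stream, so a downstream consumer never needs the whole decompressed file on disk.

```python
import io
import shutil
import zipfile


def stream_member(zip_source, member_name, out, chunk_size=64 * 1024):
    """Decompress one archive member into `out`, one chunk at a time.

    `zip_source` is a path or a seekable binary file object, and `out` is
    any writable binary stream (for example sys.stdout.buffer).
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as member:
            # copyfileobj reads and writes chunk_size bytes at a time, so
            # the decompressed data is never held in memory all at once.
            shutil.copyfileobj(member, out, chunk_size)


def build_demo_archive():
    """Build a small in-memory zip file so the sketch is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.txt", "line\n" * 3)
    buffer.seek(0)
    return buffer
```

Passing `sys.stdout.buffer` as `out` would let the result be piped straight into another process, which is the use case described in this issue.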