Feature idea: support expansion to stdout #27

Open
leahneukirchen opened this issue Apr 8, 2023 · 4 comments

Comments

@leahneukirchen

For streaming dataflow pipelines, supporting expansion to stdout could be useful.

@adetaylor
Collaborator

Thanks for the suggestion.

I want to keep ripunzip very focused on the specific job it was designed for, which is unzipping zip files containing lots of contents in parallel.

I would be open to accepting a pull request to add this feature, but with a pretty high bar. I'd need direct numeric evidence that this is a substantial time saving for some folks over using standard unzipping tools. The only possible time saving here is from doing the fetch & unzip in parallel, so the best theoretical speedup is 2x. That's a lot less than the 30x+ speedup which is possible when unzipping lots of files from the same zip file. But, 2x could still be useful for some folks - I'd want to hear from them and to be convinced it was worth the slight deviation from ripunzip's original mission.
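The 2x bound follows from a simple model: without overlap the total time is fetch plus unzip, and with perfect overlap it is whichever stage is slower. A minimal sketch of that arithmetic, assuming idealized stages with no overhead (the function name `pipelined_speedup` and the sample timings are hypothetical, not measurements from ripunzip):

```python
def pipelined_speedup(fetch_s: float, unzip_s: float) -> float:
    """Idealized speedup from overlapping fetch and unzip of one stream.

    Assumes both durations are positive and that overlap is perfect.
    """
    sequential = fetch_s + unzip_s      # fetch everything, then unzip
    overlapped = max(fetch_s, unzip_s)  # the slower stage dominates
    return sequential / overlapped


print(pipelined_speedup(10.0, 10.0))  # 2.0: equal stages give the best case
print(pipelined_speedup(10.0, 1.0))   # 1.1: a network-bound fetch gains little
```

The ratio peaks at 2 only when both stages take the same time; the more one stage dominates, the closer the speedup gets to 1.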

@leahneukirchen
Author

I had the impression that mere decompression of one stream is multi-threaded too, so it can use multiple cores (à la pigz or pixz).

If that's not the case, it's probably not worth it.

@adetaylor
Collaborator

FWIW I partially withdraw my earlier comment - now that we fetch from URIs, sometimes ripunzip is useful when unzipping just a single file, so a stdout option would be useful.

@evils

evils commented Feb 13, 2024

an advantage that seems obvious to me would be the ability to start using the output in a pipe while the download is still happening, without having to save a massive decompressed output
(though i am unsure if this would work well with tar's --sort=name for my particular case (deterministic repacking))

i was quite surprised ripunzip doesn't support this, not even with --output-directory /dev/stdout
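To illustrate the "use the output while the download is still happening" idea, here is a rough sketch of incremental decompression using Python's standard `zlib` module. It is not how ripunzip (a Rust tool) works and it ignores zip headers entirely: it only assumes the entry's raw DEFLATE data arrives in chunks. The name `stream_inflate` and the fake "network" chunks are hypothetical.

```python
import sys
import zlib
from typing import BinaryIO, Iterable


def stream_inflate(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Inflate raw DEFLATE data chunk by chunk, writing output as it appears.

    Returns the number of decompressed bytes written.
    """
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)  # negative wbits = raw DEFLATE
    written = 0
    for chunk in chunks:
        data = inflater.decompress(chunk)  # may be empty until enough input arrives
        out.write(data)
        written += len(data)
    tail = inflater.flush()  # emit anything still buffered
    out.write(tail)
    return written + len(tail)


if __name__ == "__main__":
    # Build a raw DEFLATE stream locally to stand in for downloaded data.
    comp = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    payload = comp.compress(b"hello from a pipe\n" * 3) + comp.flush()
    # Feed it in 8-byte pieces, as if it were arriving over the network.
    pieces = (payload[i:i + 8] for i in range(0, len(payload), 8))
    stream_inflate(pieces, sys.stdout.buffer)
```

Because output is written as soon as each chunk is inflated, a downstream process in the pipe can start working before the full input has arrived, and nothing has to be stored on disk.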


3 participants