Parse entries from a Readable stream #97
base: master
Conversation
This replaces multiple (2 x entry count) small reads with a single read stream containing the full central directory and eliminates many intermediate Buffer allocations and copies in the process, which were resulting in increased time spent reading entries. Readable stream implementations are responsible for determining appropriate buffer sizes.

This is a port of the old entry-parsing code to a Transform stream (as an ES6 class; available since Node 6.0.0 -- I can change this if desired). The central directory size is read from the ZIP64 end of central directory record (if available; otherwise it's calculated from the size of the file) and used to bound the size of the requested Readable stream. (This was necessary to prevent the ZIP64 EOCDR from being parsed as an entry with an invalid signature.)

`lazyEntries: true` is implemented by pausing the stream and reading individual Entry objects as requested, recursively via `setImmediate`. This is a little weird (I'd tried resuming and pausing again using `.once("data")`, but that didn't work), but it seems to be the only way to ensure that only a single entry is emitted at a time.

Fixes #96
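For illustration, here's a minimal sketch of the Transform-stream approach described above, assuming Node 6+. The class name `EntryParser`, the `parseOne` helper, and the reduced set of parsed fields are all hypothetical, not the PR's actual identifiers; the real parser handles the full central directory file header (including the UTF-8 flag, extra fields, etc.).

```js
const { Transform } = require("stream");

class EntryParser extends Transform {
  constructor() {
    // Bytes in, Entry objects out.
    super({ readableObjectMode: true });
    this.buffer = Buffer.alloc(0);
  }

  _transform(chunk, encoding, callback) {
    // Accumulate central directory bytes, then emit every complete entry.
    this.buffer = Buffer.concat([this.buffer, chunk]);
    let entry;
    while ((entry = this.parseOne()) !== null) {
      this.push(entry);
    }
    callback();
  }

  // Parse one central directory file header from the front of the buffer;
  // return null if a complete header hasn't arrived yet.
  parseOne() {
    if (this.buffer.length < 46) return null; // 46 = fixed header size
    const signature = this.buffer.readUInt32LE(0);
    if (signature !== 0x02014b50) {
      this.emit("error", new Error(
        "invalid central directory file header signature: 0x" + signature.toString(16)));
      return null;
    }
    const fileNameLength = this.buffer.readUInt16LE(28);
    const extraFieldLength = this.buffer.readUInt16LE(30);
    const commentLength = this.buffer.readUInt16LE(32);
    const totalLength = 46 + fileNameLength + extraFieldLength + commentLength;
    if (this.buffer.length < totalLength) return null;
    const entry = {
      compressedSize: this.buffer.readUInt32LE(20),
      uncompressedSize: this.buffer.readUInt32LE(24),
      // Real code checks the UTF-8 general purpose bit before decoding.
      fileName: this.buffer.toString("utf8", 46, 46 + fileNameLength),
    };
    this.buffer = this.buffer.slice(totalLength);
    return entry;
  }
}
```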
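And a sketch of bounding the requested stream to the central directory, reusing the hypothetical `EntryParser` above. yauzl actually reads through its RandomAccessReader abstraction; plain `fs` is used here only to keep the sketch self-contained, and the file name and offsets are placeholders.

```js
const fs = require("fs");

// Bound the read to exactly the central directory so the ZIP64 EOCDR
// that follows it is never handed to the entry parser. cdOffset and
// cdSize would come from the (ZIP64) end of central directory record.
const cdOffset = 1024;
const cdSize = 4096;

fs.createReadStream("example.zip", {
  start: cdOffset,
  end: cdOffset + cdSize - 1, // end is inclusive
})
  .pipe(new EntryParser())
  .on("data", (entry) => console.log(entry.fileName));
```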
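For context, this is the consumer-facing contract that `lazyEntries: true` preserves, per yauzl's documented API: no `entry` event fires until `readEntry()` is called, so exactly one entry is processed at a time.

```js
const yauzl = require("yauzl");

yauzl.open("example.zip", { lazyEntries: true }, (err, zipfile) => {
  if (err) throw err;
  zipfile.readEntry(); // ask for the first entry
  zipfile.on("entry", (entry) => {
    console.log(entry.fileName);
    zipfile.readEntry(); // ask for the next one only when we're ready
  });
  zipfile.on("end", () => console.log("no more entries"));
});
```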
Thanks for the PR! Did you get a chance to benchmark this in your AWS environment? Regarding ES6, Node 6, and "All checks have failed", I'm not really sure what to do about this. Is it time to drop support for Node 0 yet? I don't use Node 0. 🤷‍♂️

Yep. Things look great on Lambda; effectively the same as #95 but without the downsides of unbounded buffer sizes.
Force-pushed from f0ddcbf to 985a3f5.
I've made some changes to get this green on 0.10.

@thejoshwolfe nudge.

Thanks again for the contribution. And sorry, but it's going to take me some time to get around to this.

No worries. Much appreciated! If there's anything I can do, let me know!

@thejoshwolfe any ETA/prognosis for this? Anything others can do to help?

Hey, this would be great to add. @mojodna, have you been using this fork in production, and does it speed things up for archives with many files?

Not production per se, but I have tested a service that uses this patch quite heavily. It works great!