This repository has been archived by the owner on Jun 2, 2024. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi,
This patch buffers up the reverse search that happens in
spec::CentralDirectoryEnd::find_and_parse()
. This speeds the code up tremendously when the file isn't actually a zip file and it's bigger than 64K.In my app I was serving some static web pages out of a zip file and falling back to serving them from a filesystem path. In my case the zip file was a large file that may or may not actually be a zip file (if it is then the files come out of it, otherwise they come from somewhere else).
On a linux machine I was seeing 400ms or so of end-to-end latency between when I would request the file and when it would get served. This is in the case where the large file is not a zip file.
With this change the end-to-end latency is about 1ms.
The patch also adds some unit tests to
src/spec.rs
which helped me validate that it was correct. I kept them as a separate commit since they aren't specific to the new code.I noticed the zip64 code did something similar. I'm not using zip64 but I thought I'd take a crack at it. I'm not as sure about those changes (in the sense that I don't know if they are actually necessary and if it really speeds anything up). It seems like the slow case would exist if there are 1000s of bytes of garbage before the cde64. I don't really understand the case where there is any junk there so I don't know how realistic it is to optimize for some hypothesized worst case. I didn't benchmark this change but 30 seek()+read() syscalls in a row (which is the case for the
tests/data/zip64_demo.zip
) would be about 200µs (assuming the same overhead as theCentralDirectoryEnd
code), which admittedly isn't much.At any rate, I'm pretty sure the code is correct—I added unit tests for this change as well. These tests are pretty specific to buffer boundaries in the new code so I kept them together with this commit.
In summary, I'm a little more interested in the first 2 patches getting through. I believe the 3rd one (zip64) is technically correct but I can't vouch for it as much as the others.
Thanks for the great Rust library, btw!