Skip to content
This repository has been archived by the owner on Jun 2, 2024. It is now read-only.

Faster cde rejection #412

Closed
wants to merge 3 commits into from
Closed

Conversation

caldwell
Copy link

Hi,

This patch buffers up the reverse search that happens in spec::CentralDirectoryEnd::find_and_parse(). This speeds the code up tremendously when the file isn't actually a zip file and it's bigger than 64K.

In my app I was serving some static web pages out of a zip file and falling back to serving them from a filesystem path. In my case the zip file was a large file that may or may not actually be a zip file (if it is then the files come out of it, otherwise they come from somewhere else).

On a linux machine I was seeing 400ms or so of end-to-end latency between when I would request the file and when it would get served. This is in the case where the large file is not a zip file.

With this change the end-to-end latency is about 1ms.

The patch also adds some unit tests to src/spec.rs which helped me validate that it was correct. I kept them as a separate commit since they aren't specific to the new code.


I noticed the zip64 code did something similar. I'm not using zip64 but I thought I'd take a crack at it. I'm not as sure about those changes (in the sense that I don't know if they are actually necessary and if it really speeds anything up). It seems like the slow case would exist if there are 1000s of bytes of garbage before the cde64. I don't really understand the case where there is any junk there so I don't know how realistic it is to optimize for some hypothesized worst case. I didn't benchmark this change but 30 seek()+read() syscalls in a row (which is the case for the tests/data/zip64_demo.zip) would be about 200µs (assuming the same overhead as the CentralDirectoryEnd code), which admittedly isn't much.

At any rate, I'm pretty sure the code is correct—I added unit tests for this change as well. These tests are pretty specific to buffer boundaries in the new code so I kept them together with this commit.

In summary, I'm a little more interested in the first 2 patches getting through. I believe the 3rd one (zip64) is technically correct but I can't vouch for it as much as the others.

Thanks for the great Rust library, btw!

Buffer the part we need to search over (last 64K of the file) so that
we don't do a 4 byte read 65536 times.
Read large chunks of data at a time to dramatically cut down on syscalls.
Add tests for various edge cases to validate.
caldwell added a commit to caldwell/syncron that referenced this pull request Dec 15, 2023
@Pr0methean
Copy link
Member

Because this repo is no longer maintained, I've replaced this PR with zip-rs/zip2#47.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants