perf: parse headers in blocks and scan for magic numbers with memchr #93

Merged 50 commits on May 25, 2024
Changes from 43 commits
Commits
7a55945
add benchmarks
cosmicexplorer May 3, 2024
0a573d3
make benchmarks report bytes/second
cosmicexplorer May 18, 2024
3d1728d
add stream benchmark
cosmicexplorer May 18, 2024
011e5af
add test that breaks without the fix
cosmicexplorer May 3, 2024
ea30849
bulk parsing and bulk writing
cosmicexplorer May 3, 2024
ad1d51d
write file comment to central directory header
cosmicexplorer May 13, 2024
46c42c7
review comments 1
cosmicexplorer May 18, 2024
3fa0d84
make Magic into a wrapper struct
cosmicexplorer May 18, 2024
08385d5
implement find_content() by parsing with blocks
cosmicexplorer May 18, 2024
7eb5907
remove a lot of boilerplate for Block impls
cosmicexplorer May 18, 2024
83cdbad
make window size assertions much less complex with Magic
cosmicexplorer May 18, 2024
03c92a1
add to_and_from_le! macro
cosmicexplorer May 18, 2024
e1c92e2
make SIG_BYTES const
cosmicexplorer May 18, 2024
cf2d980
expose pub(crate) methods to convert compression methods
cosmicexplorer May 18, 2024
41813d2
move encrypted and data descriptor validation up higher
cosmicexplorer May 18, 2024
8fbc403
lean more on the ::MAGIC trait constants
cosmicexplorer May 18, 2024
acb0a6f
clarify the check being performed
cosmicexplorer May 18, 2024
3d6c4d1
fix fuzz failure
cosmicexplorer May 18, 2024
21d07e1
add ExtraFieldMagic and Zip64ExtraFieldBlock
cosmicexplorer May 18, 2024
8d454d2
nitpick
cosmicexplorer May 18, 2024
a7fd587
reduce visibility for all the blocks
cosmicexplorer May 18, 2024
d852c22
review comments 1
cosmicexplorer May 22, 2024
79b96bd
add "std" feature to getrandom for io::Error conversion
cosmicexplorer May 22, 2024
7c2474f
go into_boxed_slice() earlier
cosmicexplorer May 22, 2024
0b31d98
review comments 2
cosmicexplorer May 22, 2024
4a784b5
interpose ZipRawValues into ZipFileData
cosmicexplorer May 22, 2024
fe663b9
tiny fix
cosmicexplorer May 22, 2024
a769e94
Revert "interpose ZipRawValues into ZipFileData"
cosmicexplorer May 24, 2024
80ca254
fix doc comments
cosmicexplorer May 24, 2024
a509efc
review comments 3
cosmicexplorer May 24, 2024
8e5b157
fix stream benchmark
cosmicexplorer May 24, 2024
d81382b
revert limit for search_lower_bound to fix benchmark
cosmicexplorer May 24, 2024
ed1d38f
Run bench only once for each random input
Pr0methean May 24, 2024
9722dd3
Return error if file comment is too long
Pr0methean May 24, 2024
848309a
Switch to debug_assert! for an assert! involving only constants
Pr0methean May 24, 2024
18760e9
Switch to debug_assert! for an assert! involving only constants
Pr0methean May 24, 2024
2a39a8e
Fix an off-by-one error in large-file detection
Pr0methean May 24, 2024
a4915fd
Fix a bug in benchmark: closure needs a parameter
Pr0methean May 24, 2024
1bb0b14
style: Fix cargo fmt check
Pr0methean May 24, 2024
f90bdf7
Fix an off-by-one error in large-file detection
Pr0methean May 24, 2024
d63ad8e
Merge branch 'master' into bulk-parsing
Pr0methean May 24, 2024
3ab9f45
Bug fix: `bench_n` expects empty return
Pr0methean May 24, 2024
a462b85
Fix an off-by-one error in large-file detection
Pr0methean May 24, 2024
5e216fe
Bug fix: len() is `must-use`
Pr0methean May 24, 2024
01bb162
Remove an unused macro branch
Pr0methean May 24, 2024
3af7017
Remove an unused macro branch
Pr0methean May 24, 2024
326b2c4
Revert macro changes
Pr0methean May 24, 2024
6b19c87
Merge branch 'master' into bulk-parsing
Pr0methean May 24, 2024
df70f6a
Fix unmatched bracket due to bad merge
Pr0methean May 24, 2024
a28b16e
Apply suggestions from code review
Pr0methean May 24, 2024
3 changes: 2 additions & 1 deletion Cargo.toml
@@ -35,6 +35,7 @@ displaydoc = { version = "0.2.4", default-features = false }
 flate2 = { version = "1.0.28", default-features = false, optional = true }
 indexmap = "2"
 hmac = { version = "0.12.1", optional = true, features = ["reset"] }
+memchr = "2.7.2"
 pbkdf2 = { version = "0.12.2", optional = true }
 rand = { version = "0.8.5", optional = true }
 sha1 = { version = "0.10.6", optional = true }
@@ -56,7 +57,7 @@ arbitrary = { version = "1.3.2", features = ["derive"] }

 [dev-dependencies]
 bencher = "0.1.5"
-getrandom = { version = "0.2.14", features = ["js"] }
+getrandom = { version = "0.2.14", features = ["js", "std"] }
 walkdir = "2.5.0"
 time = { workspace = true, features = ["formatting", "macros"] }
 anyhow = "1"
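The `memchr` dependency added in this file relates to the signature scan named in the PR title. As a rough illustration of that idea, and not the PR's actual code, the sketch below finds the last occurrence of the end-of-central-directory signature `PK\x05\x06` in a buffer using only the standard library. The names `EOCD_SIGNATURE` and `rfind_signature` are hypothetical; with the crate available, `memchr::memmem::rfind` could presumably take the place of the window scan.

```rust
/// End-of-central-directory signature 0x06054b50 in little-endian byte order.
const EOCD_SIGNATURE: [u8; 4] = *b"PK\x05\x06";

/// Hypothetical helper: offset of the last occurrence of `needle` in `haystack`.
/// This is a std-only stand-in for a memchr-backed reverse substring search.
fn rfind_signature(haystack: &[u8], needle: &[u8; 4]) -> Option<usize> {
    haystack
        .windows(needle.len())
        // Scan from the end, since the record of interest sits near the end of an archive.
        .rposition(|window| window == needle.as_slice())
}
```

Searching from the end matters when a trailing comment may itself contain bytes that look like a signature, which seems to be the case the comment benchmarks in this PR exercise.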
106 changes: 97 additions & 9 deletions benches/read_metadata.rs
@@ -1,38 +1,126 @@
 use bencher::{benchmark_group, benchmark_main};

-use std::io::{Cursor, Write};
+use std::fs;
+use std::io::{self, prelude::*, Cursor};

 use bencher::Bencher;
+use getrandom::getrandom;
+use tempdir::TempDir;
 use zip::write::SimpleFileOptions;
-use zip::{CompressionMethod, ZipArchive, ZipWriter};
+use zip::{result::ZipResult, CompressionMethod, ZipArchive, ZipWriter};

 const FILE_COUNT: usize = 15_000;
 const FILE_SIZE: usize = 1024;

-fn generate_random_archive(count_files: usize, file_size: usize) -> Vec<u8> {
+fn generate_random_archive(count_files: usize, file_size: usize) -> ZipResult<Vec<u8>> {
     let data = Vec::new();
     let mut writer = ZipWriter::new(Cursor::new(data));
     let options = SimpleFileOptions::default().compression_method(CompressionMethod::Stored);

-    let bytes = vec![0u8; file_size];
+    let mut bytes = vec![0u8; file_size];

     for i in 0..count_files {
         let name = format!("file_deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef_{i}.dat");
-        writer.start_file(name, options).unwrap();
-        writer.write_all(&bytes).unwrap();
+        writer.start_file(name, options)?;
+        getrandom(&mut bytes).map_err(io::Error::from)?;
+        writer.write_all(&bytes)?;
     }

-    writer.finish().unwrap().into_inner()
+    Ok(writer.finish()?.into_inner())
 }

 fn read_metadata(bench: &mut Bencher) {
-    let bytes = generate_random_archive(FILE_COUNT, FILE_SIZE);
+    let bytes = generate_random_archive(FILE_COUNT, FILE_SIZE).unwrap();

     bench.iter(|| {
         let archive = ZipArchive::new(Cursor::new(bytes.as_slice())).unwrap();
         archive.len()
     });
+    bench.bytes = bytes.len() as u64;
 }

-benchmark_group!(benches, read_metadata);
+const COMMENT_SIZE: usize = 50_000;
+
+fn generate_zip32_archive_with_random_comment(comment_length: usize) -> ZipResult<Vec<u8>> {
+    let data = Vec::new();
+    let mut writer = ZipWriter::new(Cursor::new(data));
+    let options = SimpleFileOptions::default().compression_method(CompressionMethod::Stored);
+
+    let mut bytes = vec![0u8; comment_length];
+    getrandom(&mut bytes).unwrap();
+    writer.set_raw_comment(bytes.into_boxed_slice());
+
+    writer.start_file("asdf.txt", options)?;
+    writer.write_all(b"asdf")?;
+
+    Ok(writer.finish()?.into_inner())
+}
+
+fn parse_archive_with_comment(bench: &mut Bencher) {
+    let bytes = generate_zip32_archive_with_random_comment(COMMENT_SIZE).unwrap();
+
+    bench.bench_n(1, |_| {
+        let archive = ZipArchive::new(Cursor::new(bytes.as_slice())).unwrap();
+        archive.comment().len();
Check failure on line 64 in benches/read_metadata.rs (GitHub Actions / style_and_docs (--no-default-features)): unused return value of `core::slice::<impl [T]>::len` that must be used
+    });
+    bench.bytes = bytes.len() as u64;
+}
+
+const COMMENT_SIZE_64: usize = 500_000;
+
+fn generate_zip64_archive_with_random_comment(comment_length: usize) -> ZipResult<Vec<u8>> {
+    let data = Vec::new();
+    let mut writer = ZipWriter::new(Cursor::new(data));
+    let options = SimpleFileOptions::default()
+        .compression_method(CompressionMethod::Stored)
+        .large_file(true);
+
+    let mut bytes = vec![0u8; comment_length];
+    getrandom(&mut bytes).unwrap();
+    writer.set_raw_comment(bytes.into_boxed_slice());
+
+    writer.start_file("asdf.txt", options)?;
+    writer.write_all(b"asdf")?;
+
+    Ok(writer.finish()?.into_inner())
+}
+
+fn parse_zip64_archive_with_comment(bench: &mut Bencher) {
+    let bytes = generate_zip64_archive_with_random_comment(COMMENT_SIZE_64).unwrap();
+
+    bench.iter(|| {
+        let archive = ZipArchive::new(Cursor::new(bytes.as_slice())).unwrap();
+        archive.comment().len()
+    });
+    bench.bytes = bytes.len() as u64;
+}
+
+fn parse_stream_archive(bench: &mut Bencher) {
+    const STREAM_ZIP_ENTRIES: usize = 5;
+    const STREAM_FILE_SIZE: usize = 5;
+
+    let bytes = generate_random_archive(STREAM_ZIP_ENTRIES, STREAM_FILE_SIZE).unwrap();
+
+    /* Write to a temporary file path to incur some filesystem overhead from repeated reads */
+    let dir = TempDir::new("stream-bench").unwrap();
+    let out = dir.path().join("bench-out.zip");
+    fs::write(&out, &bytes).unwrap();
+
+    bench.iter(|| {
+        let mut f = fs::File::open(&out).unwrap();
+        while zip::read::read_zipfile_from_stream(&mut f)
+            .unwrap()
+            .is_some()
+        {}
+    });
+    bench.bytes = bytes.len() as u64;
+}
+
+benchmark_group!(
+    benches,
+    read_metadata,
+    parse_archive_with_comment,
+    parse_zip64_archive_with_comment,
+    parse_stream_archive,
+);
 benchmark_main!(benches);
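The check failure annotated at line 64 of this file comes from discarding the `#[must_use]` result of `len()` inside a closure that returns `()`. One possible way to resolve it, sketched here with the standard library only and not necessarily what the later "Bug fix: len() is `must-use`" commit did, is to consume the value through `std::hint::black_box`. The helpers `run_n` and `measure` are hypothetical; `run_n` is only loosely modeled on a `bench_n`-style runner and is not the `bencher` API.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for a `bench_n`-style runner: calls a closure that returns `()`.
fn run_n(n: u64, mut f: impl FnMut(u64)) {
    for i in 0..n {
        f(i);
    }
}

/// Runs the "benchmark body" `n` times and returns the summed lengths,
/// so that the computed value is observably used.
fn measure(comment: &[u8], n: u64) -> usize {
    let mut total = 0;
    run_n(n, |_| {
        // A bare `comment.len();` statement here would trip `unused_must_use`.
        // Passing the value through `black_box` and using it consumes the result.
        total += black_box(comment.len());
    });
    total
}
```

Returning the value from the closure, as the `bench.iter` calls in this file do, is the other common option when the runner accepts a closure with a return value.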
47 changes: 24 additions & 23 deletions src/compression.rs
@@ -90,13 +90,7 @@ impl CompressionMethod {
     pub const AES: Self = CompressionMethod::Unsupported(99);
 }
 impl CompressionMethod {
-    /// Converts an u16 to its corresponding CompressionMethod
-    #[deprecated(
-        since = "0.5.7",
-        note = "use a constant to construct a compression method"
-    )]
-    pub const fn from_u16(val: u16) -> CompressionMethod {
-        #[allow(deprecated)]
+    pub(crate) const fn parse_from_u16(val: u16) -> Self {
         match val {
             0 => CompressionMethod::Stored,
             #[cfg(feature = "_deflate-any")]
@@ -111,18 +105,21 @@ impl CompressionMethod {
             93 => CompressionMethod::Zstd,
             #[cfg(feature = "aes-crypto")]
             99 => CompressionMethod::Aes,
-
+            #[allow(deprecated)]
             v => CompressionMethod::Unsupported(v),
         }
     }

-    /// Converts a CompressionMethod to a u16
+    /// Converts a u16 to its corresponding CompressionMethod
     #[deprecated(
         since = "0.5.7",
-        note = "to match on other compression methods, use a constant"
+        note = "use a constant to construct a compression method"
     )]
-    pub const fn to_u16(self) -> u16 {
-        #[allow(deprecated)]
+    pub const fn from_u16(val: u16) -> CompressionMethod {
+        Self::parse_from_u16(val)
+    }
+
+    pub(crate) const fn serialize_to_u16(self) -> u16 {
         match self {
             CompressionMethod::Stored => 0,
             #[cfg(feature = "_deflate-any")]
@@ -137,10 +134,19 @@ impl CompressionMethod {
             CompressionMethod::Zstd => 93,
             #[cfg(feature = "lzma")]
             CompressionMethod::Lzma => 14,
-
+            #[allow(deprecated)]
             CompressionMethod::Unsupported(v) => v,
         }
     }
+
+    /// Converts a CompressionMethod to a u16
+    #[deprecated(
+        since = "0.5.7",
+        note = "to match on other compression methods, use a constant"
+    )]
+    pub const fn to_u16(self) -> u16 {
+        self.serialize_to_u16()
+    }
 }

 impl Default for CompressionMethod {
@@ -180,23 +186,18 @@ mod test {
     #[test]
     fn from_eq_to() {
         for v in 0..(u16::MAX as u32 + 1) {
-            #[allow(deprecated)]
-            let from = CompressionMethod::from_u16(v as u16);
-            #[allow(deprecated)]
-            let to = from.to_u16() as u32;
+            let from = CompressionMethod::parse_from_u16(v as u16);
+            let to = from.serialize_to_u16() as u32;
             assert_eq!(v, to);
         }
     }

     #[test]
     fn to_eq_from() {
         fn check_match(method: CompressionMethod) {
-            #[allow(deprecated)]
-            let to = method.to_u16();
-            #[allow(deprecated)]
-            let from = CompressionMethod::from_u16(to);
-            #[allow(deprecated)]
-            let back = from.to_u16();
+            let to = method.serialize_to_u16();
+            let from = CompressionMethod::parse_from_u16(to);
+            let back = from.serialize_to_u16();
             assert_eq!(to, back);
         }
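The change to `src/compression.rs` keeps the deprecated public conversions but has them delegate to `pub(crate)` functions, so crate-internal callers and tests no longer need `#[allow(deprecated)]`. A minimal self-contained sketch of that pattern follows. The `Method` enum is a simplified, hypothetical stand-in for `CompressionMethod` with only three variants, not the crate's real type.

```rust
/// Simplified, hypothetical stand-in for the crate's `CompressionMethod`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Method {
    Stored,
    Deflated,
    Unsupported(u16),
}

impl Method {
    /// Crate-internal conversion: callable without tripping the deprecation lint.
    pub(crate) const fn parse_from_u16(val: u16) -> Self {
        match val {
            0 => Method::Stored,
            8 => Method::Deflated,
            v => Method::Unsupported(v),
        }
    }

    /// Deprecated public entry point that only delegates to the internal one.
    #[deprecated(note = "use a constant to construct a compression method")]
    pub const fn from_u16(val: u16) -> Self {
        Self::parse_from_u16(val)
    }

    /// Crate-internal inverse of `parse_from_u16`.
    pub(crate) const fn serialize_to_u16(self) -> u16 {
        match self {
            Method::Stored => 0,
            Method::Deflated => 8,
            Method::Unsupported(v) => v,
        }
    }

    /// Deprecated public entry point that only delegates to the internal one.
    #[deprecated(note = "to match on other compression methods, use a constant")]
    pub const fn to_u16(self) -> u16 {
        self.serialize_to_u16()
    }
}
```

With this split, the round-trip property checked by `from_eq_to` can be written against the internal functions directly, while external users still see the deprecation warning on the public names.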
