Skip to content
This repository has been archived by the owner on Jun 2, 2024. It is now read-only.

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Laziness of ZipArchive and ZiipFile make for an awkward API #147

Closed
rylev opened this issue Jun 16, 2020 · 9 comments
Closed

Laziness of ZipArchive and ZiipFile make for an awkward API #147

rylev opened this issue Jun 16, 2020 · 9 comments
Labels
Milestone

Comments

@rylev
Copy link
Contributor

rylev commented Jun 16, 2020

Both ZipArchive and ZipFile are lazy, and only read from the underlying Reader when they absolutely have to. This is great for performance as zip archives can be very large, and we don't want to eagerly uncompress and read everything into memory. However, there are a few ways this makes the API a bit awkward:

ZipArchive::{by_name,by_index} take &mut self

This is because these methods mutate the underlying reader. Ultimately this is not an ease fix as the laziness of both ZipArchive and ZipFile do not really allow us to change this to &self by using interior mutability (through use of something like RefCell). Most of the time having exclusive access of the ZipArchive is not an issue, but it can be confusing for newcomers who may not understand why the ZipArchive needs mutable/exclusive access for reading files.

Iterator APIs are not supported

Currently the ZipArchive does not support iterators, requiring the user to get files by name or by index. Supporting iteration is not that easy as iter() normally takes &self and yielding ZipFiles from the iterator causes lifetime issues since ZipFiles are attached to the lifetime of the underlying reader.

Question: what should we do about this? Can we keep the performance of the current API without some of these awkward parts?

@rylev rylev added the question label Jun 16, 2020
@Plecra
Copy link
Member

Plecra commented Jun 16, 2020

I think we should be able to solve the first issue with better naming: building a zip::Reader from your io::Read would make it fairly intuitive that it needs similar access to io::Read::read.

There are a few options for a more intuitive API. I've been mocking up some alternatives, but they're basically all internal iterators. We'll probably end up with something like this:

for line in archive.flat_map(|file| Ok(if file.name.extension() == "log" {
    Some(io::BufReader::new(file).lines())
} else {
    None
})) {
  println!("{}", line);
}

When it's eventually stabilized, we should definitely take the opportunity to implement StreamingIterator too.

@Plecra Plecra added this to the Release 0.6 milestone Jun 16, 2020
@rylev rylev mentioned this issue Jun 23, 2020
@MaulingMonkey
Copy link

Ran into this when implementing vfs-zip, since vfs filesystems take &self and assume everything is Sync. Ultimately this boils down to read/write relying on and modifying the seek position of the underlying shared io. There's a relatively easy fix for readers: std::os::unix::fs::FileExt::read_at and std::os::windows::fs::FileExt::seek_read both take File by &self - no mut required. Of course, it'd be better to take a general purpouse trait. People have written a couple:

Writers are harder - despite the existence of write_at, I assume we can't precalculate the length of a file write until compressed, which makes writing multiple files simultaniously a nonstarter.

Iterators

Some iterators can take self too, which would be another option.

@Plecra
Copy link
Member

Plecra commented Sep 9, 2020

What we can do while writing is reserve a specific compressed length, and then prevent the user from writing past it. It'll just be difficult to write generically without using locks.

We do have the option of simply implementing iter with shared references (my current approach in 0.7). &Files implement Read, and the user would just have to be careful to avoid logic errors. This kinda sucks, but is also the only simple cross platform solution.

If we're being clever anyway (like that read_at), I think our best option is to clone the os handles, which could give us multiple cursors into the file.

The final option worth considering is a totally custom set of traits - &[u8] technically wouldn't need exclusive access to be read, and it'd be possible to implement a trait that expressed that.

@ferk6a
Copy link

ferk6a commented Oct 5, 2020

I'm quite new to Rust, how can I use both by_name and file_names in the current implementation? I came up with this:

for name in zip.file_names().map(|s| s.to_string()).collect::<Vec<_>>() {
    zip.by_name(name);
}

But indeed it looks quite ugly.

Perhaps a file_names_mut could be implemented to make this better? I saw that slices have iter_mut, but I don't know if the pattern applies here.

@Plecra
Copy link
Member

Plecra commented Oct 5, 2020

@fer22f The ZipFile documentation has an example of iterating through files and their names - is there something more you need?

@ferk6a
Copy link

ferk6a commented Oct 5, 2020

@fer22f The ZipFile documentation has an example of iterating through files and their names - is there something more you need?

Ah yes, using indexes, you're right. I will use that instead. But I still wish we could do the same with strings directly, I use a filter after the map, and using that implementation I would need to move it to inside the for.

@Plecra
Copy link
Member

Plecra commented Nov 10, 2020

Just for reference, it's looking like the new API will allow iteration like this:

for file in zip.iter().filter(|file| file.name().to_str().filter(condition).is_some()) {
    // use `file`
}

However, I'm still not sure how a consuming iterator could work. IntoIterator could Clone the internal file, but it'd make iterating through a std::fs::File ZIP fairly unintuitive:

// notice the `&`
for file in Archive::open(&File::open(path)?)? {
    // move path and metadata out of `file`
}

@smwoj
Copy link

smwoj commented Jun 5, 2021

I'm interested in the iterator API as well. I saw the discussion in #232 - but does the planned development includes this feature? Would you mind sharing its status?

(btw thanks for the work put into this crate!)

@Plecra
Copy link
Member

Plecra commented Feb 1, 2023

It's not much, but I'll say the status on everything here is "WIP" :D I'd really love to see a more decoupled API, and it's still on the horizon. I'm going to work with zamazan on smoothing out 0.6 before any of that

@zip-rs zip-rs locked and limited conversation to collaborators Feb 1, 2023
@Plecra Plecra converted this issue into discussion #344 Feb 1, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Projects
None yet
Development

No branches or pull requests

5 participants