Reading self-terminating streams #28

Closed
joshuawarner32 opened this issue Oct 30, 2015 · 2 comments


@joshuawarner32
Contributor

First off, awesome library, thanks!

My use case is reading git pack files, which contain multiple concatenated but undelimited zlib/deflate streams. In other words, the pack files don't record the length of each stream, so to find where the next object in the pack starts, I have to know exactly how many bytes the underlying deflate implementation consumed. That tells me where in the underlying file to start reading the next segment of data.
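To make the requirement concrete, here is a minimal Rust sketch of the offset-advancing loop. It assumes a toy self-terminating format (NUL-terminated records) as a stand-in for zlib streams; `decode_one` and `split_records` are hypothetical helpers, not part of flate2 or git.

```rust
/// Hypothetical stand-in for a self-terminating decoder: decodes one
/// NUL-terminated record and reports how many input bytes it consumed.
fn decode_one(input: &[u8]) -> Option<(Vec<u8>, usize)> {
    let end = input.iter().position(|&b| b == 0)?;
    // Consumed = payload bytes plus the one-byte terminator.
    Some((input[..end].to_vec(), end + 1))
}

/// Walks a buffer of concatenated, undelimited records by advancing the
/// offset by exactly the number of bytes each decode consumed.
fn split_records(pack: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut offset = 0;
    while offset < pack.len() {
        match decode_one(&pack[offset..]) {
            Some((record, consumed)) => {
                records.push(record);
                offset += consumed;
            }
            // No terminator left: the trailing data is truncated, so stop.
            None => break,
        }
    }
    records
}
```

The loop only works because `decode_one` reports the bytes it consumed rather than the bytes it happened to look at; a decoder that over-reads would push the offset into the middle of the next record.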

This poses two separate problems:

  • Constructing a ZlibDecoder takes ownership of the underlying file stream, which means I have to re-open the pack file to read the next object from it. This is an annoyance, but it works fine for my scenario.
  • I need to get the number of bytes the ZlibDecoder consumed (which is likely less than the number of bytes it read off the underlying stream, due to buffering).
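The gap between "bytes read" and "bytes consumed" in the second point can be seen with the standard library's `BufReader`, used here only as an analogy for a decoder's internal input buffer (it is not ZlibDecoder itself). `over_read_demo` is a hypothetical name, and the `Cursor` stands in for the pack file.

```rust
use std::io::{BufReader, Cursor, Read};

/// Returns (bytes pulled from the inner stream, bytes still buffered,
/// bytes actually consumed by the caller).
fn over_read_demo() -> std::io::Result<(u64, u64, u64)> {
    // 100 bytes standing in for the pack file.
    let data: Vec<u8> = (0..100u8).collect();
    let mut reader = BufReader::with_capacity(64, Cursor::new(data));

    // The caller only needs 10 bytes...
    let mut buf = [0u8; 10];
    reader.read_exact(&mut buf)?;

    // ...but the buffer was filled from the inner stream in one larger read.
    let pulled = reader.get_ref().position();
    let buffered = reader.buffer().len() as u64;
    Ok((pulled, buffered, pulled - buffered))
}
```

The inner stream's position reflects the whole buffered chunk, so the caller has to subtract what is still sitting in the buffer to recover where its data actually ended.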

In an ideal world, I think ZlibDecoder would only take a &mut reference to the underlying Read and, by some magic, leave that stream positioned at the exact end of the zlib data when it's destroyed. This would likely require the underlying stream to also implement Seek, which (as far as I'm aware) either means duplicating parts of the API or imposing unwanted restrictions on everyone else. I don't think that's realistic, so let's move on to option 2:

Make ZlibDecoder take a &mut reference to the underlying Read. It does nothing special when it's destroyed, but it exposes an extra zlibDecoder.consumed_bytes() method (or field, or whatever), computed as the total number of bytes it has read minus whatever is still left in miniz's input buffer.
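A minimal sketch of the proposed calculation, assuming a toy decoder for NUL-terminated records whose `Vec` buffer plays the role of miniz's input buffer. `RecordDecoder` is hypothetical, and `consumed_bytes` is the name proposed in this thread, not an existing flate2 API.

```rust
use std::io::{self, Read};

/// Hypothetical decoder: reads one NUL-terminated record from `inner`
/// in fixed-size chunks, standing in for zlib input buffering.
struct RecordDecoder<R: Read> {
    inner: R,
    buf: Vec<u8>,    // input pulled from `inner` but not yet consumed
    total_read: u64, // total bytes pulled from `inner` so far
}

impl<R: Read> RecordDecoder<R> {
    fn new(inner: R) -> Self {
        RecordDecoder { inner, buf: Vec::new(), total_read: 0 }
    }

    /// Decodes one record, pulling input 8 bytes at a time.
    fn decode(&mut self) -> io::Result<Vec<u8>> {
        loop {
            if let Some(end) = self.buf.iter().position(|&b| b == 0) {
                let record = self.buf[..end].to_vec();
                self.buf.drain(..=end); // drop the record and its terminator
                return Ok(record);
            }
            let mut chunk = [0u8; 8];
            let n = self.inner.read(&mut chunk)?;
            if n == 0 {
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "no terminator"));
            }
            self.total_read += n as u64;
            self.buf.extend_from_slice(&chunk[..n]);
        }
    }

    /// Bytes actually consumed: total read minus what is still buffered.
    fn consumed_bytes(&self) -> u64 {
        self.total_read - self.buf.len() as u64
    }
}
```

After decoding `"hello\0"` from a longer stream, the decoder has read 8 bytes but consumed only 6, so the caller would resume (or seek) at offset 6 rather than wherever the inner reader was left.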

The last option for me is to scrap the higher-level API and directly use your miniz-sys bindings, which is ugly for me, but less ugly for everyone else.

Thoughts?

@alexcrichton
Member

Thanks for the report! This came up in the past with #14 (I think with even the same use case!), which eventually prompted the creation of the Decompress struct for dealing with raw in-memory decompression (i.e. no extra buffering). That being said, I can see where it's much nicer to use the stream API, so I'd be totally down for beefing it up!

First, although all streams and such have R: Read as a type parameter, &mut R also satisfies this which means you don't actually have to pass ownership of the file into the deflate streams. Instead you can pass a mutable reference and then once the deflate stream is destroyed you can continue to use the underlying stream (e.g. seek it to the right position and whatnot).
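The `&mut R` point relies on the standard library's blanket `impl<R: Read + ?Sized> Read for &mut R`. A small sketch, assuming a `Cursor` in place of the pack `File` and a hypothetical `read_prefix` function in place of a decoder that takes `R: Read` by value:

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// Takes its reader by value, like a stream type with an `R: Read` parameter.
fn read_prefix<R: Read>(mut reader: R, n: usize) -> std::io::Result<Vec<u8>> {
    let mut out = vec![0u8; n];
    reader.read_exact(&mut out)?;
    Ok(out)
}

fn borrow_demo() -> std::io::Result<(Vec<u8>, String)> {
    let mut file = Cursor::new(b"headerBODY".to_vec());

    // `&mut Cursor<_>` implements `Read`, so `read_prefix` borrows `file`
    // instead of taking ownership of it.
    let header = read_prefix(&mut file, 6)?;

    // The borrow has ended, so `file` can be repositioned and read again.
    file.seek(SeekFrom::Start(6))?;
    let mut rest = String::new();
    file.read_to_string(&mut rest)?;
    Ok((header, rest))
}
```

The same pattern applies to any stream type with an `R: Read` parameter: pass `&mut file`, let the stream go out of scope, then seek `file` to the next object's offset.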

I agree that dealing with Seek directly would be a little unfortunate, so I think a good way to move forward here would be to expose a method like consumed_bytes you mentioned. Does that sound ok for your use case?

@joshuawarner32
Contributor Author

> I think with even the same use case.

Yep, indeed. I'm glad other people are thinking along the same lines; it means I'm not totally crazy :)

> First, although all streams and such have R: Read as a type parameter, &mut R also satisfies this which means you don't actually have to pass ownership of the file into the deflate streams.

Ah, perfect! Today I learned...

> Does that sound ok for your use case?

Yep, that'll do nicely. I'll take a stab at putting that together.
