Decoder eats whole input stream #14
Comments
Could you give a little more detail on this issue as well? Some code examples or file samples would go a long way for me :)
The following file is what I'm using to parse a git packfile (any old git packfile should do).
Note that despite requesting …
Unfortunately I don't think there's much that can be done about this. Reading only the precisely correct amount of data from the underlying stream would require either feedback from miniz.c (not implemented) or reading only one byte at a time (not exactly efficient). I think your best option here would be to use the new …
Well, that's the problem: with git packfiles you only know the decompressed size ahead of time, not the input size.
In this case you would want to structure your code like:

```rust
let data = {
    let mut s = String::new();
    ZlibDecoder::new(data.by_ref().take(obj_size)).read_to_string(&mut s).unwrap();
    s
};
```

The call to …
Again, I know the output size, not the input size. That example will take the output size (too many bytes) and decompress, after which there's no way to determine the offset from which to continue parsing the packfile. (This is less an issue with flate2 than with the git packfile format.)
I'd like to request that feedback as a feature.
Ah sorry, I think I have indeed misunderstood.
Oh, one thing I just remembered: you may want to try to use the …
The main problem is that there is no feedback on how many bytes the decoder really needed from the underlying stream, so there is no possibility to rewind the reader if needed. Ideally one would have an interface like this; then the buffering could be managed on the consumer side:

```rust
impl Decoder {
    // option 1
    fn decode<'a>(&'a mut self, input: &[u8]) -> io::Result<(usize, &'a [u8])> {
        ...
        Ok((consumed, &self.internal_buffer))
    }

    // option 2
    fn decode(&mut self, input: &[u8], output: &mut &mut [u8]) -> io::Result<usize> {
        ...
        Ok(consumed)
    }
}
```

Especially the second corresponds to the API already exposed by miniz (next_in, next_out, etc.).
I tried to experiment myself; the second variant (the basic C pattern) is not expressible in safe Rust because you end up in an unbreakable borrow cycle.

```rust
fn inflate<'a>(&mut self, input: &[u8], output: &'a mut [u8], flush: Flush) -> io::Result<(usize, &'a mut [u8])>;
```

will do it.
@alexcrichton are you interested in including such an interface into flate2? If so I can clean up a little bit and file a PR for this: https://github.com/nwin/png/blob/master/src/deflate.rs |
Yeah, I'd be totally fine exposing the raw miniz interface which doesn't go through …
My concern with your interfaces is that you have to do all the bookkeeping about how much has been written yourself (by monitoring the change in …). I would either return this information:

```rust
fn decompress(&mut self, input: &[u8], output: &mut [u8], flush: Flush) -> Result<(usize, usize)>;
```

update the slices:

```rust
fn decompress<'a, 'b>(&mut self, input: &'a mut &'a [u8], output: &'b mut &'b mut [u8], flush: Flush) -> Result<()>;
```

or use a hybrid approach:

```rust
fn decompress<'a>(&mut self, input: &[u8], output: &'a mut [u8], flush: Flush) -> Result<(usize, &'a mut [u8])>;
```

Probably the second approach is the best, but I'm afraid that it will cause the most trouble with the borrow checker. The tuple in the first one is a bit ambiguous.
I'd probably be in favor of just exposing the raw underlying interface, as I also found it difficult to write a nice safe API on top, but I will admit I haven't given it inordinate amounts of thought :)
OK, I've finally gotten around to adding raw bindings to the in-memory streams, which should provide a much greater level of control over things like buffering. Feel free to let me know about any API pain points!
As explained to me in http://stackoverflow.com/a/28641354/506962