Decoder eats whole input stream #14

Closed
jnvsor opened this issue Feb 21, 2015 · 15 comments

@jnvsor

jnvsor commented Feb 21, 2015

As explained to me in http://stackoverflow.com/a/28641354/506962

However, it [ByRefReader] does not work when reading. It looks like reader::ZlibDecoder might consume all the way to the end of the underlying Reader. This could potentially be a bug or an oversight in the flate2 library.

@alexcrichton
Member

Could you give a little more detail on this issue as well? Some code examples or file samples would go a long way for me :)

@jnvsor
Author

jnvsor commented Feb 23, 2015

The following file is what I'm using to parse a git packfile (any old git packfile should do):

use std::old_io::ByRefReader;
use std::num::FromPrimitive;
use flate2::reader::ZlibDecoder;

#[derive(FromPrimitive, Copy)]
enum GitObjectType {
    Commit = 1,
    Tree = 2,
    Blob = 3,
    Tag = 4,
    OfsDelta = 6,
    RefDelta = 7,
}

pub fn build_index(raw: Vec<u8>) {
    let mut data = &*raw;

    // Packfile header: the 4-byte magic "PACK" followed by two big-endian u32s
    // (version and object count).
    if data.read_exact(4).unwrap() != b"PACK" {
        panic!("Not a valid packfile");
    }

    let version = data.read_be_u32().unwrap();
    let objects = data.read_be_u32().unwrap();

    println!("Packfile version: {}", version);
    println!("Objects in packfile: {}", objects);

    while let Ok(mut c) = data.read_u8() {
        // Object header: bits 6-4 hold the type, bit 7 flags a continuation
        // byte, and the remaining bits accumulate the *decompressed* size.
        let obj_type: GitObjectType = FromPrimitive::from_u8((c >> 4) & 7).unwrap();
        let mut obj_size = (c & 0xf) as u64;
        let mut shift = 4;
        while c & 0x80 != 0 {
            c = data.read_u8().unwrap();
            obj_size += (c as u64 & 0x7f) << shift;
            shift += 7;
        }
        print!("Object type {} size {}", obj_type as u8, obj_size);

        // obj_size is the decompressed size, so ask the decoder for exactly
        // that many bytes.
        println!(" data:\n{}",
            String::from_utf8(
                ZlibDecoder::new(ByRefReader::by_ref(&mut data)).read_exact(obj_size as usize).unwrap()
            ).unwrap()
        );
    }
}

Note that despite requesting read_exact, because the decoder's default buffer size is 128 * 1024, it will eat that many bytes from the input stream whether you told it to or not.
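
Here is a minimal sketch that makes the over-read visible (written against today's std::io and flate2::read::ZlibDecoder rather than the old_io API above; CountingReader is just a helper for the illustration):

use std::io::{self, Read};
use flate2::read::ZlibDecoder;

/// Helper for this illustration: counts how many bytes the decoder pulls
/// from the underlying stream.
struct CountingReader<R> {
    inner: R,
    consumed: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.consumed += n;
        Ok(n)
    }
}

fn demo(compressed: &[u8], obj_size: usize) -> io::Result<()> {
    let mut counter = CountingReader { inner: compressed, consumed: 0 };
    let mut out = vec![0u8; obj_size];
    ZlibDecoder::new(&mut counter).read_exact(&mut out)?;
    // Even though only obj_size decompressed bytes were requested, the decoder
    // has typically pulled far more compressed bytes into its internal buffer.
    println!("bytes pulled from the stream: {}", counter.consumed);
    Ok(())
}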

@alexcrichton
Member

Unfortunately I don't think there's really much that can be done about this. To read only the precisely correct amount of data from the underlying stream would either require feedback from miniz.c (not implemented) or reading only one byte at a time (not exactly efficient).

I think your best option here would be to use the new take adapter (or the old LimitReader) with an explicit size if you know it ahead of time.

@jnvsor
Author

jnvsor commented Feb 24, 2015

Well that's the problem - with git pack files you only know the decompressed size ahead of time, not the input size

@alexcrichton
Member

In this case you would want to structure your code like this:

let data = {
    let mut s = String::new();
    ZlibDecoder::new(data.by_ref().take(obj_size)).read_to_string(&mut s).unwrap();
    s
};

The call to take will ensure that the ZlibDecoder can't read more than obj_size bytes.

@jnvsor
Author

jnvsor commented Feb 24, 2015

Again, I know the output size, not the input size.

That example will take the output size's worth of bytes (too many) and decompress them, after which there's no way to determine the offset from which to continue parsing the packfile. (This is less an issue with flate2 than with the git packfile format.)

feedback from miniz.c (not implemented) or reading only one byte at a time (not exactly efficient).

I'd like to request that feedback as a feature

@alexcrichton
Member

Ah sorry, I think I have indeed misunderstood.

@alexcrichton alexcrichton reopened this Feb 24, 2015
@alexcrichton
Member

Oh, one thing I just remembered: you may want to try the flate2::write interfaces instead of the flate2::read interfaces. That should enable you to call write, and the stream should not actually consume any bytes beyond the end (telling you precisely where the end is).
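
Something like this minimal sketch, written against the write-side decoder as it is named in current flate2 and assuming write stops consuming input once the zlib stream ends:

use std::io::Write;
use flate2::write::ZlibDecoder;

/// Decompress one object from the front of `compressed`, returning the
/// decompressed bytes plus how many compressed bytes were consumed.
fn inflate_object(compressed: &[u8]) -> std::io::Result<(Vec<u8>, usize)> {
    let mut decoder = ZlibDecoder::new(Vec::new());
    let mut consumed = 0;
    // Keep feeding input; once the decoder reaches the end of the zlib stream
    // it stops accepting bytes, so `consumed` marks where the next object
    // header starts in the packfile.
    while consumed < compressed.len() {
        let n = decoder.write(&compressed[consumed..])?;
        if n == 0 {
            break;
        }
        consumed += n;
    }
    Ok((decoder.finish()?, consumed))
}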

@nwin

nwin commented May 3, 2015

The main problem is that there is no feedback on how many bytes the decoder really needed from the underlying stream, so there is no way to rewind the reader if needed.

Ideally one would have an interface like this; then one could manage the buffering on the consumer side:

impl Decoder {
    // option 1
    fn decode<'a>(&'a mut self, input: &[u8]) -> io::Result<(usize, &'a [u8])> {
        // ...
        Ok((consumed, &self.internal_buffer))
    }
    // option 2
    fn decode(&mut self, input: &[u8], output: &mut &mut [u8]) -> io::Result<usize> {
        // ...
        Ok(consumed)
    }
}

The second in particular corresponds to the API already exposed by miniz (next_in, next_out, etc…).

@nwin

nwin commented May 3, 2015

I tried to experiment myself; the second variant (the basic C pattern) is not expressible in safe Rust because you end up in an unbreakable borrow cycle.

fn inflate<'a>(&mut self, input: &[u8], output: &'a mut [u8], flush: Flush) -> io::Result<(usize, &'a mut [u8])>;

will do it.

@nwin

nwin commented May 3, 2015

@alexcrichton are you interested in including such an interface in flate2? If so, I can clean it up a little and file a PR for this: https://github.com/nwin/png/blob/master/src/deflate.rs

@alexcrichton
Member

Yeah, I'd be totally fine exposing the raw miniz interface which doesn't go through Read or Write at all; it's actually basically already done! I'm not super happy with the API (which is why it's not public yet), but I've taken a similar route in the bzip-rs crate.

@nwin

nwin commented May 4, 2015

My concern with your interfaces is that you have to do all the bookkeeping about how much has been written yourself (by monitoring the change in total_in/total_out after every function call).

I would either return this information

fn decompress(&mut self, input: &[u8], output: &mut [u8], flush: Flush) -> Result<(usize, usize)>;

update the slices

fn decompress<'a, 'b>(&mut self, input: &'a mut &'a [u8], output: &'b mut &'b mut [u8], flush: Flush) -> Result<()>;

or use a hybrid approach

fn decompress<'a>(&mut self, input: &[u8], output: &'a mut [u8], flush: Flush) -> Result<(usize, &'a mut [u8])>;

Probably the second approach is the best, but I'm afraid it will cause the most trouble with the borrow checker. The tuple in the first one is a bit ambiguous.
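
For what it's worth, this is roughly how a packfile parser would drive the first signature; the returned consumed count is exactly the offset bookkeeping the original issue needs. Decoder and Flush::Finish are the hypothetical names from this thread, not an existing flate2 API:

// Sketch only: `Decoder` and `Flush::Finish` are hypothetical names.
fn read_object(decoder: &mut Decoder, pack: &[u8], offset: usize, obj_size: usize)
    -> io::Result<(Vec<u8>, usize)> {
    let mut output = vec![0u8; obj_size];
    let (consumed, produced) = decoder.decompress(&pack[offset..], &mut output, Flush::Finish)?;
    output.truncate(produced);
    // The next object header starts right after the compressed bytes consumed.
    Ok((output, offset + consumed))
}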

@alexcrichton
Member

I'd probably be in favor of just exposing the raw underlying interface, as I also found it difficult to write a nice safe API on top, but I will admit I haven't given it inordinate amounts of thought :)

@alexcrichton
Member

Ok, I've finally gotten around to adding raw bindings to the in-memory streams, which should provide a much greater level of control over things like buffering. Feel free to let me know about any API pain points!
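
A minimal sketch against the in-memory type as it exists in current flate2 (the names when this issue was closed may have differed slightly); total_in() reports how many compressed bytes were consumed, which is exactly the offset the packfile parser needs:

use flate2::{Decompress, FlushDecompress};

/// Inflate one object starting at `offset`, returning the decompressed bytes
/// and the offset of the next object header.
fn inflate_object(pack: &[u8], offset: usize, obj_size: usize)
    -> Result<(Vec<u8>, usize), flate2::DecompressError> {
    let mut inflater = Decompress::new(true); // true = expect a zlib header
    let mut out = Vec::with_capacity(obj_size);
    inflater.decompress_vec(&pack[offset..], &mut out, FlushDecompress::Finish)?;
    // total_in() is the number of compressed bytes consumed, so the next
    // object header starts at offset + total_in().
    Ok((out, offset + inflater.total_in() as usize))
}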
