Fix reader out of bounds #9
Conversation
The input data needs to be at least 512 bytes long.
Use an internal buffer to store decompressed data between Read() calls when dst buffer is not big enough.
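To make the failure mode concrete, here is a hedged, caller-side illustration of the scenario this change addresses; the `drain` helper is hypothetical and a plain `bytes.Reader` stands in for the lz4 stream reader:

```go
// Hypothetical caller-side illustration (not part of this PR): the reader is
// consumed through a small, caller-chosen dst buffer. Before this fix, a dst
// smaller than a decompressed block could slice out of bounds; with the
// internal pending buffer, leftover bytes are returned on the next Read call.
package main

import (
	"bytes"
	"fmt"
	"io"
)

func drain(r io.Reader) ([]byte, error) {
	var out bytes.Buffer
	dst := make([]byte, 64) // deliberately smaller than a typical decompressed block
	for {
		n, err := r.Read(dst)
		out.Write(dst[:n])
		if err == io.EOF {
			return out.Bytes(), nil
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	// In practice r would be the lz4 stream reader; a plain bytes.Reader stands in here.
	data, err := drain(bytes.NewReader([]byte("stand-in for decompressed lz4 output")))
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes through a 64-byte buffer\n", len(data))
}
```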
Nice catch! However, IMO passing a read buffer of the right size is the responsibility of the caller.
For example on the write path we do
Lines 84 to 86 in 4bea358:

```go
if len(src) > streamingBlockSize+4 {
	return 0, fmt.Errorf("block is too large: %d > %d", len(src), streamingBlockSize+4)
}
```
Your solution adds pure overhead for callers that do the right thing.
@HippoBaro Thanks for the review, but I disagree with your point: there are lots of cases where the user has no control over the size of the buffer that is passed to the Read function. Just have a look at the unit test in 3a126b7: bytes.Buffer reads from the LZ4 reader and we have no control over the size of the buffer. Furthermore, I don't think the performance argument stands either, because, when the caller provides a buffer that is big enough, no pending slice gets created or used. Unless I missed something, the current implementation does not add any extra data copies compared to the previous one. FWIW, I first tried to patch the Read function to make it return io.ErrShortBuffer, as I thought this would make the caller grow the buffer and try again, but it turns out this is not correctly handled (by bytes.Buffer, at least).
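For context on why the caller may not control the buffer size: bytes.Buffer.ReadFrom passes a slice of at least bytes.MinRead (512) bytes to Read, and io.Copy similarly allocates its own scratch buffer. A minimal stand-in example (with a strings.Reader in place of the lz4 reader):

```go
// Stand-in example showing that the code driving the copy never chooses the
// dst buffer passed to Read: bytes.Buffer.ReadFrom hands Read a slice of at
// least bytes.MinRead (512) bytes that it allocates itself.
package main

import (
	"bytes"
	"fmt"
	"strings"
)

func main() {
	var buf bytes.Buffer
	// In the unit test mentioned above, the source is the lz4 reader;
	// a strings.Reader stands in here.
	n, err := buf.ReadFrom(strings.NewReader("stand-in for decompressed lz4 data"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("copied %d bytes without ever choosing the Read buffer size\n", n)
}
```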
I'm not very opinionated on the topic, but I would use a buffered reader on the client side in this case.
While using a buffered reader would probably help, I do not think that having the reader know the max read size beforehand is a good pattern.
That's a fair point; my approach comes from using the lz4 library itself. It solves this problem by providing a function returning a buffer length to external users. https://github.com/lz4/lz4/blob/c7ad96e299545330617e95eebc1369edd4e5fdf0/lib/lz4.h#L173-L182 I misread the code btw, you're not adding overhead; apologies.
Thanks for the pointer but this function is only meant to be used when compressing (it is already provided by https://github.com/DataDog/golz4/blob/master/lz4.go#L50 FWIW). This PR is about reads, not writes ;)
All good then; your code works and does the job!
I think the fix gives the lz4 reader more general usage, without making the caller worry about the block size, so go ahead.
The first commit illustrates the issue (see CI: https://circleci.com/gh/DataDog/golz4/36).
When decompressing data, the size of dst was not checked, and taking a slice of it for writing could turn into an out-of-bounds panic. The fix relies on a separate buffer to hold the decompressed bytes that cannot be written right away.
The implementation is fairly straightforward: on Read() either consume from the internal buffer or decompress more data.
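A minimal sketch of that two-branch Read, assuming a pending slice and a decompressBlock helper that are purely illustrative and not taken from the golz4 code:

```go
// Illustrative sketch only; field and method names (pending, decompressBlock)
// are hypothetical and do not come from the golz4 implementation.
package lz4sketch

import "io"

type reader struct {
	src     io.Reader
	pending []byte // decompressed bytes not yet handed to the caller
}

func (r *reader) Read(dst []byte) (int, error) {
	// Branch 1: serve leftover bytes from the previous call first.
	if len(r.pending) > 0 {
		n := copy(dst, r.pending)
		r.pending = r.pending[n:]
		return n, nil
	}
	// Branch 2: decompress the next block, copy what fits into dst,
	// and keep the remainder for the next Read call.
	block, err := r.decompressBlock()
	if err != nil {
		return 0, err
	}
	n := copy(dst, block)
	r.pending = block[n:]
	return n, nil
}

// decompressBlock stands in for reading one compressed block from r.src and
// decompressing it; the real decompression logic is omitted from this sketch.
func (r *reader) decompressBlock() ([]byte, error) {
	buf := make([]byte, 512)
	n, err := r.src.Read(buf)
	if n == 0 {
		return nil, err // typically io.EOF
	}
	return buf[:n], nil
}
```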