
Fix reader out of bounds #9

Merged: 4 commits merged into master on Jan 21, 2020

Conversation

@j-vizcaino commented Jan 20, 2020

The first commit illustrates the issue (see CI: https://circleci.com/gh/DataDog/golz4/36).

When decompressing data, the size of dst was not checked, so slicing it for writing triggered an out-of-bounds panic.

The fix relies on a separate buffer to hold the decompressed bytes that cannot be written right away.
The implementation is fairly straightforward: on Read() either consume from the internal buffer or decompress more data.

The input data needs to be at least 512 bytes long.
Use an internal buffer to store decompressed data between Read() calls
when dst buffer is not big enough.
@HippoBaro (Member) left a comment


Nice catch! However, IMO passing a read buffer of the right size is the responsibility of the caller.

For example on the write path we do

golz4/lz4.go, lines 84 to 86 in 4bea358:

if len(src) > streamingBlockSize+4 {
	return 0, fmt.Errorf("block is too large: %d > %d", len(src), streamingBlockSize+4)
}

Your solution adds pure overhead for callers that do the right thing.

@j-vizcaino (Author) commented

@HippoBaro Thanks for the review but I disagree with your point: there are lots of cases where the user has no control over the size of the buffer that is passed to the Read function. Just have a look at the unit test in 3a126b7: bytes.Buffer reads from the LZ4 reader and we have no control over the size of the buffer.
Here is another real-world scenario that illustrates this: https://github.com/DataDog/dd-go/pull/17942/commits/1a38124a4c2a92f4d46c8b7bac18b14e6c878e33#diff-a48157fe8a7dcfeb8fa1bc89948fea81R17

Furthermore, I don't think the performance argument stands either: when the caller provides a buffer that is big enough, no pending slice gets created or used. Unless I missed something, the current implementation does not perform any extra data copies compared to the previous one.

FWIW, I first tried to patch the Read function to make it return io.ErrShortBuffer, as I thought this would make the caller grow the buffer and try again but it turns out this is not correctly handled (by bytes.Buffer, at least).

@HippoBaro (Member) commented

I'm not very opinionated on the topic, but I would use a buffered reader on the client side in this case.

https://golang.org/pkg/bufio/#NewReaderSize

@j-vizcaino (Author) commented

While using a buffered reader would probably help, I do not think that having the reader know the max read size beforehand is a good pattern.
How is the reader supposed to know the max size a given inflate operation can return anyway?
And even if it did, based on how the bytestream was written, it feels like coupling two components or apps for no obvious reason, making future changes harder than they should be.

@HippoBaro (Member) commented

That's a fair point; my approach comes from using the lz4 library itself. It solves this problem by providing a function returning a buffer length to external users.

https://github.com/lz4/lz4/blob/c7ad96e299545330617e95eebc1369edd4e5fdf0/lib/lz4.h#L173-L182

I misread the code btw, you're not adding overhead; apologies.
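For reference, the bound the C library exposes at the linked lines (LZ4_COMPRESSBOUND in lz4.h) is a simple formula; a Go transcription might look like this (constant values taken from lz4.h):

```go
package main

import "fmt"

// lz4MaxInputSize mirrors LZ4_MAX_INPUT_SIZE from lz4.h.
const lz4MaxInputSize = 0x7E000000

// compressBound mirrors LZ4_COMPRESSBOUND: the worst-case compressed
// size for an input of isize bytes, or 0 when the input is too large.
func compressBound(isize int) int {
	if isize > lz4MaxInputSize {
		return 0
	}
	return isize + isize/255 + 16
}

func main() {
	fmt.Println(compressBound(0))    // 16
	fmt.Println(compressBound(1024)) // 1044
}
```

As the next comment points out, this bound only applies on the compression path; it says nothing about how large a decompressed block can get.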

@j-vizcaino (Author) commented

> That's a fair point; my approach comes from using the lz4 library itself. It solves this problem by providing a function returning a buffer length to external users.
>
> https://github.com/lz4/lz4/blob/c7ad96e299545330617e95eebc1369edd4e5fdf0/lib/lz4.h#L173-L182

Thanks for the pointer but this function is only meant to be used when compressing (it is already provided by https://github.com/DataDog/golz4/blob/master/lz4.go#L50 FWIW).

This PR is about reads, not writes ;)

@HippoBaro (Member) commented

All good then; your code works and does the job. :shipit: !

@zzzzssss (Collaborator) left a comment


I think the fix gives the lz4 reader more general usage, without making the caller worry about the block size, so go ahead.

@j-vizcaino j-vizcaino merged commit bdbee0b into master Jan 21, 2020
@j-vizcaino j-vizcaino deleted the fix-reader-out-of-bounds branch January 21, 2020 17:18