Fix reader out of bounds #9
Conversation
The input data needs to be at least 512 bytes long.
Use an internal buffer to store decompressed data between Read() calls when dst buffer is not big enough.
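To make the failure mode concrete, here is a hedged, caller-side illustration of the scenario this change addresses; the `drain` helper is hypothetical and a plain `bytes.Reader` stands in for the lz4 stream reader:

```go
// Hypothetical caller-side illustration (not part of this PR): the reader is
// consumed through a small, caller-chosen dst buffer. Before this fix, a dst
// smaller than a decompressed block could slice out of bounds; with the
// internal pending buffer, leftover bytes are returned on the next Read call.
package main

import (
	"bytes"
	"fmt"
	"io"
)

func drain(r io.Reader) ([]byte, error) {
	var out bytes.Buffer
	dst := make([]byte, 64) // deliberately smaller than a typical decompressed block
	for {
		n, err := r.Read(dst)
		out.Write(dst[:n])
		if err == io.EOF {
			return out.Bytes(), nil
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	// In practice r would be the lz4 stream reader; a plain bytes.Reader stands in here.
	data, err := drain(bytes.NewReader([]byte("stand-in for decompressed lz4 output")))
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes through a 64-byte buffer\n", len(data))
}
```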
Nice catch! However, IMO passing a read buffer of the right size is the responsibility of the caller.
For example on the write path we do
Lines 84 to 86 in 4bea358:

```go
if len(src) > streamingBlockSize+4 {
	return 0, fmt.Errorf("block is too large: %d > %d", len(src), streamingBlockSize+4)
}
```
Your solution adds pure overhead for callers that do the right thing.
@HippoBaro Thanks for the review, but I disagree with your point: there are lots of cases where the user has no control over the size of the buffer that is passed to the Read function. Just have a look at the unit test in 3a126b7: bytes.Buffer reads from the LZ4 reader and we have no control over the size of the buffer. Furthermore, I don't think the performance argument stands either, because, when the caller provides a buffer that is big enough, no pending slice gets created or used. Unless I missed something, the current implementation does not add any extra data copies compared to the previous one. FWIW, I first tried to patch the Read function to make it return io.ErrShortBuffer, as I thought this would make the caller grow the buffer and try again, but it turns out this is not correctly handled (by bytes.Buffer, at least).
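For context on why the caller may not control the buffer size: bytes.Buffer.ReadFrom passes a slice of at least bytes.MinRead (512) bytes to Read, and io.Copy similarly allocates its own scratch buffer. A minimal stand-in example (with a strings.Reader in place of the lz4 reader):

```go
// Stand-in example showing that the code driving the copy never chooses the
// dst buffer passed to Read: bytes.Buffer.ReadFrom hands Read a slice of at
// least bytes.MinRead (512) bytes that it allocates itself.
package main

import (
	"bytes"
	"fmt"
	"strings"
)

func main() {
	var buf bytes.Buffer
	// In the unit test mentioned above, the source is the lz4 reader;
	// a strings.Reader stands in here.
	n, err := buf.ReadFrom(strings.NewReader("stand-in for decompressed lz4 data"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("copied %d bytes without ever choosing the Read buffer size\n", n)
}
```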
I'm not very opinionated on the topic, but I would use a buffered reader on the client side in this case.
While using a buffered reader would probably help, I do not think that having the reader know the max read size beforehand is a good pattern.
That's a fair point; my approach comes from using the lz4 library itself. It solves this problem by providing a function returning a buffer length to external users. https://github.com/lz4/lz4/blob/c7ad96e299545330617e95eebc1369edd4e5fdf0/lib/lz4.h#L173-L182 I misread the code btw, you're not adding overhead; apologies.
Thanks for the pointer but this function is only meant to be used when compressing (it is already provided by https://github.com/DataDog/golz4/blob/master/lz4.go#L50 FWIW). This PR is about reads, not writes ;)
All good then; your code works and does the job!
I think the fix gives the lz4 reader more general usage, without making the caller worry about the block size, so go ahead.
The first commit illustrates the issue (see CI: https://circleci.com/gh/DataDog/golz4/36).
When decompressing data, the size of dst was not checked, and taking a slice of it for writing could turn into an out-of-bounds panic. The fix relies on a separate buffer to hold the decompressed bytes that cannot be written right away.
The implementation is fairly straightforward: on Read() either consume from the internal buffer or decompress more data.
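A minimal sketch of that two-branch Read, assuming a pending slice and a decompressBlock helper that are purely illustrative and not taken from the golz4 code:

```go
// Illustrative sketch only; field and method names (pending, decompressBlock)
// are hypothetical and do not come from the golz4 implementation.
package lz4sketch

import "io"

type reader struct {
	src     io.Reader
	pending []byte // decompressed bytes not yet handed to the caller
}

func (r *reader) Read(dst []byte) (int, error) {
	// Branch 1: serve leftover bytes from the previous call first.
	if len(r.pending) > 0 {
		n := copy(dst, r.pending)
		r.pending = r.pending[n:]
		return n, nil
	}
	// Branch 2: decompress the next block, copy what fits into dst,
	// and keep the remainder for the next Read call.
	block, err := r.decompressBlock()
	if err != nil {
		return 0, err
	}
	n := copy(dst, block)
	r.pending = block[n:]
	return n, nil
}

// decompressBlock stands in for reading one compressed block from r.src and
// decompressing it; the real decompression logic is omitted from this sketch.
func (r *reader) decompressBlock() ([]byte, error) {
	buf := make([]byte, 512)
	n, err := r.src.Read(buf)
	if n == 0 {
		return nil, err // typically io.EOF
	}
	return buf[:n], nil
}
```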