Account for possible offset skew #1037

Closed · wants to merge 1 commit
20 changes: 12 additions & 8 deletions consumer.go
@@ -524,10 +524,20 @@ func (child *partitionConsumer) parseRecords(batch *RecordBatch) ([]*ConsumerMes
var messages []*ConsumerMessage
var incomplete bool
prelude := true
originalOffset := child.offset

// Compacted records can lie about their offset delta, but that can be
// detected by comparing the last offset delta from the record batch with
// the actual offset delta of the last record. The difference (skew) should
// be applied to all records in the batch.
Contributor: I'm not sure this is true? Isn't the calculated skew only guaranteed to be true for the final record in the batch? Depending on where the records got compacted out, the skew could be arbitrarily distributed in the batch.

Contributor (Author): You are right, this is not correct. The problem is most likely with the v0.11 producer. Please take a look at #1040 for more details. I am closing this PR.

var offsetSkew int64
recordCount := len(batch.Records)
if recordCount > 0 {
lastRecordOffsetDelta := batch.Records[recordCount-1].OffsetDelta
offsetSkew = int64(batch.LastOffsetDelta) - lastRecordOffsetDelta
}

for _, rec := range batch.Records {
-		offset := batch.FirstOffset + rec.OffsetDelta
+		offset := batch.FirstOffset + rec.OffsetDelta + offsetSkew
if prelude && offset < child.offset {
continue
}
@@ -552,12 +562,6 @@ func (child *partitionConsumer) parseRecords(batch *RecordBatch) ([]*ConsumerMes
if incomplete {
return nil, ErrIncompleteResponse
}

child.offset = batch.FirstOffset + int64(batch.LastOffsetDelta) + 1
if child.offset <= originalOffset {
Contributor: Perhaps the condition needs to change, but don't we still need to account for some case where the offset doesn't advance? Or is that handled by the incomplete logic now?

Edit: the incomplete logic looks like it might be wrong now anyway, I'm not sure.

return nil, ErrConsumerOffsetNotAdvanced
}

return messages, nil
}
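The reviewer's objection can be illustrated with a small standalone sketch (hypothetical values and stand-in types, not Sarama's API): when log compaction removes records from the middle of a batch, the gaps are per-record, so a single skew derived from the final record cannot correct every offset.

```go
package main

import "fmt"

// Minimal stand-ins for the batch/record fields used in parseRecords.
type record struct{ OffsetDelta int64 }
type recordBatch struct {
	FirstOffset     int64
	LastOffsetDelta int32
	Records         []*record
}

func main() {
	// A batch that originally held deltas 0..4; compaction removed the
	// records with deltas 1 and 3, so LastOffsetDelta is still 4 but only
	// three records survive.
	batch := recordBatch{
		FirstOffset:     100,
		LastOffsetDelta: 4,
		Records:         []*record{{0}, {2}, {4}},
	}

	// The PR's skew: batch-level last delta minus the last surviving
	// record's delta. Here it is zero, so applying it changes nothing --
	// the surviving records already carry their correct deltas.
	last := batch.Records[len(batch.Records)-1].OffsetDelta
	skew := int64(batch.LastOffsetDelta) - last
	fmt.Println("skew:", skew) // 0

	for _, r := range batch.Records {
		fmt.Println("offset:", batch.FirstOffset+r.OffsetDelta+skew)
	}
	// Prints offsets 100, 102, 104: the holes sit between records, not at
	// the end, which is why a uniform batch-wide shift is not valid in general.
}
```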

2 changes: 1 addition & 1 deletion record_batch.go
@@ -64,7 +64,7 @@ func (b *RecordBatch) encode(pe packetEncoder) error {
pe.putInt8(b.Version)
pe.push(newCRC32Field(crcCastagnoli))
pe.putInt16(b.computeAttributes())
-	pe.putInt32(int32(len(b.Records)))
+	pe.putInt32(b.LastOffsetDelta)

if err := (Timestamp{&b.FirstTimestamp}).encode(pe); err != nil {
return err
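The encode change above matters because the v2 record-batch format carries lastOffsetDelta as its own header field: for a freshly produced batch it equals the record count minus one, but after compaction the broker keeps the original value even though records have been removed, so the two can diverge. A minimal sketch (hypothetical helper, not Sarama code):

```go
package main

import "fmt"

// lastOffsetDelta returns the value the v2 batch header should carry.
// It is fixed at produce time (originalCount-1) and is NOT updated when
// compaction later removes records from the batch.
func lastOffsetDelta(originalCount int) int32 {
	return int32(originalCount - 1)
}

func main() {
	original := 5
	removedByCompaction := 2
	remaining := original - removedByCompaction

	fmt.Println("records on the wire:", remaining)            // 3
	fmt.Println("LastOffsetDelta:", lastOffsetDelta(original)) // 4
	// Encoding int32(len(b.Records)) here would emit 3, silently
	// rewriting the header; encoding b.LastOffsetDelta round-trips the
	// value the broker sent.
}
```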
34 changes: 19 additions & 15 deletions record_test.go
@@ -69,9 +69,10 @@ var recordBatchTestCases = []struct {
{
name: "uncompressed record",
batch: RecordBatch{
-			Version:        2,
-			FirstTimestamp: time.Unix(1479847795, 0),
-			MaxTimestamp:   time.Unix(0, 0),
+			Version:         2,
+			LastOffsetDelta: 1,
+			FirstTimestamp:  time.Unix(1479847795, 0),
+			MaxTimestamp:    time.Unix(0, 0),
Records: []*Record{{
TimestampDelta: 5 * time.Millisecond,
Key: []byte{1, 2, 3, 4},
@@ -115,10 +116,11 @@
{
name: "gzipped record",
batch: RecordBatch{
-			Version:        2,
-			Codec:          CompressionGZIP,
-			FirstTimestamp: time.Unix(1479847795, 0),
-			MaxTimestamp:   time.Unix(0, 0),
+			Version:         2,
+			Codec:           CompressionGZIP,
+			LastOffsetDelta: 1,
+			FirstTimestamp:  time.Unix(1479847795, 0),
+			MaxTimestamp:    time.Unix(0, 0),
Records: []*Record{{
TimestampDelta: 5 * time.Millisecond,
Key: []byte{1, 2, 3, 4},
@@ -168,10 +170,11 @@
{
name: "snappy compressed record",
batch: RecordBatch{
-			Version:        2,
-			Codec:          CompressionSnappy,
-			FirstTimestamp: time.Unix(1479847795, 0),
-			MaxTimestamp:   time.Unix(0, 0),
+			Version:         2,
+			Codec:           CompressionSnappy,
+			LastOffsetDelta: 1,
+			FirstTimestamp:  time.Unix(1479847795, 0),
+			MaxTimestamp:    time.Unix(0, 0),
Records: []*Record{{
TimestampDelta: 5 * time.Millisecond,
Key: []byte{1, 2, 3, 4},
@@ -203,10 +206,11 @@
{
name: "lz4 compressed record",
batch: RecordBatch{
-			Version:        2,
-			Codec:          CompressionLZ4,
-			FirstTimestamp: time.Unix(1479847795, 0),
-			MaxTimestamp:   time.Unix(0, 0),
+			Version:         2,
+			Codec:           CompressionLZ4,
+			LastOffsetDelta: 1,
+			FirstTimestamp:  time.Unix(1479847795, 0),
+			MaxTimestamp:    time.Unix(0, 0),
Records: []*Record{{
TimestampDelta: 5 * time.Millisecond,
Key: []byte{1, 2, 3, 4},
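Each test fixture sets LastOffsetDelta to 1 because its batch holds two records (offset deltas 0 and 1, so count minus one). The consumer then advances past the batch using the rule seen in consumer.go, FirstOffset + LastOffsetDelta + 1. A quick sketch with hypothetical values:

```go
package main

import "fmt"

func main() {
	// A two-record batch starting at offset 42, shaped like the test
	// fixtures: record deltas 0 and 1, so LastOffsetDelta is 1.
	firstOffset := int64(42)
	lastOffsetDelta := int32(1)

	// The next fetch resumes one past the highest offset the batch
	// covered, even if some of those records were compacted away.
	next := firstOffset + int64(lastOffsetDelta) + 1
	fmt.Println("next fetch offset:", next) // 44
}
```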