Lag is incorrectly calculated in 0.11.5 #1911

andreycha · 2018-08-01T10:09:41Z

Description

We're using the Confluent Kafka driver for .NET, and after upgrading to 0.11.5 we started getting incorrect statistics. Since the statistics come from librdkafka, I opened the issue here. The problem is that lag is now calculated incorrectly when some of the offsets are set to negative values. Here is an example (look at the last two entries):

Partition: 0, QueryOffset: -2, NextOffset: 29, AppOffset: -1001, StoredOffset: -1001, CommittedOffset: -1001, EofOffset: 29, LowestOffset: 29, HighestOffset: 29, Lag: 1030
Partition: 1, QueryOffset: -2, NextOffset: 50, AppOffset: -1001, StoredOffset: -1001, CommittedOffset: -1001, EofOffset: 50, LowestOffset: 50, HighestOffset: 50, Lag: 1051
Partition: 2, QueryOffset: -2, NextOffset: 27, AppOffset: -1001, StoredOffset: -1001, CommittedOffset: -1001, EofOffset: 27, LowestOffset: 27, HighestOffset: 27, Lag: 1028
Partition: 3, QueryOffset: -2, NextOffset: 24, AppOffset: -1001, StoredOffset: -1001, CommittedOffset: -1001, EofOffset: 24, LowestOffset: 24, HighestOffset: 24, Lag: 1025
Partition: 4, QueryOffset: -2, NextOffset: 26, AppOffset: -1001, StoredOffset: -1001, CommittedOffset: -1001, EofOffset: 26, LowestOffset: 26, HighestOffset: 26, Lag: 1027
Partition: 5, QueryOffset: -2, NextOffset: 38, AppOffset: -1001, StoredOffset: -1001, CommittedOffset: -1001, EofOffset: 38, LowestOffset: 38, HighestOffset: 38, Lag: 1039
Partition: 6, QueryOffset: -1001, NextOffset: 45, AppOffset: 45, StoredOffset: 45, CommittedOffset: 44, EofOffset: 45, LowestOffset: 44, HighestOffset: 45, Lag: 0
Partition: 7, QueryOffset: -2, NextOffset: 29, AppOffset: -1001, StoredOffset: -1001, CommittedOffset: -1001, EofOffset: 29, LowestOffset: 29, HighestOffset: 29, Lag: 1030

As far as I can see, the issue comes from d41b086. Lag is now calculated as hi_offset - max(app_offset, commit_offset). The math should take into account situations where partitions are not consumed or messages are not committed and both app_offset/commit_offset are negative. Probably in this case max(lo_offset, 0) should be used for lag calculation.
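The reported numbers can be reproduced with plain arithmetic. The following is a minimal sketch of the formula as described above, not librdkafka's actual code; the helper name `lag_0_11_5` is hypothetical, and -1001 is simply the value the statistics above show for unset offsets.

```python
# Sketch of the lag formula described in this report (hypothetical helper,
# not librdkafka source): hi_offset - max(app_offset, commit_offset).
OFFSET_INVALID = -1001  # value shown for unset offsets in the stats above


def lag_0_11_5(hi_offset, app_offset, commit_offset):
    return hi_offset - max(app_offset, commit_offset)


# Partition 0: nothing consumed or committed yet.
print(lag_0_11_5(29, OFFSET_INVALID, OFFSET_INVALID))  # 29 - (-1001) = 1030

# Partition 6: app offset 45, committed offset 44, highest offset 45.
print(lag_0_11_5(45, 45, 44))  # 45 - 45 = 0
```

Subtracting the -1001 sentinel is what inflates every unconsumed partition's lag by 1001.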

It's also not clear why lag is 0 in the case AppOffset: 45, CommittedOffset: 44, HighestOffset: 45, Lag: 0. Shouldn't it be 1? The docs say that app offset is "Offset of last message passed to application + 1", so it means that the application has already processed offset 44 but has not yet processed offset 45.

How to reproduce

Start consuming any topic with autocommit off.

Checklist

librdkafka version (release number or git tag): 0.11.5
Apache Kafka version:
librdkafka client configuration: autocommit is off
Operating system: Win 10 x64
Provide logs (with debug=.. as necessary) from librdkafka
Provide broker log excerpts
Critical issue

mhowlett · 2018-08-01T22:16:10Z

for context, the associated PR is #1878 (with relevant discussion).

yes, it does appear as though calculations are now incorrect in the case of special offsets. thanks for reporting.

edenhill · 2018-08-06T08:30:59Z

So there are two different issues here:

1. consumer_lag does not take invalid/unset app_offset/committed_offset into account.
2. consumer_lag is off by one due to app_offset and committed_offset being +1.

2 is straightforward.
For 1, if both app_offset and committed_offset are invalid, it means either that there are no messages to consume or that no messages have been consumed yet. I think it might be better to let consumer_lag be -1 in this case to indicate that it is in fact unknown.
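The proposal for the invalid-offset case could look roughly like the sketch below. It only illustrates the idea under stated assumptions and is not the code that was committed: the function name `consumer_lag` and the rule "any negative offset counts as unset" are assumptions made for this example.

```python
# Illustrative sketch only (not librdkafka source).
# Assumption: any negative offset (e.g. -1001) means "invalid/unset".


def consumer_lag(hi_offset, app_offset, committed_offset):
    """Return the lag, or -1 when neither offset is available."""
    consumed = max(app_offset, committed_offset)
    if consumed < 0:
        # Neither app_offset nor committed_offset is known: lag is unknown.
        return -1
    return hi_offset - consumed


print(consumer_lag(29, -1001, -1001))  # -1 (unknown) instead of 1030
print(consumer_lag(45, 45, 44))        # 0
print(consumer_lag(45, -1001, 44))     # 1, only the committed offset is known
```

Returning -1 keeps "unknown" distinguishable from a real lag of 0, so monitoring code can skip those partitions rather than alert on them.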

edenhill · 2018-08-23T19:47:20Z

The high watermark offset is the next offset of the partition, i.e., latest message offset + 1.
This value is the same as the committed offset for a caught-up consumer.
If 10 messages are produced, giving offsets 0..9, the high watermark will be 10, and after consumption the app and committed offsets will also be 10, giving a lag of 0.
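The same example written out as arithmetic, as a small sketch; all variable names are illustrative and nothing here calls librdkafka.

```python
# Worked example of the offsets described above (illustrative names only).
offsets = list(range(10))          # 10 messages produced: offsets 0..9
high_watermark = offsets[-1] + 1   # latest message offset + 1

# Part-way through: offsets 0..6 have been passed to the application.
app_offset_midway = 6 + 1          # offset of last message passed + 1
lag_midway = high_watermark - app_offset_midway

# Caught up: all 10 messages consumed and committed.
app_offset = offsets[-1] + 1
committed_offset = app_offset
lag = high_watermark - max(app_offset, committed_offset)

print(high_watermark, lag_midway, lag)  # 10 3 0
```

Because both the high watermark and the app/committed offsets are "last offset + 1", the two +1 terms cancel and a caught-up consumer reports a lag of 0.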

edenhill · 2018-08-25T10:44:08Z

Fixed on master

mhowlett added the bug label Aug 1, 2018

edenhill added a commit that referenced this issue Aug 23, 2018

Fix consumer_lag to -1 when neither app_offset or commmitted_offset is available (#1911)

9528418

edenhill mentioned this issue Aug 23, 2018

Fix consumer_lag to -1 when neither app_offset or commmitted_offset i… #1967

Merged

edenhill added a commit that referenced this issue Aug 25, 2018

Fix consumer_lag to -1 when neither app_offset or commmitted_offset is available (#1911)

bc7c68a

edenhill closed this as completed Aug 25, 2018
