It is convenient to produce a sum of the lag per topic/group. This is briefly mentioned in #72.
In my case, we have clusters with many hundreds of consumer groups. While we scrape the granular data for our Prometheus instances, we also run a sidecar DataDog agent to collect metrics from exporters and push important telemetry into that system. There is extra cost associated with the cardinality of `kafka_consumergroup_group_lag`, so having this rolled up at the source would be convenient.
Proposing:
`kafka_consumergroup_group_total_lag`
Labels: `cluster_name`, `group`, `topic`
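To make the idea concrete, here is a minimal sketch of the roll-up in plain Scala. The types and names are hypothetical illustrations, not the exporter's actual internals: it just sums per-partition lag into one value per group/topic, which is what `kafka_consumergroup_group_total_lag` would expose.

```scala
// Illustrative only: GroupTopicPartition and LagRollup are hypothetical names,
// not the exporter's real internal API.
final case class GroupTopicPartition(group: String, topic: String, partition: Int)

object LagRollup {
  // Sum partition-level lag into per-(group, topic) totals, i.e. the values
  // the proposed kafka_consumergroup_group_total_lag metric would report.
  def totalLag(partitionLag: Map[GroupTopicPartition, Long]): Map[(String, String), Long] =
    partitionLag.groupMapReduce { case (gtp, _) => (gtp.group, gtp.topic) } { case (_, lag) => lag }(_ + _)

  def main(args: Array[String]): Unit = {
    val lags = Map(
      GroupTopicPartition("payments", "orders", 0) -> 12L,
      GroupTopicPartition("payments", "orders", 1) -> 3L,
      GroupTopicPartition("payments", "audit", 0)  -> 7L
    )
    totalLag(lags).foreach { case ((group, topic), total) =>
      // Example exposition line; cluster_name is hardcoded here for illustration.
      println(s"""kafka_consumergroup_group_total_lag{cluster_name="demo-cluster", group="$group", topic="$topic"} $total""")
    }
  }
}
```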
Yes, I agree that the sum of the lag is more useful than the max lag for monitoring standard operation of a streaming platform, since it shows how far behind you are in aggregate. Max lag is good for spotting hot partitions quickly. Using a max is also compatible with the lag-in-seconds estimate, whereas a sum wouldn't make sense there.
Is this something you would be interested in contributing?