This repository has been archived by the owner on Mar 17, 2024. It is now read-only.

Add a sum of lag per consumer group #92

Closed
dylanmei opened this issue Oct 18, 2019 · 2 comments

Comments

@dylanmei
Contributor

It is convenient to produce a sum of the lag per topic/group. This is briefly mentioned in #72.

In my case, we have clusters with many hundreds of consumer groups. While we scrape the granular data for our Prometheus instances, we also run a side-car DataDog agent to collect metrics from exporters and push important telemetry into that system. There is extra cost associated with the cardinality of kafka_consumergroup_group_lag, so having it rolled up at the source is convenient.

Proposing:

kafka_consumergroup_group_total_lag

Labels: cluster_name, group, topic
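
To make the proposed roll-up concrete, here is a minimal Python sketch of summing per-partition lag into one value per (cluster_name, group, topic). It is only an illustration, not the exporter's implementation: the `total_lag` function and the shape of the sample dictionaries are assumptions made for this example.

```python
from collections import defaultdict


def total_lag(samples):
    """Sum per-partition lag into one value per (cluster_name, group, topic).

    `samples` is a hypothetical list of dicts, one per partition, standing in
    for the granular kafka_consumergroup_group_lag series.
    """
    totals = defaultdict(int)
    for sample in samples:
        key = (sample["cluster_name"], sample["group"], sample["topic"])
        totals[key] += sample["lag"]
    return dict(totals)


# Hypothetical per-partition samples: three series collapse into two.
samples = [
    {"cluster_name": "c1", "group": "g1", "topic": "orders", "partition": 0, "lag": 10},
    {"cluster_name": "c1", "group": "g1", "topic": "orders", "partition": 1, "lag": 5},
    {"cluster_name": "c1", "group": "g2", "topic": "orders", "partition": 0, "lag": 7},
]

rolled_up = total_lag(samples)
for (cluster_name, group, topic), lag in sorted(rolled_up.items()):
    print(f"{cluster_name} {group} {topic} {lag}")
```

Dropping the partition label is what reduces cardinality: the number of rolled-up series grows with the number of group/topic pairs rather than with the number of partitions.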

@seglo
Owner

seglo commented Oct 18, 2019

Yes, I agree that the sum of lag is more useful than the max lag for monitoring the standard operations of a streaming platform, because it shows how far behind you are in aggregate. Max lag is good for spotting hot partitions quickly. A max is also compatible with the lag-in-seconds estimate, whereas a sum of those estimates wouldn't make sense.
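
The difference between the two aggregations can be shown with a small Python sketch. The offsets and the `summarize` helper are made up for illustration: two groups have the same total lag, but only the max reveals that one of them has a hot partition.

```python
def summarize(partition_lags):
    """Return (sum, max) of a list of per-partition lag values."""
    return sum(partition_lags), max(partition_lags)


# Hypothetical lag in offsets for two consumer groups on a 3-partition topic.
evenly_behind = [300, 305, 307]   # every partition is somewhat behind
hot_partition = [5, 7, 900]       # one partition is far behind

even_sum, even_max = summarize(evenly_behind)
hot_sum, hot_max = summarize(hot_partition)

# Same aggregate backlog, very different worst-case partition.
print(even_sum, even_max)
print(hot_sum, hot_max)

# Hypothetical lag-in-seconds estimates for the same partitions. The max is
# the delay of the slowest partition; adding the values together would not
# describe the delay of any consumer, so no sum is computed here.
lag_seconds = [2.0, 3.0, 40.0]
worst_delay_seconds = max(lag_seconds)
print(worst_delay_seconds)
```

The sum answers "how much work is outstanding in total", while the max answers "how bad is the worst partition", which is why the two metrics complement each other.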

Is this something you would be interested in contributing?

@dylanmei
Contributor Author

Closed via #93


2 participants