It is convenient to produce a sum of the lag per topic/group. This is briefly mentioned in #72.
In my case, we have clusters with many hundreds of consumer groups. While we scrape the granular data for our Prometheus instances, we also run a sidecar DataDog agent to collect metrics from exporters and push important telemetry into that system. There is extra cost associated with the cardinality of `kafka_consumergroup_group_lag`, so having this rolled up at the source would be convenient.
Proposing:
`kafka_consumergroup_group_total_lag`
Labels: `cluster_name`, `group`, `topic`
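To make the idea concrete, here is a minimal sketch of the roll-up in plain Scala. The types and names are hypothetical illustrations, not the exporter's actual internals: it just sums per-partition lag into one value per group/topic, which is what `kafka_consumergroup_group_total_lag` would expose.

```scala
// Illustrative only: GroupTopicPartition and LagRollup are hypothetical names,
// not the exporter's real internal API.
final case class GroupTopicPartition(group: String, topic: String, partition: Int)

object LagRollup {
  // Sum partition-level lag into per-(group, topic) totals, i.e. the values
  // the proposed kafka_consumergroup_group_total_lag metric would report.
  def totalLag(partitionLag: Map[GroupTopicPartition, Long]): Map[(String, String), Long] =
    partitionLag.groupMapReduce { case (gtp, _) => (gtp.group, gtp.topic) } { case (_, lag) => lag }(_ + _)

  def main(args: Array[String]): Unit = {
    val lags = Map(
      GroupTopicPartition("payments", "orders", 0) -> 12L,
      GroupTopicPartition("payments", "orders", 1) -> 3L,
      GroupTopicPartition("payments", "audit", 0)  -> 7L
    )
    totalLag(lags).foreach { case ((group, topic), total) =>
      // Example exposition line; cluster_name is hardcoded here for illustration.
      println(s"""kafka_consumergroup_group_total_lag{cluster_name="demo-cluster", group="$group", topic="$topic"} $total""")
    }
  }
}
```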
Yes, I agree that the sum of the lag is more useful than the max lag for monitoring standard operation of a streaming platform, since it shows how far behind you are in aggregate. Max lag is good for spotting hot partitions quickly. Using a max is also compatible with the lag-in-seconds estimate, whereas a sum wouldn't make sense there.
Is this something you would be interested in contributing?