This repository has been archived by the owner on Mar 17, 2024. It is now read-only.

Add metric to represent a consumer group's total offset lag per topic #93

Merged 2 commits on Nov 4, 2019


dylanmei
Contributor

Regarding #92, this adds a metric to represent a consumer group's total offset lag per topic.


New metric:

kafka_consumergroup_group_topic_lag

Labels: cluster_name, group, topic

The sum of the difference between the last produced offset and the last consumed offset of all partitions in this topic for this group.
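As a rough illustration of the definition above (not the exporter's actual implementation), the per-topic sum could be sketched in Python as follows. The function name `topic_sum_lag`, the record fields `last_produced` and `last_consumed`, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict


def topic_sum_lag(partitions):
    """Sum per-partition lag for each (cluster_name, group, topic).

    `partitions` is a hypothetical list of dicts; the field names are
    assumptions for this sketch, not the exporter's data model.
    """
    totals = defaultdict(int)
    for p in partitions:
        # Lag of one partition: last produced offset minus last consumed offset.
        lag = p["last_produced"] - p["last_consumed"]
        totals[(p["cluster_name"], p["group"], p["topic"])] += lag
    return dict(totals)


# Made-up sample data: one group reading two topics.
partitions = [
    {"cluster_name": "c1", "group": "g1", "topic": "topic-1",
     "partition": 0, "last_produced": 100, "last_consumed": 90},
    {"cluster_name": "c1", "group": "g1", "topic": "topic-1",
     "partition": 1, "last_produced": 50, "last_consumed": 45},
    {"cluster_name": "c1", "group": "g1", "topic": "topic-2",
     "partition": 0, "last_produced": 20, "last_consumed": 20},
]

result = topic_sum_lag(partitions)
print(result)
```

Each key in the result corresponds to one label set (cluster_name, group, topic), so the number of series grows with the number of topics a group reads rather than with the number of partitions or members.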


I'm on the fence about the metric name itself. Some options that came to mind:

  • kafka_consumergroup_group_topic_lag (current proposal)
  • kafka_consumergroup_group_total_lag (original proposal)
  • kafka_consumergroup_topic_lag

@lightbend-cla-validator

Hi @dylanmei,

Thank you for your contribution! We really value the time you've taken to put this together.

Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement:

http://www.lightbend.com/contribute/cla

Owner

@seglo seglo left a comment

Thanks for the PR. I see the code groups by consumer group and topic to take the sum. When a consumer group is subscribed to more than one topic, that means you could have multiple total lag metrics per group. I can see how it would make sense to report total lag for a particular group topic, but it would also be logical to sum lag for all partitions across all topics. WDYT?

@dylanmei
Contributor Author

> I can see how it would make sense to report total lag for a particular group topic, but it would also be logical to sum lag for all partitions across all topics.

My initial motivation was to provide a low-cardinality roll-up of the partition lag, especially as values such as client_id and member_host can in unfortunate cases change many times a second.

At some point aggregations are better handled at query time, and I think "logical to sum lag for all partitions across all topics" may be one such case.

However, it's non-trivial to add in today if you feel strongly about it.
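To make the query-time argument concrete, here is a small Python sketch of rolling up per-partition samples by a chosen set of labels, similar in spirit to a PromQL `sum by (...)` aggregation. The helper `sum_by` and the sample data are hypothetical; only the label names come from this thread.

```python
from collections import defaultdict


def sum_by(samples, keep):
    """Sum sample values grouped by the labels named in `keep`.

    Labels not listed in `keep` (for example client_id and member_host)
    are dropped, which is what lowers the cardinality of the result.
    """
    totals = defaultdict(int)
    for labels, value in samples:
        key = tuple(labels[name] for name in keep)
        totals[key] += value
    return dict(totals)


# Made-up per-partition lag samples carrying high-churn labels.
samples = [
    ({"cluster_name": "c1", "group": "g1", "topic": "topic-1",
      "partition": "0", "client_id": "a", "member_host": "h1"}, 10),
    ({"cluster_name": "c1", "group": "g1", "topic": "topic-1",
      "partition": "1", "client_id": "b", "member_host": "h2"}, 5),
    ({"cluster_name": "c1", "group": "g1", "topic": "topic-2",
      "partition": "0", "client_id": "a", "member_host": "h1"}, 7),
]

by_group_topic = sum_by(samples, ("cluster_name", "group", "topic"))
by_group = sum_by(samples, ("cluster_name", "group"))
print(by_group_topic)
print(by_group)
```

The same samples yield either roll-up depending on which labels are kept, which is why a sum across all topics can be derived later from a per-topic series.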

@seglo
Owner

seglo commented Oct 21, 2019

> My initial motivation was to provide a low-cardinality roll-up of the partition lag, especially as values such as client_id and member_host can in unfortunate cases change many times a second.
>
> At some point aggregations are better handled at query time, and I think "logical to sum lag for all partitions across all topics" may be one such case.

I think it would still satisfy your low-cardinality use case. It would also be a simpler implementation, because it would follow the same logic as the max lag impl. at the "group" level instead of adding the new kind of cardinality that's in the PR: "group" and "topic".

Ex) My consumer group has a subscription to topic-1 and topic-2. The sum lag would be the total offset delta of all partitions belonging to both topics.

@dylanmei dylanmei force-pushed the consumer_group_lag_by_topic branch 2 times, most recently from 1c5436e to f233634 Compare October 22, 2019 14:28
@dylanmei
Contributor Author

Updating after initial feedback.


New metrics:

kafka_consumergroup_group_sum_lag

Labels: cluster_name, group

The sum of the difference between the last produced offset and the last consumed offset of all partitions for this group.

kafka_consumergroup_group_topic_lag

Labels: cluster_name, group, topic

The sum of the difference between the last produced offset and the last consumed offset of all partitions in this topic for this group.


I'm still insecure about the name of this second metric: kafka_consumergroup_group_topic_lag

To follow the "including the aggregation type in the metric name" guidance @seglo has given, it could also be kafka_consumergroup_group_topic_sum_lag or kafka_consumergroup_topic_sum_lag.

@seglo
Owner

seglo commented Nov 2, 2019

> I'm still insecure about the name of this second metric: kafka_consumergroup_group_topic_lag
>
> To follow the "including the aggregation type in the metric name" guidance @seglo has given, it could also be kafka_consumergroup_group_topic_sum_lag or kafka_consumergroup_topic_sum_lag.

Looking good @dylanmei ! Let's call it kafka_consumergroup_group_topic_sum_lag.
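For readers who want to picture the result, the sketch below renders the two metric names agreed in this thread as Prometheus text-format sample lines. The helper `render_gauge` and all values are hypothetical, label-value escaping is omitted, and the exporter's real output may differ in value formatting.

```python
def render_gauge(name, labels, value):
    """Render one sample line as `name{label="value",...} value`.

    Simplified sketch: it assumes label values need no escaping.
    """
    label_text = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_text}}} {value}"


# Made-up values for a group "g1" reading topic-1 and topic-2.
lines = [
    render_gauge("kafka_consumergroup_group_sum_lag",
                 {"cluster_name": "c1", "group": "g1"}, 22),
    render_gauge("kafka_consumergroup_group_topic_sum_lag",
                 {"cluster_name": "c1", "group": "g1", "topic": "topic-1"}, 15),
    render_gauge("kafka_consumergroup_group_topic_sum_lag",
                 {"cluster_name": "c1", "group": "g1", "topic": "topic-2"}, 7),
]
print("\n".join(lines))
```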

@dylanmei dylanmei force-pushed the consumer_group_lag_by_topic branch from f233634 to 279b273 Compare November 3, 2019 14:42
Owner

@seglo seglo left a comment

LGTM. Thanks for the contribution!

@seglo seglo merged commit de1bb6c into seglo:master Nov 4, 2019
@seglo seglo added this to the 0.5.5 milestone Nov 11, 2019
anbarasantr pushed a commit to anbarasantr/kafka-lag-exporter that referenced this pull request Nov 24, 2019
…seglo#93)

* Add metric of a consumer group's total offset lag and total lag per topic
* Update name of group/topic aggregate metric