Metrics not updated when a consumer group is not active #63
Comments
@efrikin When you say you stop reading, do you mean that as soon as the consumer group shuts down the metrics are no longer reported? As I described in #36, we only report data for groups that are returned by Kafka. I can see how this could be a problem in some cases: even though a consumer group is no longer active, you may still want to know how far behind it is. In your use case, are the consumer groups shut down intentionally, or have they encountered an error? How long should a group be inactive before we stop reporting its lag?
@seglo Thanks a lot for the answer.
Yes, of course. When the last consumer leaves the consumer group, the metrics are no longer reported and I see an empty chart (the Prometheus console shows no data).
My case is related to a production incident: a consumer group suddenly got stuck, and we could not spot its ever-growing lag for several hours.
I think this behavior should be a user-configurable setting, with a default of perhaps 5 minutes.
@efrikin Thanks for the reply.
I see. I thought there was a longer grace period for the consumer group to stay active after the last member has left.
Yes. We could add a feature that retains group metadata for a configured interval of time after the group has left the cache of the Consumer Group coordinator. If the group no longer exists in the next poll, we can continue calculating the lag based on the last consumed offsets. If the group becomes active again, we continue as normal; if it doesn't, then after the configured interval we remove it from the metrics endpoint. Perhaps a default of 30 minutes would be a good value to start with. The one caveat is that if Kafka Lag Exporter is started after a group is no longer active, it won't see the group until it's active again.
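A minimal sketch of the retention idea described above, written in Python purely for illustration. It is not Kafka Lag Exporter's actual code: the class `GroupRetentionCache`, its `poll` method, and the `retention_seconds` parameter are hypothetical names, and the inputs are plain dictionaries standing in for whatever a real poll of the cluster would return.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


@dataclass
class GroupState:
    last_offsets: Dict[TopicPartition, int]  # last consumed offsets seen for the group
    last_seen: float                         # time of the last poll that returned the group


class GroupRetentionCache:
    """Hypothetical cache that keeps reporting lag for recently inactive groups."""

    def __init__(self, retention_seconds: float = 30 * 60):
        # Assumed default of 30 minutes, as suggested in the comment above.
        self.retention_seconds = retention_seconds
        self._groups: Dict[str, GroupState] = {}

    def poll(
        self,
        now: float,
        active_groups: Dict[str, Dict[TopicPartition, int]],
        latest_offsets: Dict[TopicPartition, int],
    ) -> Dict[str, Dict[TopicPartition, int]]:
        # Refresh state for every group returned by this poll.
        for group, offsets in active_groups.items():
            self._groups[group] = GroupState(dict(offsets), now)

        # Drop groups that have been missing for longer than the retention interval.
        expired = [
            group
            for group, state in self._groups.items()
            if now - state.last_seen > self.retention_seconds
        ]
        for group in expired:
            del self._groups[group]

        # Compute lag for active and retained groups from the last consumed offsets.
        return {
            group: {
                tp: max(latest_offsets.get(tp, offset) - offset, 0)
                for tp, offset in state.last_offsets.items()
            }
            for group, state in self._groups.items()
        }
```

With this shape, a group that disappears from a poll keeps showing growing lag until the interval elapses. A group that was never seen by the cache is never reported, which matches the caveat about starting the exporter after a group has already gone inactive.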
@seglo Thanks for the reply.
This is good news.
A default of 30 minutes is a great starting point.
Would it be possible to include this in the next release? I'd really appreciate that! Also, could you please let me know the date of the next release? Thanks a lot!
@efrikin Thanks for clarifying. I will issue a release soon. There are several PRs in progress. I'll create a new issue for this one and work on it soon, unless someone else volunteers to do it first.
@seglo Thanks a lot. I'd really appreciate that!
Just to chime in here. I do agree that lag calculations may not be applicable beyond a certain time if there are no active consumer groups. However, the kafka_partition_*_offset metrics should be reported regardless of whether any consumer groups are active. These metrics relate to producers rather than to a consumer group, and we use them to ensure that new messages are arriving in the topic.
I understand the value you get from monitoring the latest offset of arbitrary partitions. It would require a poll of all topic partitions in a cluster. That could be many more partitions than would be desired, but if it were enabled through a feature flag I think it would be a fine addition.
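As a rough illustration of the feature-flag idea, the sketch below chooses which partitions to poll for latest offsets. The function `partitions_to_poll` and the flag `poll_all_partitions` are hypothetical names, not existing Kafka Lag Exporter options.

```python
from typing import Iterable, List, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


def partitions_to_poll(
    group_partitions: Iterable[TopicPartition],
    all_partitions: Iterable[TopicPartition],
    poll_all_partitions: bool = False,  # hypothetical feature flag, off by default
) -> List[TopicPartition]:
    # With the flag off, poll only partitions that some consumer group reads.
    # With the flag on, poll every partition in the cluster, which may be many more.
    chosen = all_partitions if poll_all_partitions else group_partitions
    return sorted(set(chosen))
```

Keeping the flag off by default would preserve the current cost of a poll for clusters with many partitions that no monitored group consumes.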
Hi @seglo.
I have the same problem as #36.
Example:
I have a consumer group that reads a topic.
When I stop reading the topic, the chart is empty (no data).
When I run the consumer group again, I see lag data on the chart.
This is critical because during this time consumer lag is not monitored.
Any ideas?
Exporter version: 0.5.1