When using the flag to start from the beginning of the __consumer_offsets topic, metrics reported are based on old values until the exporter has reached the most recent commits. This has caused confusion for developers in our org, as it looks like their consumers are suddenly lagging.
I'd like to propose that when reading from the beginning of __consumer_offsets, consumer_group metrics do not get reported until ONE OF the following conditions is true for ALL __consumer_offsets partitions:
- The lag for the exporter has reached 0
- The exporter has read a commit with a timestamp later than the exporter's start time
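To make the proposed gate concrete, here is a minimal sketch of the check in Python. It is not taken from the exporter's code: `PartitionProgress`, `partition_caught_up`, and `warmup_complete` are hypothetical names, and the sketch assumes the exporter can track, per `__consumer_offsets` partition, its own remaining lag and the timestamp of the newest commit it has read.

```python
from dataclasses import dataclass


@dataclass
class PartitionProgress:
    """Hypothetical per-partition state for the __consumer_offsets reader."""
    lag: int                # records the exporter still has to read
    last_commit_ts_ms: int  # timestamp of the newest commit read, -1 if none yet


def partition_caught_up(progress, start_ts_ms):
    # Either condition from the proposal is enough for one partition.
    return progress.lag == 0 or progress.last_commit_ts_ms > start_ts_ms


def warmup_complete(partitions, start_ts_ms):
    # Report consumer_group metrics only once ALL partitions are caught up.
    # An empty mapping is treated as "not ready" (an assumption of this sketch).
    return bool(partitions) and all(
        partition_caught_up(p, start_ts_ms) for p in partitions.values()
    )
```

With this shape, the metrics path would only need to call `warmup_complete(...)` before publishing consumer_group values.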
I'd also like to propose a health endpoint that provides the status of this warmup phase. This can help us make sure we only tear down an old container once a new container is providing the correct metrics.
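As a rough illustration of what such an endpoint could look like, the sketch below uses only Python's standard library. The `/health` path, the JSON body, and the choice of 503 during warmup are assumptions made for the example, not an existing exporter API; `is_ready` is a hypothetical callable that returns whether the warmup phase has finished.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def health_response(ready):
    # 503 while warming up lets an orchestrator keep the old container alive.
    status = 200 if ready else 503
    body = json.dumps({"status": "ready" if ready else "warming_up"})
    return status, body


def make_server(is_ready, host="127.0.0.1", port=0):
    """Build (but do not start) a server exposing the hypothetical /health path."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = health_response(is_ready())
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep the example quiet

    return ThreadingHTTPServer((host, port), HealthHandler)
```

Calling `make_server(lambda: ready_flag).serve_forever()` would start it. A readiness probe pointed at the endpoint could then delay tearing down the old container until the new one returns 200.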
These changes would favor less information over inaccurate information, which I think is beneficial in almost all cases.
If you agree this is a good direction, I'd be happy to take a stab at implementing it.
I'd suggest checking if the exporter has consumed up to the high water mark observed during startup - timestamps can be finicky (particularly if a consumer group isn't actively committing), and while the exporter needs to be able to "keep up", it never strictly needs to reach a lag of 0.
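A minimal sketch of that alternative, assuming the exporter can read each partition's high water mark once at startup and knows its own read position afterwards. `snapshot_targets` and `reached_startup_marks` are hypothetical names, and both mappings go from partition number to offset.

```python
def snapshot_targets(high_water_marks):
    # Copy the high water marks seen at startup so later growth of the
    # topic does not move the goalposts.
    return dict(high_water_marks)


def reached_startup_marks(targets, positions):
    # positions[p] is the next offset the exporter will read from partition p.
    # A partition that was empty at startup (mark 0) counts as reached at once.
    return all(positions.get(p, 0) >= mark for p, mark in targets.items())
```

Because the targets are fixed at startup, the check does not depend on commit timestamps and does not require the exporter to ever reach a lag of exactly 0.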