Incorrect metrics during startup when using -s flag #19

Open
jutley opened this issue Sep 21, 2018 · 1 comment

Comments

@jutley
Contributor

jutley commented Sep 21, 2018

When using the -s flag to start from the beginning of the __consumer_offsets topic, the reported metrics are based on old values until the exporter has reached the most recent commits. This has caused confusion for developers in our org, as it looks like their consumers are suddenly lagging.

I'd like to propose that when reading from the beginning of __consumer_offsets, consumer_group metrics do not get reported until ONE OF the following conditions is true for ALL __consumer_offsets partitions:

  • The lag for the exporter has reached 0
  • The exporter has read a commit with a timestamp later than the exporter's start time
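
To make the proposed rule concrete, here is a rough sketch of the check in Python. The `PartitionProgress` type, its fields, and both function names are hypothetical and are not taken from the exporter's code; the sketch only assumes the exporter can track, per __consumer_offsets partition, its own remaining lag and the newest commit timestamp it has read.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PartitionProgress:
    """Hypothetical per-partition state for the exporter's own consumer."""
    exporter_lag: int         # messages the exporter still has to read
    latest_commit_ts_ms: int  # timestamp of the newest commit read so far


def partition_caught_up(progress: PartitionProgress, start_ts_ms: int) -> bool:
    # Either condition from the list above is enough for a single partition.
    return progress.exporter_lag == 0 or progress.latest_commit_ts_ms > start_ts_ms


def warmup_complete(partitions: Dict[int, PartitionProgress], start_ts_ms: int) -> bool:
    # Every partition must satisfy at least one condition. An empty mapping is
    # treated as "not ready", on the assumption that no assignment is known yet.
    return bool(partitions) and all(
        partition_caught_up(p, start_ts_ms) for p in partitions.values()
    )
```

consumer_group metrics would be withheld while `warmup_complete` returns False. Treating an empty mapping as "not ready" is a design choice in this sketch, not part of the proposal.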

I'd also like to propose a health endpoint that provides the status of this warmup phase. This can help us make sure we only tear down an old container once a new container is providing the correct metrics.
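
One possible shape for such an endpoint, using only the Python standard library. The `/health` path, the JSON body, the 503/200 status codes, and the `warmup_done` event are all assumptions made for illustration; this is a standalone sketch, not how the exporter currently serves HTTP.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag: the consumer loop would set this once warmup has finished.
warmup_done = threading.Event()


def health_response(ready):
    """Return (status code, body bytes) for the current warmup state."""
    status = 200 if ready else 503
    body = json.dumps({"status": "ready" if ready else "warming_up"})
    return status, body.encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(warmup_done.is_set())
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def start_health_server(port=0):
    """Serve /health on localhost in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Returning 503 during warmup lets an orchestrator's readiness probe keep the old container alive until the new one reports 200.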

These changes would favor less information over inaccurate information, which I think is beneficial in almost all cases.

If you agree this is a good direction, I'd be happy to take a stab at implementing it.

@braedon
Owner

braedon commented Mar 30, 2019

Hi @jutley, happy to review a PR around this.

I'd suggest checking if the exporter has consumed up to the high water mark observed during startup - timestamps can be finicky (particularly if a consumer group isn't actively committing), and while the exporter needs to be able to "keep up", it never strictly needs to reach a lag of 0.
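
A minimal sketch of that check, written against plain dicts so it does not depend on any particular client. The function and parameter names are hypothetical. It assumes the end offsets are captured once at startup and that the exporter can report the offset of the next message it will read for each partition (with kafka-python, `KafkaConsumer.end_offsets()` and `KafkaConsumer.position()` could plausibly supply these, though that wiring is not shown here).

```python
def reached_startup_high_water_marks(startup_end_offsets, positions):
    """True once every partition has been read up to its startup snapshot.

    startup_end_offsets: {partition: end offset captured once at startup}
    positions: {partition: offset of the next message the exporter will read}
    """
    return all(
        # A partition with no recorded position is treated as unread (offset 0).
        positions.get(partition, 0) >= end_offset
        for partition, end_offset in startup_end_offsets.items()
    )
```

Because the snapshot is fixed, commits that arrive after startup do not move the target, so the exporter never has to reach a lag of 0. A partition that was empty at startup (end offset 0) passes immediately, which sidesteps the idle-consumer-group problem with timestamps.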
