Reset Prometheus request tags on conf reload #2728
Closed
What this PR does:
On every backend configuration reload, the Prometheus collectors for request metrics (bearing the `upstream_ip` tag) are recreated, to avoid keeping metrics for dead upstreams exposed on the `/metrics` endpoint.
We also changed the structure of the collector.go file into a singleton (to make it possible for the ingress controller to ask for a Prometheus collector reset on these metrics on config reloads).
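For illustration, here is a minimal sketch of what such a singleton with a reset could look like using client_golang. All names (`requestCollector`, `ResetRequestMetrics`, `nginx_upstream_requests_total`) are hypothetical, not the actual identifiers from collector.go:

```go
package collector

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

// Package-level singleton, created on first use.
var (
	instance *requestCollector
	once     sync.Once
)

type requestCollector struct {
	mu       sync.Mutex
	requests *prometheus.CounterVec
}

// Instance returns the process-wide collector, registering its metrics
// on first use.
func Instance() *requestCollector {
	once.Do(func() {
		instance = &requestCollector{requests: newRequestsVec()}
		prometheus.MustRegister(instance.requests)
	})
	return instance
}

func newRequestsVec() *prometheus.CounterVec {
	return prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "nginx_upstream_requests_total",
		Help: "Requests proxied to each upstream.",
	}, []string{"upstream_ip", "status"})
}

// ResetRequestMetrics drops every timeseries carrying the upstream_ip
// label by recreating the collector, so dead upstreams stop being
// exposed on /metrics after a configuration reload.
func (c *requestCollector) ResetRequestMetrics() {
	c.mu.Lock()
	defer c.mu.Unlock()
	prometheus.Unregister(c.requests)
	c.requests = newRequestsVec()
	prometheus.MustRegister(c.requests)
}
```

The controller would then call `collector.Instance().ResetRequestMetrics()` from its reload path.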
Why we need it:
When an upstream pod dies (because of a scale-down operation, a rolling upgrade, ...), the metrics associated with that pod via the `upstream_ip` tag remain exposed on the `/metrics` endpoint. This causes three undesirable effects:
- queries easily tend to time out, or frontends tend to freeze, because too many timeseries are returned
- the number of exposed timeseries keeps growing over time on production clusters, which also makes them more expensive to ingest and store
- a memory leak in the controller, since the collectors for dead upstreams are never released
Which issue this PR fixes:
I didn't open an issue for that and filed a PR directly, but that can be done if required.
Special notes for your reviewer:
If the "singleton" design pattern bothers you (because you were intending to instantiate multiple collectors in the future for some reason), or anything else. I'll be happy to promptly refactor the PR.
What could we break:
On infrastructures where the config is reloaded at a high pace (higher than the Prometheus `scrape_interval`), we introduce discontinuities in our timeseries. This is something that the Prometheus `rate()` function (and similar ones) handles very well. However, if the backend reconfiguration frequency is higher than the Prometheus `scrape_interval`, it's possible that we get incoherent results. If so, we may just rate-limit the metric flushing operation to one every "user-configurable amount of time" (e.g. 5 minutes). As long as this time is greater than a couple of Prometheus `scrape_interval`s, we should be totally fine :)
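If we do end up needing that rate limit, a minimal sketch could look like the following. This helper is not part of the PR and the names are illustrative:

```go
package collector

import (
	"sync"
	"time"
)

// throttledReset limits how often the Prometheus collectors may be flushed.
type throttledReset struct {
	mu          sync.Mutex
	lastReset   time.Time
	minInterval time.Duration // e.g. 5 * time.Minute, user-configurable
}

// MaybeReset runs reset() only if minInterval has elapsed since the last
// flush, so very frequent config reloads cannot punch holes in the
// timeseries faster than Prometheus scrapes them.
func (t *throttledReset) MaybeReset(reset func()) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if time.Since(t.lastReset) < t.minInterval {
		return
	}
	t.lastReset = time.Now()
	reset()
}
```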
PR Status:
The PR compiles and is being tested on our staging infrastructure. So far so good :)
I'll try it on our production clusters today, compare the size of the `/metrics` endpoint payloads before and after the upgrade, and also check how the timeseries discontinuities behave during intense scale-in operations.