-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add {Gauge,Summary,Counter}Definitions to PrometheusOpts to define non-expiring metrics #120
Conversation
…nter,Gauge,Summary}Definition types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will go ahead and approve; one question about initializing constant gauges and summaries.
} | ||
|
||
type PrometheusSummary struct { | ||
type summary struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these classes were exported before (even though they didn't need to be), but i don't find any indication that they're used anywhere, so 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we looked over it and it seemed pretty safe to unexport these because they're only used internally.
prometheus/prometheus.go
Outdated
Help: g.Help, | ||
ConstLabels: prometheusLabels(g.ConstLabels), | ||
}) | ||
pG.Set(float64(0)) // Initialize at zero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should gauges and summaries (which haven't been observed yet) be initialized to zero or NaN
(as they do after expiration)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question - the initialization part here is a little redundant for counters and gauges because they start at zero in the go prometheus lib anyway. However, initializing summaries to 0 does create an artifact when the process starts and before they're emitted, where the line sits on the 0 like it's a counter or gauge. It's going to be better to delete all of these inits and just use the go prometheus client's defaults.
This PR introduces structs for Gauge, Summary, and Counter Definitions that can be provided to the
PrometheusSink
via itsPrometheusOpts
. ThePrometheusSink
will uses these definition slices to declare metrics when the process starts, and marks them internally to not be deleted when their last-updated time passes the configured expiry. Instead, ifcanDelete
isfalse
, the PrometheusSink expires the metric by setting its value to NaN. When metrics aren't defined, they should behave as they always have before this PR.Allowing go-metrics users to define static metrics in prometheus lets us resolve a long-requested pain point with how Consul expires metrics it has not observed within its prometheus_retention_time, which we set PrometheusSink's expiry value with. Workarounds for this have led to users configuring retention times on the order of days, weeks, months, or longer, leading to a pathological loss of precision in metrics which do not update often. Not all of these metrics remain valid even if we haven't updated them. With this change we can recommend much shorter retention times which will lead to more accurate measurements and fewer bogus stats.
As a secondary benefit, we can provide prometheus users better UX by attaching help string documentation to the metrics, as prometheus users expect.
This is a successor PR to the approach in #119 and allows ephemeral metrics in HashiCorp products to continue to exist where short-lived metrics would otherwise stop products from adopting the previous NaNExpiry behavior wholesale. The definitions also provide a similar "define your metrics before you use them" approach that's used in Prometheus clients.
More research needs to be done on how to best handle definitions for sub-libraries using go-metrics which expect their configuration from their hosting application. One path offered by this PR is to declare definitions inside the library and have the hosting application "reach into" the lib and append its definitions to the application's when it creates its
PrometheusSink
. A alternative approach has been proposed which is compatible with this definitions PR, and was one of the early iterations on this bug fix: allow prometheus metrics to be "defined" at runtime with additional methods for each metric type in go-metrics. For one example,InitCounterWithLabels
.