pkg/util/metric: remove HdrHistogram once Prometheus histograms proven in production #96357
Labels
A-observability-inf
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-observability
In #95833 it was revealed that the newer Prometheus histogram model was causing issues in its calculation of quantiles, especially in histograms measuring latency.
While this was caused by the bucket boundaries used in these Prometheus histograms, which we fixed in #96029, we also introduced an environment variable that lets users revert to the older HdrHistogram model if the issue arises again. This avoids a repeat of the situation where customers cannot effectively use the new Prometheus-backed histogram metrics and we have no immediate mitigation available. To support this, we re-introduced the HdrHistogram code into CRDB, to be used only when the environment variable is enabled.
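As a rough sketch of the fallback pattern described above, the following shows one way an environment variable could gate which histogram implementation gets constructed. The variable name `EXAMPLE_USE_HDR_HISTOGRAM`, the `histogram` interface, and both stub types are hypothetical placeholders; the real variable name and the real types in `pkg/util/metric` may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// hdrEnvVar is a hypothetical variable name used only for this sketch.
const hdrEnvVar = "EXAMPLE_USE_HDR_HISTOGRAM"

// histogram is a hypothetical minimal interface shared by both backends.
type histogram interface {
	RecordValue(v int64)
	Kind() string
}

// prometheusHistogram and hdrHistogram are stubs standing in for the two
// real implementations; they only count observations.
type prometheusHistogram struct{ n int64 }

func (h *prometheusHistogram) RecordValue(int64) { h.n++ }
func (h *prometheusHistogram) Kind() string      { return "prometheus" }

type hdrHistogram struct{ n int64 }

func (h *hdrHistogram) RecordValue(int64) { h.n++ }
func (h *hdrHistogram) Kind() string      { return "hdr" }

// useHdr reports whether the fallback is requested. The lookup function is
// injected (os.LookupEnv in production) so the decision is easy to test.
func useHdr(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(hdrEnvVar)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(raw)
	return err == nil && enabled
}

// newHistogram returns the Prometheus-backed histogram by default and the
// legacy HDR-backed one only when the environment variable is enabled.
func newHistogram(lookup func(string) (string, bool)) histogram {
	if useHdr(lookup) {
		return &hdrHistogram{}
	}
	return &prometheusHistogram{}
}

func main() {
	h := newHistogram(os.LookupEnv)
	h.RecordValue(12)
	fmt.Println("histogram backend:", h.Kind())
}
```

Under this shape, removing the HdrHistogram code would amount to deleting the legacy type and the branch in the constructor, leaving the Prometheus path as the only one.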
Once the new histogram bucket boundaries have been proven in production, we should rip out all of the HdrHistogram code once and for all to complete the migration to the Prometheus histogram model.
Jira issue: CRDB-24084