metric: improved testing for histogram latency buckets #97144
Labels
A-observability-inf
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-observability
Comments
aadityasondhi added the C-enhancement, T-observability-inf, and A-observability-inf labels on Feb 14, 2023
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue on Jul 24, 2023
This commit refactors histogram bucketing for legibility and composability. It also introduces a data-driven test for histogram bucket generation. This refactor should make it easier to add additional metric categories, distributions, and bucket types. Part of cockroachdb#97144. Release note: None
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue on Aug 1, 2023
This commit refactors histogram bucketing for legibility and composability. It also introduces a data-driven test for histogram bucket generation. This refactor should make it easier to add additional metric categories, distributions, and bucket types. Part of cockroachdb#97144. Release note: None
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue on Aug 9, 2023
This commit refactors histogram bucketing for legibility and composability. It also introduces a data-driven test for histogram bucket generation. This refactor should make it easier to add additional metric categories, distributions, and bucket types. Part of cockroachdb#97144. Release note: None
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue on Aug 10, 2023
This commit refactors histogram bucketing for legibility and composability. It also introduces a data-driven test for histogram bucket generation. This refactor should make it easier to add additional metric categories, distributions, and bucket types. Part of cockroachdb#97144. Release note: None
craig bot pushed a commit that referenced this issue on Aug 10, 2023
107388: metrics: refactor histogram bucket generation and testing r=ericharmeling a=ericharmeling
This commit refactors histogram bucketing for legibility and composability. It also introduces a data-driven test for histogram bucket generation. This refactor should make it easier to add additional metric categories, distributions, and bucket types. Part of #97144. Release note: None
108415: roachtest: deflake npgsql test r=rafiss a=rafiss
These upstream tests are flaky, so we ignore them. informs #108414 fixes #108044 fixes #108504 Release note: None
108535: roachtest: remove rangeTs variants of import-cancellation test r=stevendanna a=jbowens
Previously the import-cancellation roachtest was split into two variants: one with MVCC range tombstones enabled and one without. MVCC range tombstones are always enabled now, so the two variants were effectively identical. This commit consolidates the two tests into a single `import-cancellation` roachtest. Informs #97869. Epic: None Release note: None
Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue on Aug 29, 2023
This commit adds a new roachtest: histogram-buckets. This roachtest sets a new environment variable that forces a uniform distribution on all histogram buckets. With a uniform distribution, we can construct a sample of values for each metric without introducing a new metric to record observations. From this population sample, we can run goodness-of-fit tests for some common probability distributions: normal, lognormal, exponential, and uniform. This test should help us determine the appropriate bucket generation algorithm for each metric. Part of cockroachdb#97144.
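The goodness-of-fit idea in this commit message can be illustrated with a small standalone sketch. This is not the roachtest's code. It assumes that "uniform distribution on all histogram buckets" means equal-width buckets, rebuilds an approximate sample by spreading each bucket's observations evenly across the bucket, and computes a Kolmogorov-Smirnov statistic by hand against two candidate distributions. All names (`sampleFromBuckets`, `ksStatistic`, the bucket counts) are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sampleFromBuckets rebuilds an approximate sample from equal-width buckets.
// Bucket i covers [lo+i*width, lo+(i+1)*width); its counts[i] observations are
// spread evenly across that interval (an assumption, the true values are lost).
func sampleFromBuckets(lo, width float64, counts []int) []float64 {
	var out []float64
	for i, c := range counts {
		for j := 0; j < c; j++ {
			frac := (float64(j) + 0.5) / float64(c)
			out = append(out, lo+(float64(i)+frac)*width)
		}
	}
	return out
}

// ksStatistic returns the Kolmogorov-Smirnov distance sup|F_n(x) - F(x)|
// between the empirical CDF of sample and the candidate CDF.
func ksStatistic(sample []float64, cdf func(float64) float64) float64 {
	s := append([]float64(nil), sample...)
	sort.Float64s(s)
	n := float64(len(s))
	var d float64
	for i, x := range s {
		f := cdf(x)
		d = math.Max(d, math.Max(f-float64(i)/n, float64(i+1)/n-f))
	}
	return d
}

// exponentialCDF returns the CDF of an exponential distribution with the given mean.
func exponentialCDF(mean float64) func(float64) float64 {
	return func(x float64) float64 {
		if x < 0 {
			return 0
		}
		return 1 - math.Exp(-x/mean)
	}
}

// uniformCDF returns the CDF of a uniform distribution on [a, b].
func uniformCDF(a, b float64) func(float64) float64 {
	return func(x float64) float64 {
		return math.Min(1, math.Max(0, (x-a)/(b-a)))
	}
}

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Made-up counts for ten 10ms-wide buckets, decaying roughly geometrically.
	counts := []int{400, 240, 144, 86, 52, 31, 19, 11, 7, 4}
	sample := sampleFromBuckets(0, 10, counts)
	fmt.Printf("KS vs exponential: %.3f\n", ksStatistic(sample, exponentialCDF(mean(sample))))
	fmt.Printf("KS vs uniform:     %.3f\n", ksStatistic(sample, uniformCDF(0, 100)))
}
```

A smaller statistic indicates a closer fit. Turning the statistic into an accept/reject decision would additionally need critical values or p-values, which this sketch leaves out.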
jmcarp pushed a commit to jmcarp/cockroach that referenced this issue on Aug 31, 2023
This commit refactors histogram bucketing for legibility and composability. It also introduces a data-driven test for histogram bucket generation. This refactor should make it easier to add additional metric categories, distributions, and bucket types. Part of cockroachdb#97144. Release note: None
Is your feature request related to a problem? Please describe.
Recently we had an incident where the lack of resolution in the histogram quantiles time series caused us to report inaccurate latency data.
Related Issue: #95833
Postmortem: https://cockroachlabs.atlassian.net/wiki/spaces/OI/pages/2884632706/2023-01-27+Postmortem+on+Low+Resolution+Metric+Histograms+Impact+Latency+Observability
Describe the solution you'd like
A test that compares the latencies reported by CockroachDB histograms against the actual latencies observed while running a workload, and checks that the difference stays under an acceptable threshold (the threshold is TBD). This would help catch inappropriately designed histogram buckets for a given metric, since the buckets are statically defined in https://github.com/cockroachdb/cockroach/blob/master/pkg/util/metric/histogram_buckets.go.
During the postmortem, it was suggested that this be incorporated into roachperf or roachtest, since both run regularly and would have caught this problem earlier.
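To make the proposed check concrete, here is a minimal standalone sketch. It does not use the `pkg/util/metric` API. It assumes a simplified histogram with linear interpolation inside each bucket, a synthetic exponential-shaped latency sample, and a hypothetical 5% relative-error threshold. Every type, function, and constant below is illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// histogram is a hypothetical stand-in for a bucketed latency histogram.
// counts[i] holds observations v with upperBounds[i-1] < v <= upperBounds[i].
type histogram struct {
	upperBounds []float64 // sorted ascending
	counts      []uint64
	total       uint64
}

func newHistogram(upperBounds []float64) *histogram {
	return &histogram{upperBounds: upperBounds, counts: make([]uint64, len(upperBounds))}
}

// record adds v to its bucket; values above the last bound go into the last bucket.
func (h *histogram) record(v float64) {
	i := sort.SearchFloat64s(h.upperBounds, v) // first bound >= v
	if i == len(h.upperBounds) {
		i--
	}
	h.counts[i]++
	h.total++
}

// quantile estimates the q-th quantile by interpolating linearly inside the
// bucket that contains the target rank.
func (h *histogram) quantile(q float64) float64 {
	rank, cum := q*float64(h.total), 0.0
	for i, c := range h.counts {
		if c == 0 {
			continue
		}
		if next := cum + float64(c); rank <= next {
			lower := 0.0
			if i > 0 {
				lower = h.upperBounds[i-1]
			}
			return lower + (h.upperBounds[i]-lower)*(rank-cum)/float64(c)
		} else {
			cum = next
		}
	}
	return math.NaN() // empty histogram
}

// exactQuantile returns the nearest-rank q-th quantile of the raw samples.
func exactQuantile(samples []float64, q float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	idx := int(math.Ceil(q*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	return s[idx]
}

// syntheticLatencies returns n deterministic values shaped like an
// exponential distribution with the given mean (a stand-in for a workload).
func syntheticLatencies(n int, mean float64) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = -mean * math.Log(1-(float64(i)+0.5)/float64(n))
	}
	return out
}

// exponentialBounds returns n bucket bounds: start, start*factor, start*factor^2, ...
func exponentialBounds(start, factor float64, n int) []float64 {
	out := make([]float64, n)
	for i, b := 0, start; i < n; i, b = i+1, b*factor {
		out[i] = b
	}
	return out
}

func relativeError(estimate, actual float64) float64 {
	return math.Abs(estimate-actual) / actual
}

func main() {
	const maxRelErr = 0.05 // hypothetical threshold; the issue leaves it TBD
	samples := syntheticLatencies(1000, 20)
	exact := exactQuantile(samples, 0.5)
	configs := []struct {
		name   string
		bounds []float64
	}{
		{"coarse", []float64{10, 100, 1000}},
		{"fine", exponentialBounds(1, 1.1, 60)},
	}
	for _, cfg := range configs {
		h := newHistogram(cfg.bounds)
		for _, v := range samples {
			h.record(v)
		}
		relErr := relativeError(h.quantile(0.5), exact)
		fmt.Printf("%s buckets: p50 relative error %.3f, within threshold: %t\n",
			cfg.name, relErr, relErr <= maxRelErr)
	}
}
```

In this sketch the coarse buckets miss the threshold because the median falls inside a very wide bucket, while the finer exponentially spaced buckets stay within it. A real test would record latencies from an actual workload instead of a synthetic sample.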
Jira issue: CRDB-24540
Epic CRDB-20790