metric: improved testing for histogram latency buckets #97144

Open
aadityasondhi opened this issue Feb 14, 2023 · 0 comments
Labels
A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability

Comments

@aadityasondhi
Collaborator

aadityasondhi commented Feb 14, 2023

Is your feature request related to a problem? Please describe.
Recently we had an incident in which insufficient resolution in the histogram quantile time series caused us to report inaccurate latency data.

Related Issue: #95833
Postmortem: https://cockroachlabs.atlassian.net/wiki/spaces/OI/pages/2884632706/2023-01-27+Postmortem+on+Low+Resolution+Metric+Histograms+Impact+Latency+Observability

Describe the solution you'd like
A test that compares the latencies reported by crdb histograms against the actual latencies observed while running a workload, and checks that the difference stays under an acceptable threshold (the threshold itself is TBD). This would help catch poorly designed histogram buckets for a given metric, since the buckets are statically defined in https://github.com/cockroachdb/cockroach/blob/master/pkg/util/metric/histogram_buckets.go.

During the postmortem, it was suggested that we incorporate this into roachperf or roachtest, since both run regularly and would have caught this problem earlier.

Jira issue: CRDB-24540

Epic CRDB-20790

@aadityasondhi aadityasondhi added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability-inf A-observability-inf labels Feb 14, 2023
@aadityasondhi aadityasondhi self-assigned this Mar 21, 2023
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Jul 24, 2023
This commit refactors histogram bucketing for legibility
and composability. It also introduces a data-driven test
for histogram bucket generation.

This refactor should make it easier to add additional
metric categories, distributions, and bucket types.

Part of cockroachdb#97144.

Release note: None
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 1, 2023
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 9, 2023
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 10, 2023
craig bot pushed a commit that referenced this issue Aug 10, 2023
107388: metrics: refactor histogram bucket generation and testing r=ericharmeling a=ericharmeling

This commit refactors histogram bucketing for legibility and composability. It also introduces a data-driven test for histogram bucket generation.

This refactor should make it easier to add additional metric categories, distributions, and bucket types.

Part of #97144.

Release note: None

108415: roachtest: deflake npgsql test r=rafiss a=rafiss

These upstream tests are flaky, so we ignore them.

informs #108414
fixes #108044
fixes #108504
Release note: None

108535: roachtest: remove rangeTs variants of import-cancellation test r=stevendanna a=jbowens

Previously the import-cancellation roachtest was split into two variants: one with MVCC range tombstones enabled and one without. MVCC range tombstones are always enabled now, so the two variants were effectively identical. This commit consolidates the two tests into a single `import-cancellation` roachtest.

Informs #97869.
Epic: None
Release note: None

Co-authored-by: Eric Harmeling
Co-authored-by: Rafi Shamim
Co-authored-by: Jackson Owens
ericharmeling added a commit to ericharmeling/cockroach that referenced this issue Aug 29, 2023
This commit adds a new roachtest: histogram-buckets.

This roachtest sets a new environment variable that
forces a uniform distribution on all histogram buckets.
With a uniform distribution, we can construct a sample
of values for each metric without introducing a new
metric to record observations.

From this population sample, we can run goodness-of-fit
tests for some common probability distributions: normal,
lognormal, exponential, and uniform.

This test should help us determine the appropriate
bucket generation algorithm for each metric.

Part of cockroachdb#97144.
jmcarp pushed a commit to jmcarp/cockroach that referenced this issue Aug 31, 2023
2 participants