admission: metrics and tests for evaluating improvements #85469

Closed
sumeerbhola opened this issue Aug 2, 2022 · 2 comments
Labels: A-admission-control, C-enhancement

sumeerbhola (Collaborator) commented Aug 2, 2022

Many past improvements to admission control have fixed an obvious gap, e.g. lack of awareness of a bottleneck resource, or incorrect accounting of how many resources are consumed. As admission control evolves, further improvements risk causing regressions in peak throughput and in isolation (both throughput and latency). Our current approach of running custom experiments for each PR and posting results (graphs) with reproduction steps is not sufficient going forward.

We need a set of tests that set up particular cluster configurations and run one or more workloads. Most interesting behavior in admission control requires multiple workloads of differing importance (e.g. regular SQL reads/writes plus bulk work of various kinds), or different tenants. With the introduction of elastic work, even a single workload can be affected in terms of throughput, since we may leave resources under-utilized to avoid risking higher latency for regular work. We also need summary metrics for each test run, so we don't have to examine graphs: e.g. throughput, throughput variance, and latency percentiles for each workload.
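As a rough illustration of what those summary metrics could reduce to, here is a minimal Go sketch; the type and helper names are hypothetical, not anything that exists in the codebase today:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// WorkloadSummary is a hypothetical shape for the per-workload summary
// metrics mentioned above.
type WorkloadSummary struct {
	Name               string
	MeanThroughput     float64 // ops/sec, averaged over the run
	ThroughputVariance float64 // variance of per-interval throughput
	P50, P99           time.Duration
}

// summarize reduces raw per-interval throughput samples and per-op
// latencies to the single numbers we would compare run-over-run.
func summarize(name string, opsPerSec []float64, latencies []time.Duration) WorkloadSummary {
	var sum float64
	for _, v := range opsPerSec {
		sum += v
	}
	mean := sum / float64(len(opsPerSec))
	var variance float64
	for _, v := range opsPerSec {
		variance += (v - mean) * (v - mean)
	}
	variance /= float64(len(opsPerSec))

	sorted := append([]time.Duration(nil), latencies...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	pct := func(p float64) time.Duration {
		idx := int(math.Ceil(p*float64(len(sorted)))) - 1
		if idx < 0 {
			idx = 0
		}
		return sorted[idx]
	}
	return WorkloadSummary{
		Name:               name,
		MeanThroughput:     mean,
		ThroughputVariance: variance,
		P50:                pct(0.50),
		P99:                pct(0.99),
	}
}

func main() {
	s := summarize("kv0", []float64{950, 1010, 990},
		[]time.Duration{4 * time.Millisecond, 9 * time.Millisecond, 45 * time.Millisecond})
	fmt.Printf("%+v\n", s)
}
```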

This is an umbrella issue that we can close after we've developed an initial set of tests that we think give reasonable coverage for the current admission control implementation.

@irfansharif @tbg

Jira issue: CRDB-18257

sumeerbhola added the C-enhancement and A-admission-control labels Aug 2, 2022
andrewbaptist (Collaborator) commented:
One idea for this (something I had done at a previous company) is to create a matrix of workload × condition and measure performance at each combination.

Workloads would be things like:

  • TPC-C
  • TPC-H
  • YCSB (some combination of A, B, C, F)
  • (Note: I don't know the workloads well enough to say whether these suffice or we need others.)

Conditions would be things like:

  • 9 node, all healthy, single site
  • 9 node, all healthy, 3 regions with 50ms latency between them
  • 8 node healthy, 1 node slow disk (decide if single or multi-site)
  • 8 node healthy, 1 node decommissioning

The "sweet spot" for this is probably 5 workloads, and 5 conditions, so 25 tests total. Each test is run twice, once to determine "max throughput" and once to determine "P99 latency". The max throughput version simply runs the workload as fast as possible and an outcome is a single number of "test run time". The "P99 latency" runs at 50-75% of "today's best throughput number" and records the median P99 latency across the test run.

In total this gives 50 performance outcome numbers, which can then be shown on a "matrix bingo board" comparing the two old numbers and two new numbers per cell. For a given change it is unlikely (especially once the low-hanging fixes are done) that all 50 metrics will be positive. Color the matrix as follows:

"Either improved >25%" - Dark green
"Both improved 5-25%" - Light green
"Both within 25%, but one increased >5% or decreased >5%" - Yellow
"Both within 5%" - White
"Both decreased 5-25%" - Light red
"Either decreased >25%" - Dark red

Doing this consistently across features gives a very easy way to visualize the impact of a change and to see what needs further investigation.
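For illustration, the coloring rule could be encoded roughly as below. This is a sketch, assuming each delta is normalized so that positive always means "better" (throughput up, P99 latency down); the list above doesn't say how to color a cell where one metric improves >25% while the other regresses >25%, so this sketch resolves that conservatively toward dark red:

```go
package main

import (
	"fmt"
	"math"
)

// classify maps one cell's two deltas to a color. Each delta is the
// percent change of one outcome metric, normalized so that positive
// always means "better".
func classify(d1, d2 float64) string {
	switch {
	case d1 < -25 || d2 < -25:
		return "dark red" // either decreased >25% (checked first, so mixed extreme cells read as regressions)
	case d1 > 25 || d2 > 25:
		return "dark green" // either improved >25%
	case d1 < -5 && d2 < -5:
		return "light red" // both decreased 5-25%
	case d1 > 5 && d2 > 5:
		return "light green" // both improved 5-25%
	case math.Abs(d1) <= 5 && math.Abs(d2) <= 5:
		return "white" // both within 5%
	default:
		return "yellow" // both within 25%, but one moved >5%
	}
}

func main() {
	// Throughput improved 12%, P99 latency regressed 3%.
	fmt.Println(classify(12, -3)) // yellow
}
```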

A big caveat with this testing is run-over-run stability. If the same build is run twice in a row, the comparison needs to be 100% white (no changes >5%, and ideally <2%), or this becomes a much less useful test. Anything that can be done to maximize that stability is important. Additionally, the 50 tests should run in ~8 hours (definitely <24 hours), so that developers can iterate rapidly.

Accomplishing this requires a few things:

  • Exact definition of the workloads and conditions
  • Generation of a script that will run all the tests and generate a "report" (maybe this is stored into a database?)
  • Visualization to compare 2 reports and generate the colored report
  • Building baselines and determining the variance across about 10 runs today (see the variance sketch after this list)
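A minimal sketch of that variance check, assuming we reduce each cell to its outcome number across ~10 baseline runs of the same build and flag cells whose coefficient of variation exceeds the ~2% noise budget mentioned above:

```go
package main

import (
	"fmt"
	"math"
)

// cvPercent returns the coefficient of variation (stddev/mean, in
// percent) of one cell's outcome number across repeated baseline runs.
// Run-over-run noise should stay under ~2% so that the 5% color
// thresholds are meaningful.
func cvPercent(samples []float64) float64 {
	var sum float64
	for _, v := range samples {
		sum += v
	}
	mean := sum / float64(len(samples))
	var variance float64
	for _, v := range samples {
		variance += (v - mean) * (v - mean)
	}
	stddev := math.Sqrt(variance / float64(len(samples)))
	return 100 * stddev / mean
}

func main() {
	// Hypothetical "test run time" numbers (seconds) for one cell
	// across 10 baseline runs.
	runTimes := []float64{1812, 1798, 1830, 1805, 1821, 1799, 1817, 1808, 1825, 1810}
	if cv := cvPercent(runTimes); cv > 2 {
		fmt.Printf("cell too noisy: CV=%.2f%%\n", cv)
	} else {
		fmt.Printf("cell stable: CV=%.2f%%\n", cv)
	}
}
```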

This would be a great project to do in the stabilization period after 8/15

irfansharif (Contributor) commented:

I forgot about this issue but have since cargo-culted a bunch of text into #89208 instead. I'm also referencing that issue in internal milestones, and have noted a few targeted reproductions we want to flesh out first. I've also alluded to library components we'd want under pkg/roachtest to make our lives easier.

> One idea for this (something I had done at a previous company) is to create a matrix of workload × condition and measure performance at each combination.

I agree it's a good idea, but I'm a bit wary of starting with the matrix-style approach given the $$$ spend + support burden -- getting pinged by the TeamCity bot on NxM variants of the same regression, to be dealt with by a team of 2-3 people, feels like a lot. Let's try the targeted form first, see what we learn, and come back to this matrix idea after. I imagine it would also need some support from @cockroachdb/test-eng folks.

irfansharif closed this as not planned Oct 3, 2022