split: Add a testing framework to benchmark load splitter approaches #91636

KaiSun314 · 2022-11-09T22:29:07Z

Informs: #90574

Here, we create a testing framework to benchmark load-based splitters, enabling experimentation with and comparison of different designs and settings.

This testing framework takes the following inputs:

Generator settings to generate the requests for the load splitter to record

Start key generator type (zipfian or uniform) and iMax
Span length generator type (zipfian or uniform) and iMax
Weight generator type (zipfian or uniform) and iMax

Range request percent (percent of range requests [startKey, endKey) as opposed to point requests with just a start key)
Load-based splitter constructor
Random seed
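As a rough illustration of how these inputs could fit together, here is a minimal sketch. All type and function names (`settings`, `generatorConfig`, `generateRequests`, and so on) are hypothetical and are not taken from the PR. It also assumes `math/rand`'s Zipf generator as a stand-in for the ycsb generator, with arbitrarily chosen Zipf parameters, and it leaves out the load-based splitter constructor.

```go
package main

import (
	"fmt"
	"math/rand"
)

// generatorKind is a hypothetical enum; the PR's own identifiers may differ.
type generatorKind int

const (
	uniformGenerator generatorKind = iota
	zipfGenerator
)

// generatorConfig pairs a distribution with its inclusive upper bound iMax.
type generatorConfig struct {
	kind generatorKind
	iMax uint64 // assumed to be >= 1
}

// settings groups the inputs listed above (splitter constructor omitted).
type settings struct {
	startKey, spanLength, weight generatorConfig
	rangeRequestPercent          float64 // 0..100
	seed                         int64
}

// request is a point request, or a range request [startKey, endKey) when isRange is set.
type request struct {
	startKey, endKey uint64
	isRange          bool
	weight           float64
}

// newGenerator returns a function producing values in [1, iMax].
func newGenerator(rng *rand.Rand, cfg generatorConfig) func() uint64 {
	if cfg.kind == zipfGenerator {
		// Assumption: math/rand's Zipf replaces ycsb's generator; s=1.1 and v=1 are arbitrary.
		z := rand.NewZipf(rng, 1.1, 1, cfg.iMax-1)
		return func() uint64 { return z.Uint64() + 1 }
	}
	return func() uint64 { return uint64(rng.Int63n(int64(cfg.iMax))) + 1 }
}

// generateRequests builds n requests deterministically from the seed in s.
func generateRequests(s settings, n int) []request {
	rng := rand.New(rand.NewSource(s.seed))
	nextStart := newGenerator(rng, s.startKey)
	nextLen := newGenerator(rng, s.spanLength)
	nextWeight := newGenerator(rng, s.weight)
	reqs := make([]request, 0, n)
	for i := 0; i < n; i++ {
		r := request{startKey: nextStart(), weight: float64(nextWeight())}
		if rng.Float64()*100 < s.rangeRequestPercent {
			r.isRange = true
			r.endKey = r.startKey + nextLen()
		}
		reqs = append(reqs, r)
	}
	return reqs
}

func main() {
	s := settings{
		startKey:            generatorConfig{zipfGenerator, 10000},
		spanLength:          generatorConfig{uniformGenerator, 1000},
		weight:              generatorConfig{uniformGenerator, 10},
		rangeRequestPercent: 50,
		seed:                2022,
	}
	for _, r := range generateRequests(s, 5) {
		fmt.Printf("%+v\n", r)
	}
}
```

Seeding a single `rand.Rand` from the settings keeps a run reproducible: the same settings always yield the same request sequence.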

This testing framework performs the following work:

Generates the requests for the load splitter to record
Calculates the optimal split key and its left / right weights
Constructs the load splitter and invokes the load splitter's Record function on all the generated requests
Invokes the load splitter's Key function to get the split key and calculates its left / right weights
Maintains the times it took to execute the load splitter's Record and Key functions
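The record / key / timing steps above might look roughly like the sketch below. The `loadSplitter` interface, `firstKeySplitter`, `runSplitter`, and `splitWeights` are hypothetical names for illustration, and the signatures are simplified (integer keys instead of real spans). As a further simplification, `splitWeights` treats every request as a point request at its start key.

```go
package main

import (
	"fmt"
	"time"
)

// request is simplified: integer keys, endKey unused for point requests.
type request struct {
	startKey, endKey uint64
	weight           float64
}

// loadSplitter is a hypothetical stand-in for the splitter under test.
type loadSplitter interface {
	Record(r request)
	Key() (key uint64, ok bool)
}

// firstKeySplitter is a toy splitter that proposes the first start key it saw.
type firstKeySplitter struct {
	key  uint64
	seen bool
}

func (s *firstKeySplitter) Record(r request) {
	if !s.seen {
		s.key, s.seen = r.startKey, true
	}
}

func (s *firstKeySplitter) Key() (uint64, bool) { return s.key, s.seen }

type runResult struct {
	key                 uint64
	found               bool
	recordTime, keyTime time.Duration
}

// runSplitter constructs the splitter, records every request, asks for the
// split key, and times the two phases separately.
func runSplitter(newSplitter func() loadSplitter, reqs []request) runResult {
	splitter := newSplitter()
	start := time.Now()
	for _, r := range reqs {
		splitter.Record(r)
	}
	recordTime := time.Since(start)

	start = time.Now()
	key, found := splitter.Key()
	keyTime := time.Since(start)
	return runResult{key: key, found: found, recordTime: recordTime, keyTime: keyTime}
}

// splitWeights sums the weight left of splitKey and at or right of it.
// Simplification: each request counts entirely at its start key.
func splitWeights(reqs []request, splitKey uint64) (left, right float64) {
	for _, r := range reqs {
		if r.startKey < splitKey {
			left += r.weight
		} else {
			right += r.weight
		}
	}
	return left, right
}

func main() {
	reqs := []request{{startKey: 5, weight: 1}, {startKey: 2, weight: 3}, {startKey: 9, weight: 2}}
	res := runSplitter(func() loadSplitter { return &firstKeySplitter{} }, reqs)
	left, right := splitWeights(reqs, res.key)
	fmt.Printf("key=%d found=%t left=%.0f right=%.0f record=%s key=%s\n",
		res.key, res.found, left, right, res.recordTime, res.keyTime)
}
```

Passing a constructor rather than a splitter instance lets each repetition start from a fresh splitter.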

The framework also supports repeating a test multiple times with different random numbers and calculating the average / max percentage differences of the left / right weights and of the execution times. In addition, it can run tests under several different settings and print the results for each setting.
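One possible reading of "percentage difference of left / right weights" is the gap between the two sides as a percentage of the total weight. The sketch below assumes that definition. It is not confirmed by the PR, and `percentDiff` and `avgAndMax` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
)

// percentDiff returns |left - right| as a percentage of left + right.
// Assumption: this is how the imbalance of a split is measured.
func percentDiff(left, right float64) float64 {
	total := left + right
	if total == 0 {
		return 0
	}
	return math.Abs(left-right) * 100 / total
}

// avgAndMax aggregates non-negative per-run values across repetitions.
func avgAndMax(values []float64) (avg, maxVal float64) {
	if len(values) == 0 {
		return 0, 0
	}
	var sum float64
	for _, v := range values {
		sum += v
		maxVal = math.Max(maxVal, v)
	}
	return sum / float64(len(values)), maxVal
}

func main() {
	// Left / right weights from three hypothetical repetitions.
	runs := [][2]float64{{30, 70}, {45, 55}, {50, 50}}
	diffs := make([]float64, 0, len(runs))
	for _, r := range runs {
		diffs = append(diffs, percentDiff(r[0], r[1]))
	}
	avg, maxVal := avgAndMax(diffs)
	fmt.Printf("avg=%.2f%% max=%.2f%%\n", avg, maxVal)
}
```

The same aggregation could be applied to the recorded execution times.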

Example of testing framework output:

Release note: None

cockroach-teamcity · 2022-11-09T22:29:15Z


kvoli

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @KaiSun314)

pkg/kv/kvserver/split/load_based_splitter_test.go line 42 at r1 (raw file):

}

type Config struct {

Do these structs/methods need to be exported?

pkg/kv/kvserver/split/load_based_splitter_test.go line 66 at r1 (raw file):

}

func runTest(

It feels as though a lot of separate components are coming together here. Could we split this into separate steps: generating the sequence of keys, evaluating an oracle, and evaluating the real code?

pkg/kv/kvserver/split/load_based_splitter_test.go line 112 at r1 (raw file):

		return weightedKeys[i].key < weightedKeys[j].key
	})
	var optimalKeyPtr *uint32

Could we split this out into an "oracle" function?

pkg/kv/kvserver/split/load_based_splitter_test.go line 115 at r1 (raw file):

	var prefixTotalWeight float32
	for _, weightedKey := range weightedKeys {
		if optimalKeyPtr == nil || math.Abs(float64(totalWeight-2*prefixTotalWeight)) < math.Abs(float64(totalWeight-2*optimalLeftWeight)) {

This probably needs a comment or explanation.
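For context on the condition quoted above: if a candidate split key has weight `prefixTotalWeight` to its left, the weight to its right is `totalWeight - prefixTotalWeight`, so `|totalWeight - 2*prefixTotalWeight|` equals `|right - left|`. The loop therefore keeps the key with the smallest left / right imbalance. A standalone oracle along those lines could be sketched as follows. The names `weightedKey` and `optimalSplit` are hypothetical, `float64` is used instead of `float32` for brevity, and the sketch assumes the split key itself falls on the right side.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type weightedKey struct {
	key    uint32
	weight float64
}

// optimalSplit returns the key that minimizes |right - left|, where left is
// the total weight of keys strictly below the split key and right is the rest.
func optimalSplit(keys []weightedKey) (splitKey uint32, left, right float64, ok bool) {
	if len(keys) == 0 {
		return 0, 0, 0, false
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]weightedKey(nil), keys...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].key < sorted[j].key })

	var total float64
	for _, wk := range sorted {
		total += wk.weight
	}

	var prefix float64 // weight of all keys strictly below the current key
	bestDiff := math.Inf(1)
	for i, wk := range sorted {
		// Only evaluate each distinct key once, before adding its own weight.
		if i == 0 || wk.key != sorted[i-1].key {
			// |total - 2*prefix| == |(total - prefix) - prefix| == |right - left|.
			if diff := math.Abs(total - 2*prefix); diff < bestDiff {
				bestDiff, splitKey, left = diff, wk.key, prefix
			}
		}
		prefix += wk.weight
	}
	return splitKey, left, total - left, true
}

func main() {
	keys := []weightedKey{{4, 1}, {1, 1}, {3, 1}, {2, 1}}
	key, left, right, _ := optimalSplit(keys)
	fmt.Printf("split at %d: left=%.0f right=%.0f\n", key, left, right)
}
```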

pkg/kv/kvserver/split/load_based_splitter_test.go line 162 at r1 (raw file):

	var err error
	if generatorType == zipfGenerator {
		generator, err = ycsb.NewZipfGenerator(randSource, 1, iMax, 0.99, false)

Is the 0.99 here the same value that ycsb uses?

pkg/kv/kvserver/split/load_based_splitter_test.go line 215 at r1 (raw file):

func runTestMultipleSettings(t *testing.T, settingsArr []Settings) {
	fmt.Printf(

For clarity, could we restructure this similarly to the code here: https://github.com/kvoli/cockroach/blob/a92f9564748d3ac23ce596542649a8427ce1e3b8/pkg/kv/kvserver/asim/metrics_tracker.go#L36-L45

pkg/kv/kvserver/split/load_based_splitter_test.go line 255 at r1 (raw file):

				return NewTestFinder(randSource)
			},
			seed: 2022,

If we could parameterize some of the randomness, i.e. generate the seed randomly and then also run n iterations in which we randomly select the generator and lengths (within some bounds), it may be useful for enabling some degree of assertions on the existing split finder. Asserting just that a key is found would be a good start (assuming the parameter boundaries allow for it).
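A rough sketch of what such a randomized check could look like is below. Everything in it is hypothetical (`randomSettings`, `pickSettings`, `checkFindsKey`, and the stub finder); a real test would plug in the actual split finder and use the test framework's own failure reporting.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomSettings holds the parameters drawn for one iteration.
type randomSettings struct {
	useZipf             bool
	iMax                uint64
	rangeRequestPercent int
}

// pickSettings draws settings within bounds: iMax in [1, maxIMax], percent in [0, 100].
func pickSettings(rng *rand.Rand, maxIMax int64) randomSettings {
	return randomSettings{
		useZipf:             rng.Intn(2) == 0,
		iMax:                uint64(rng.Int63n(maxIMax)) + 1,
		rangeRequestPercent: rng.Intn(101),
	}
}

// finder is a hypothetical hook that runs the splitter under test for the
// given settings and reports whether it produced a split key.
type finder func(s randomSettings) (key uint64, ok bool)

// checkFindsKey runs the finder for several random settings and returns an
// error naming the seed and iteration if no key was found, so the failure
// can be replayed.
func checkFindsKey(seed int64, iterations int, maxIMax int64, find finder) error {
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < iterations; i++ {
		s := pickSettings(rng, maxIMax)
		if _, ok := find(s); !ok {
			return fmt.Errorf("seed %d, iteration %d: no split key for %+v", seed, i, s)
		}
	}
	return nil
}

func main() {
	// A randomly chosen seed, printed so that a failing run is reproducible.
	seed := time.Now().UnixNano()
	fmt.Println("seed:", seed)
	stub := func(s randomSettings) (uint64, bool) { return s.iMax, true } // placeholder finder
	if err := checkFindsKey(seed, 10, 10000, stub); err != nil {
		fmt.Println("FAIL:", err)
	}
}
```

Logging the seed keeps the randomness useful for assertions: a failure can be rerun with the same seed.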

KaiSun314

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)

pkg/kv/kvserver/split/load_based_splitter_test.go line 42 at r1 (raw file):