Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split: Add a testing framework to benchmark load splitter approaches #91636

Merged
merged 1 commit into from
Nov 14, 2022

Conversation

KaiSun314
Copy link
Contributor

@KaiSun314 KaiSun314 commented Nov 9, 2022

Informs: #90574

Here, we create a testing framework to benchmark load-based splitters, to enable experimenting and comparison of different designs and settings.

This testing framework inputs:

  1. Generator settings to generate the requests for the load splitter to record
  • Start key generator type (zipfian or uniform) and iMax
  • Span length generator type (zipfian or uniform) and iMax
  • Weight generator type (zipfian or uniform) and iMax
  1. Range request percent (percent of range requests [startKey, endKey) as opposed to point requests with just a start key)
  2. Load-based splitter constructor
  3. Random seed

This testing framework performs the following work:

  1. Generates the requests for the load splitter to record
  2. Calculates the optimal split key and its left / right weights
  3. Constructs the load splitter and invokes the load splitter's Record function on all the generated requests
  4. Invokes the load splitter's Key function to get the split key and calculates its left / right weights
  5. Maintains the times it took to execute the load splitter's Record and Key functions

This testing framework also supports repeating the test multiple times with different random numbers and calculating the average / max percentage differences of left / right weights and execution times. This testing framework also supports running tests with different settings and printing the test results for each setting.

Example of testing framework output:
Load-Based Splitter Testing Framework Sample Output

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@KaiSun314 KaiSun314 force-pushed the load-based-splitter-test branch 3 times, most recently from 973fb9d to 77d6f22 Compare November 10, 2022 15:45
@KaiSun314 KaiSun314 requested a review from kvoli November 10, 2022 15:49
@KaiSun314 KaiSun314 force-pushed the load-based-splitter-test branch 2 times, most recently from dc73d5e to 52831d4 Compare November 10, 2022 16:12
@KaiSun314 KaiSun314 marked this pull request as ready for review November 10, 2022 16:12
@KaiSun314 KaiSun314 requested a review from a team as a code owner November 10, 2022 16:12
@KaiSun314 KaiSun314 force-pushed the load-based-splitter-test branch from 52831d4 to 3aca977 Compare November 10, 2022 17:14
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @KaiSun314)


pkg/kv/kvserver/split/load_based_splitter_test.go line 42 at r1 (raw file):

}

type Config struct {

Do these structs/methods need to be exported?


pkg/kv/kvserver/split/load_based_splitter_test.go line 66 at r1 (raw file):

}

func runTest(

It feels as though there are a lot of separate components coming together here, could we split out generating the sequence of keys, evaluating an oracle and evaluating the real code?


pkg/kv/kvserver/split/load_based_splitter_test.go line 112 at r1 (raw file):

		return weightedKeys[i].key < weightedKeys[j].key
	})
	var optimalKeyPtr *uint32

Could we split this out into an "oracle" function.


pkg/kv/kvserver/split/load_based_splitter_test.go line 115 at r1 (raw file):

	var prefixTotalWeight float32
	for _, weightedKey := range weightedKeys {
		if optimalKeyPtr == nil || math.Abs(float64(totalWeight-2*prefixTotalWeight)) < math.Abs(float64(totalWeight-2*optimalLeftWeight)) {

This probably needs a comment or explanation.


pkg/kv/kvserver/split/load_based_splitter_test.go line 162 at r1 (raw file):

	var err error
	if generatorType == zipfGenerator {
		generator, err = ycsb.NewZipfGenerator(randSource, 1, iMax, 0.99, false)

Is the 0.99 the same as ycsb here?


pkg/kv/kvserver/split/load_based_splitter_test.go line 215 at r1 (raw file):

func runTestMultipleSettings(t *testing.T, settingsArr []Settings) {
	fmt.Printf(

Could we change this to a similar structure to here: https://github.com/kvoli/cockroach/blob/a92f9564748d3ac23ce596542649a8427ce1e3b8/pkg/kv/kvserver/asim/metrics_tracker.go#L36-L45

For clarity.


pkg/kv/kvserver/split/load_based_splitter_test.go line 255 at r1 (raw file):

				return NewTestFinder(randSource)
			},
			seed: 2022,

If we could parameterize some randomness - i.e. get a seed generated randomly and then also run a n iterations where we randomly select the generator and lengths (without some bounds), it may be useful to enable some degree of assertions on the existing split finder - perhaps just that you find a key would be a good start (assuming that parameter boundaries allow for it).

@KaiSun314 KaiSun314 force-pushed the load-based-splitter-test branch from 3aca977 to 65eddd9 Compare November 11, 2022 02:32
Copy link
Contributor Author

@KaiSun314 KaiSun314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/split/load_based_splitter_test.go line 42 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Do these structs/methods need to be exported?

No, made them all lowercase


pkg/kv/kvserver/split/load_based_splitter_test.go line 66 at r1 (raw file):

Previously, kvoli (Austen) wrote…

It feels as though there are a lot of separate components coming together here, could we split out generating the sequence of keys, evaluating an oracle and evaluating the real code?

Split out into generateRequests, getOptimalKey, and getKey


pkg/kv/kvserver/split/load_based_splitter_test.go line 112 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Could we split this out into an "oracle" function.

Done.


pkg/kv/kvserver/split/load_based_splitter_test.go line 115 at r1 (raw file):

Previously, kvoli (Austen) wrote…

This probably needs a comment or explanation.

Added comment and made the code clearer


pkg/kv/kvserver/split/load_based_splitter_test.go line 162 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Is the 0.99 the same as ycsb here?

Yep defaultTheta in ycsb's zipf generator is 0.99


pkg/kv/kvserver/split/load_based_splitter_test.go line 215 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Could we change this to a similar structure to here: https://github.com/kvoli/cockroach/blob/a92f9564748d3ac23ce596542649a8427ce1e3b8/pkg/kv/kvserver/asim/metrics_tracker.go#L36-L45

For clarity.

Used the tabular printer tabwriter


pkg/kv/kvserver/split/load_based_splitter_test.go line 255 at r1 (raw file):

Previously, kvoli (Austen) wrote…

If we could parameterize some randomness - i.e. get a seed generated randomly and then also run a n iterations where we randomly select the generator and lengths (without some bounds), it may be useful to enable some degree of assertions on the existing split finder - perhaps just that you find a key would be a good start (assuming that parameter boundaries allow for it).

Added documentation explaining this load-based splitter testing framework and the intent / usage / where it's going

@KaiSun314 KaiSun314 force-pushed the load-based-splitter-test branch from 65eddd9 to 9458d1e Compare November 11, 2022 02:53
Here, we create a testing framework to benchmark load-based splitters,
to enable experimenting and comparison of different designs and
settings.

This testing framework inputs:
1. Generator settings to generate the requests for the load splitter to
record
- Start key generator type (zipfian or uniform) and iMax
- Span length generator type (zipfian or uniform) and iMax
- Weight generator type (zipfian or uniform) and iMax
2. Range request percent (percent of range requests [startKey, endKey)
as opposed to point requests with just a start key)
3. Load-based splitter constructor
4. Random seed

This testing framework performs the following work:
1. Generates the requests for the load splitter to record
2. Calculates the optimal split key and its left / right weights
3. Constructs the load splitter and invokes the load splitter's Record
function on all the generated requests
4. Invokes the load splitter's Key function to get the split key and
calculates its left / right weights
5. Maintains the times it took to execute the load splitter's Record and
Key functions

This testing framework also supports repeating the test multiple times
with different random numbers and calculating the average / max
percentage differences of left / right weights and execution times. This
testing framework also supports running tests with different settings
and printing the test results for each setting.

Release note: None
@KaiSun314 KaiSun314 force-pushed the load-based-splitter-test branch from 9458d1e to e0ad7b3 Compare November 11, 2022 03:21
@KaiSun314
Copy link
Contributor Author

bors r=kvoli

@craig
Copy link
Contributor

craig bot commented Nov 14, 2022

Build succeeded:

@craig craig bot merged commit df02d67 into cockroachdb:master Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants