asim: convert randomized testing to data-driven #107957

wenyihu6 · 2023-08-01T16:03:40Z

asim: remove extra parsing for []float64, float64, time.Duration

In cockroachdb/datadriven#45, we upstreamed the
scanning implementation into the datadriven library. We can now parse
[]float64, float64, and time.Duration without additional handling.

Release Note: none
Epic: none

asim: enable user-defined repliFactor, placement in rand range_gen

This patch introduces two additional options for randomized range generation,
letting users define the replication factor and placement type. Although some
aspects of the range configuration are randomly generated (ranges and keyspace),
these two settings are not randomized. Once set by the user, they persist
across iterations.

Release Note: none
Part Of: #106311

asim: convert randomized testing to data-driven

Previously, the randomized testing framework depended on default settings
hardcoded in the tests, so changing a setting meant editing code-configured
parameters. This patch converts the framework to a data-driven approach,
enabling more dynamic user inputs, more testing examples, and greater
visibility into what each iteration is testing.

TestRandomized is a randomized data-driven testing framework that validates
allocators by creating randomized configurations. It is designed for
regression and exploratory testing.

There are three modes for every aspect of randomized generation.

Static Mode:

1. If randomization options are disabled (e.g. no rand_ranges command is
   used), the system uses the default configurations (defined in
   default_settings.go) with no randomization.

Randomized Mode (two scenarios):

2. Use default settings for randomized generation (e.g. rand_ranges).
3. Use settings specified with commands (e.g. rand_ranges
   range_gen_type=zipf).

The following commands are provided:

1. "rand_cluster" [cluster_gen_type=(single_region|multi_region|any_region)]
   e.g. rand_cluster cluster_gen_type=(multi_region)
   - rand_cluster: randomly picks a predefined cluster configuration
     according to the specified type.
   - cluster_gen_type (default value is multi_region): the cluster
     configuration type. On the next eval, the cluster is generated as the
     initial state of the simulation.

2. "rand_ranges" [placement_type=(even|skewed|random|weighted_rand)]
   [replication_factor=<int>] [range_gen_type=(uniform|zipf)]
   [keyspace_gen_type=(uniform|zipf)] [weighted_rand=(<[]float64>)]
   e.g. rand_ranges placement_type=weighted_rand weighted_rand=(0.1,0.2,0.7)
   e.g. rand_ranges placement_type=skewed replication_factor=1
        range_gen_type=zipf keyspace_gen_type=uniform
   - rand_ranges: randomly generates a distribution of ranges across stores
     based on the specified parameters. On the next call to eval, ranges and
     their replica placement are generated and loaded into the initial state.
   - placement_type (default value is even): defines the type of range
     placement distribution across stores. Once set, it remains constant
     across iterations with no randomization involved.
   - replication_factor (default value is 3): represents the replication
     factor of each range. Once set, it remains constant across iterations
     with no randomization involved.
   - range_gen_type (default value is uniform): represents the type of
     distribution used to yield the range parameter as ranges are generated
     across iterations (range ∈ [1, 1000]).
   - keyspace_gen_type: represents the type of distribution used to yield the
     keyspace parameter as ranges are generated across iterations
     (keyspace ∈ [1000, 200000]).
   - weighted_rand: specifies the weighted random distribution among stores.
     Requirements (will panic otherwise):
     1. weighted_rand must only be used with placement_type=weighted_rand,
        and vice versa.
     2. Each element of the array must be a weight in [0.0, 1.0], with each
        element corresponding to a store.
     3. len(weighted_rand) cannot be greater than the number of stores.
     4. The weights in the array must sum to 1.

3. "eval" [seed=<int64>] [num_iterations=<int>] [duration=<time.Duration>]
   [verbose=<bool>]
   e.g. eval seed=20 duration=30m2s verbose=true
   - eval: generates a simulation based on the configuration set with the
     given commands.
   - seed (default value is int64(42)): used to create a new random number
     generator, which is then used to create a new seed for each iteration.
   - num_iterations (default value is 3): specifies the number of simulations
     to run.
   - duration (default value is 10m): defines the duration of each iteration.
   - verbose (default value is false): if set to true, plots the history of
     all stats (as specified by defaultStat).
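
The four weighted_rand requirements listed under rand_ranges can be checked in a single pass. This is a sketch under stated assumptions rather than the code in this patch: `validateWeightedRand` and `sumTolerance` are hypothetical names, the function returns an error where the framework is described as panicking, and comparing the sum to 1 with a small tolerance is an assumption made here to absorb floating-point rounding.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// sumTolerance is an assumed tolerance for comparing the weight sum to 1.
const sumTolerance = 1e-9

// validateWeightedRand checks the weighted_rand requirements and reports the
// first one that is violated.
func validateWeightedRand(placementType string, weights []float64, numStores int) error {
	// Requirement 1: the option and the placement type must be used together.
	if (placementType == "weighted_rand") != (len(weights) > 0) {
		return errors.New("weighted_rand and placement_type=weighted_rand must be used together")
	}
	if len(weights) == 0 {
		return nil // neither was specified, nothing more to check
	}
	// Requirement 3: no more weights than stores.
	if len(weights) > numStores {
		return fmt.Errorf("got %d weights for %d stores", len(weights), numStores)
	}
	sum := 0.0
	for i, w := range weights {
		// Requirement 2: every weight lies in [0.0, 1.0].
		if w < 0 || w > 1 {
			return fmt.Errorf("weight %d is %v, want a value in [0.0, 1.0]", i, w)
		}
		sum += w
	}
	// Requirement 4: the weights sum to 1 (within the assumed tolerance).
	if math.Abs(sum-1) > sumTolerance {
		return fmt.Errorf("weights sum to %v, want 1", sum)
	}
	return nil
}

func main() {
	fmt.Println(validateWeightedRand("weighted_rand", []float64{0.1, 0.2, 0.7}, 3))
	fmt.Println(validateWeightedRand("even", []float64{0.5, 0.5}, 3))
}
```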

RandTestingFramework is initialized with the specified testSetting and
maintains its state across all iterations. It repeats the test with different
random configurations. Each iteration in RandTestingFramework executes the
following steps:

1. Generates a random configuration, based on whether randOption is on and
   the specific settings for randomized generation.
2. Executes the simulation and checks the assertions on the final state.
3. Stores any outputs and assertion failures in a buffer.
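
The per-iteration steps, together with the seed handling described for eval, can be sketched as a loop. The names `simulateFn` and `runIterations` are hypothetical, the output format is invented, and drawing each iteration seed with `Int63` is an assumption; the commit message only says the top-level seed creates a generator that produces a new seed per iteration.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// simulateFn is a hypothetical hook: it builds a random configuration from the
// iteration seed, runs the simulation, and returns its output together with
// any failed assertions.
type simulateFn func(iterSeed int64) (output string, failures []string)

// runIterations repeats the test numIterations times and returns the buffer.
func runIterations(seed int64, numIterations int, simulate simulateFn) string {
	// One generator for the whole run; state persists across iterations.
	rng := rand.New(rand.NewSource(seed))
	var buf strings.Builder
	for i := 0; i < numIterations; i++ {
		// Step 1: derive a fresh seed that drives this iteration's configuration.
		iterSeed := rng.Int63()
		// Step 2: run the simulation and check assertions on the final state.
		out, failures := simulate(iterSeed)
		// Step 3: record the output and any assertion failures.
		fmt.Fprintf(&buf, "sample%d: %s\n", i+1, out)
		for _, f := range failures {
			fmt.Fprintf(&buf, "  failed assertion: %s\n", f)
		}
	}
	return buf.String()
}

func main() {
	stub := func(iterSeed int64) (string, []string) {
		return fmt.Sprintf("ran with seed %d", iterSeed), nil
	}
	fmt.Print(runIterations(42, 3, stub))
}
```

Because every iteration seed comes from the one top-level generator, rerunning with the same `seed` and `num_iterations` reproduces the same sequence of configurations.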

Release note: None
Part Of: #106311

cockroach-teamcity · 2023-08-01T16:04:22Z


kvoli

Flushing comments.

Reviewed 9 of 9 files at r2, 4 of 4 files at r3, 11 of 20 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)

-- commits line 27 at r4:
nit: "hardcoded in the code" seems like odd wording.

-- commits line 73 at r4:
nit: some of these bullet points end with a period, whilst others do not. I'd change them to be consistent, excluding the e.g.

pkg/kv/kvserver/asim/tests/rand_framework.go line 169 at r4 (raw file):

	for i := 0; i < numIterations; i++ {
		if i == 0 {
			f.recordBuf.WriteString(fmt.Sprintln("----------------------------------"))

nit: de-dupe these strings with a const.
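
A minimal sketch of the suggestion, assuming the buffer behaves like a `strings.Builder`. The names `separator` and `writeSamples` and the placeholder body lines are hypothetical; only the separator string itself is taken from the quoted line.

```go
package main

import (
	"fmt"
	"strings"
)

// separator is declared once so every writer uses the same string.
const separator = "----------------------------------"

// writeSamples writes a leading separator, then one placeholder line and a
// trailing separator per iteration.
func writeSamples(numIterations int) string {
	var buf strings.Builder
	for i := 0; i < numIterations; i++ {
		if i == 0 {
			buf.WriteString(separator + "\n")
		}
		fmt.Fprintf(&buf, "sample%d\n", i+1)
		buf.WriteString(separator + "\n")
	}
	return buf.String()
}

func main() {
	fmt.Print(writeSamples(2))
}
```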

kvoli

Reviewed 9 of 20 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)

pkg/kv/kvserver/asim/tests/rand_gen.go line 205 at r4 (raw file):

}

func (c clusterConfigType) getClusterConfigType(s string) clusterConfigType {

Discussed in person: it would be less error-prone if this were a function, rather than a method on a struct.
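
A sketch of the function form, under assumptions: the constant names and the error return are hypothetical, and the accepted strings are taken from the cluster_gen_type options in the commit message. As a method, the result never depends on the receiver, so a call like `c.getClusterConfigType(s)` reads as though `c` matters when it does not.

```go
package main

import "fmt"

type clusterConfigType int

// Hypothetical constant names for the three documented options.
const (
	singleRegion clusterConfigType = iota
	multiRegion
	anyRegion
)

// getClusterConfigType parses s without needing a receiver value.
func getClusterConfigType(s string) (clusterConfigType, error) {
	switch s {
	case "single_region":
		return singleRegion, nil
	case "multi_region":
		return multiRegion, nil
	case "any_region":
		return anyRegion, nil
	default:
		return 0, fmt.Errorf("unknown cluster config type %q", s)
	}
}

func main() {
	t, err := getClusterConfigType("multi_region")
	fmt.Println(t == multiRegion, err)
}
```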

pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 11 at r4 (raw file):

eval duration=5m num_iterations=3 verbose=true
----
settings           num_iterations=3         duration=5m0s

Discussed offline: the table structure here doesn't make sense when there are multiple different column lengths per row.

Instead, consider using a tree-like approach, indenting child settings with a tab (or just 2 spaces).
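
One way to produce such a tree, as a sketch only: `node`, `writeTree`, and `renderTree` are hypothetical names, and the labels are copied from the quoted settings row. The format the patch finally adopted is not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical setting with optional child settings.
type node struct {
	label    string
	children []node
}

// writeTree writes n and its children, indenting two spaces per level.
func writeTree(buf *strings.Builder, n node, depth int) {
	buf.WriteString(strings.Repeat("  ", depth))
	buf.WriteString(n.label)
	buf.WriteByte('\n')
	for _, c := range n.children {
		writeTree(buf, c, depth+1)
	}
}

// renderTree returns the whole tree as a string.
func renderTree(n node) string {
	var buf strings.Builder
	writeTree(&buf, n, 0)
	return buf.String()
}

func main() {
	fmt.Print(renderTree(node{
		label: "settings",
		children: []node{
			{label: "num_iterations=3"},
			{label: "duration=5m0s"},
		},
	}))
}
```

Each row can then carry as many children as it needs without forcing every row into the same column count.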

pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 261 at r4 (raw file):

  actual unavailable=0 under=0, over=9 violating=0
over replicated:
  r120:000001{8921-9080} [(n8,s8):2, (n15,s15):3] applying ttl_seconds=0 num_replicas=1 num_voters=1

Nice! Is this the same as the thrashing bug you found earlier?

wenyihu6 · 2023-08-21T16:05:06Z

-- commits line 27 at r4:

Previously, kvoli (Austen) wrote…

nit: "hardcoded in the code" seems like odd wording.

Done.

wenyihu6 · 2023-08-21T16:05:42Z

pkg/kv/kvserver/asim/tests/rand_gen.go line 205 at r4 (raw file):

Previously, kvoli (Austen) wrote…

Discussed in person: it would be less error-prone if this were a function, rather than a method on a struct.

Done.

wenyihu6 · 2023-08-21T16:05:50Z

pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 11 at r4 (raw file):

Previously, kvoli (Austen) wrote…

Discussed offline: the table structure here doesn't make sense when there are multiple different column lengths per row.

Instead, consider using a tree-like approach, indenting child settings with a tab (or just 2 spaces).

Done.

wenyihu6 · 2023-08-21T16:05:57Z

pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 261 at r4 (raw file):

Previously, kvoli (Austen) wrote…

Nice! Is this the same as the thrashing bug you found earlier?

Yup.

kvoli

Nice! A minor comment nit.

Reviewed 7 of 7 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)

pkg/kv/kvserver/asim/tests/rand_framework.go line 47 at r5 (raw file):

}

// String converts the test setting to string for output.

nit: to a string


wenyihu6 · 2023-08-21T18:36:02Z

TFTR!

bors r=kvoli

craig · 2023-08-21T21:48:59Z

Build succeeded:

Bazel Essential CI (Cockroach)

wenyihu6 force-pushed the new-datadriven branch 3 times, most recently from ea54f46 to 389fd14 Compare August 1, 2023 21:22

wenyihu6 self-assigned this Aug 1, 2023

wenyihu6 force-pushed the new-datadriven branch 18 times, most recently from 7415e96 to 25834f0 Compare August 2, 2023 20:58

wenyihu6 changed the title ~~asim: convert randomized testing to data driven~~ asim: better outputs for data-driven tests Aug 2, 2023

wenyihu6 force-pushed the new-datadriven branch from 25834f0 to 0718940 Compare August 2, 2023 21:02

wenyihu6 changed the title ~~asim: better outputs for data-driven tests~~ asim: convert randomized testing to data-driven Aug 2, 2023

wenyihu6 mentioned this pull request Aug 2, 2023

asim: better outputs for data-driven tests #108059

Merged

wenyihu6 force-pushed the new-datadriven branch 3 times, most recently from 6271d1b to c67d314 Compare August 3, 2023 09:10

wenyihu6 marked this pull request as ready for review August 17, 2023 14:54

wenyihu6 requested a review from a team as a code owner August 17, 2023 14:54

wenyihu6 requested a review from kvoli August 17, 2023 14:54

wenyihu6 force-pushed the new-datadriven branch from c4c1180 to d0eebaf Compare August 17, 2023 15:16

wenyihu6 added 2 commits August 17, 2023 13:10

wenyihu6 force-pushed the new-datadriven branch 5 times, most recently from 0223bbd to ee74036 Compare August 18, 2023 06:44

kvoli reviewed Aug 18, 2023


wenyihu6 force-pushed the new-datadriven branch 5 times, most recently from 3c0239c to 3718c34 Compare August 21, 2023 16:04

wenyihu6 requested a review from kvoli August 21, 2023 16:06

kvoli approved these changes Aug 21, 2023


wenyihu6 force-pushed the new-datadriven branch from 3718c34 to 208d39a Compare August 21, 2023 17:06

craig bot merged commit 604a90a into cockroachdb:master Aug 21, 2023

wenyihu6 deleted the new-datadriven branch August 21, 2023 21:53
