asim: convert randomized testing to data-driven #107957

Merged
merged 3 commits into from
Aug 21, 2023

Conversation

@wenyihu6 (Contributor) commented Aug 1, 2023

asim: remove extra parsing for []float64, float64, time.Duration

In cockroachdb/datadriven#45, we upstreamed the scanning implementation to the
datadriven library. We can now parse []float64, float64, and time.Duration
without additional handling.
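
For context, the snippet below is a rough sketch of the kind of hand-written argument parsing that this commit makes unnecessary. It is not the removed code: `parseFloatList` is a hypothetical helper, and the assumption is that list arguments arrive as a parenthesized, comma-separated string such as `(0.1,0.2,0.7)`. Only standard-library calls are used.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseFloatList is a hypothetical helper that parses a string such as
// "(0.1,0.2,0.7)" into a []float64. The surrounding parentheses are optional.
func parseFloatList(s string) ([]float64, error) {
	s = strings.TrimSpace(s)
	s = strings.TrimSuffix(strings.TrimPrefix(s, "("), ")")
	if s == "" {
		return nil, nil
	}
	parts := strings.Split(s, ",")
	out := make([]float64, 0, len(parts))
	for _, p := range parts {
		f, err := strconv.ParseFloat(strings.TrimSpace(p), 64)
		if err != nil {
			return nil, fmt.Errorf("invalid float %q: %w", p, err)
		}
		out = append(out, f)
	}
	return out, nil
}

func main() {
	weights, err := parseFloatList("(0.1,0.2,0.7)")
	fmt.Println(weights, err)

	// Durations such as "30m2s" can be parsed with the standard library.
	d, err := time.ParseDuration("30m2s")
	fmt.Println(d, err)
}
```

With scanning handled upstream, a test would pass the destination variable to the library instead of carrying helpers like this one.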

Release Note: none
Epic: none


asim: enable user-defined repliFactor, placement in rand range_gen

This patch introduces two additional options for randomized range generation,
letting users define the replication factor and placement type. Although some
aspects of the range configuration are randomly generated (ranges and
keyspace), these two settings are not randomized. Once set by the user, they
persist across iterations.

Release Note: none
Part Of: #106311


asim: convert randomized testing to data-driven

Previously, the randomized testing framework depended on default settings
hardcoded in the tests, requiring users to edit code-configured parameters to
change the settings. This patch converts the framework to a data-driven
approach, enabling more dynamic user inputs, more testing examples, and
greater visibility into what each iteration is testing.

TestRandomized is a randomized data-driven testing framework that validates
allocators by creating randomized configurations. It is designed for
regression and exploratory testing.

There are three modes for every aspect of randomized generation.

  • Static mode:
    1. If randomization options are disabled (e.g. no rand_ranges command is
       used), the system uses the default configurations (defined in
       default_settings.go) with no randomization.
  • Randomized mode, in one of two ways:
    2. Use the default settings for randomized generation (e.g. rand_ranges).
    3. Use settings specified with commands (e.g. rand_ranges
       range_gen_type=zipf).

The following commands are provided:

1. "rand_cluster" [cluster_gen_type=(single_region|multi_region|any_region)]
	e.g. rand_cluster cluster_gen_type=(multi_region)
	- rand_cluster: randomly picks a predefined cluster configuration
	  according to the specified type.
	- cluster_gen_type (default value is multi_region): the cluster
	  configuration type. On the next eval, the cluster is generated as the
	  initial state of the simulation.

2. "rand_ranges" [placement_type=(even|skewed|random|weighted_rand)]
	[replication_factor=<int>] [range_gen_type=(uniform|zipf)]
	[keyspace_gen_type=(uniform|zipf)] [weighted_rand=(<[]float64>)]
	e.g. rand_ranges placement_type=weighted_rand weighted_rand=(0.1,0.2,0.7)
	e.g. rand_ranges placement_type=skewed replication_factor=1
		 range_gen_type=zipf keyspace_gen_type=uniform
	- rand_ranges: randomly generates a distribution of ranges across stores
	  based on the specified parameters. On the next call to eval, ranges and
	  their replica placement are generated and loaded into the initial state.
	- placement_type (default value is even): defines the type of range
	  placement distribution across stores. Once set, it remains constant
	  across iterations with no randomization involved.
	- replication_factor (default value is 3): represents the replication
	  factor of each range. Once set, it remains constant across iterations
	  with no randomization involved.
	- range_gen_type (default value is uniform): represents the type of
	  distribution used to yield the range parameter as ranges are generated
	  across iterations (range ∈ [1, 1000]).
	- keyspace_gen_type: represents the type of distribution used to yield the
	  keyspace parameter as ranges are generated across iterations
	  (keyspace ∈ [1000, 200000]).
	- weighted_rand: specifies the weighted random distribution among stores.
	  Requirements (will panic otherwise):
	  1. weighted_rand must only be used with placement_type=weighted_rand,
	     and vice versa.
	  2. Each element of the array must be a weight in [0.0, 1.0], and each
	     element corresponds to a store.
	  3. len(weighted_rand) cannot be greater than the number of stores.
	  4. The weights in the array must sum to 1.
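
The four weighted_rand requirements above can be sketched as a single validation function. This is an illustration under stated assumptions, not the framework's code: `validateWeights` and its parameters are hypothetical names, the sketch returns an error where the description says the framework panics, and the sum is compared to 1 with a small tolerance that the description does not specify.

```go
package main

import (
	"fmt"
	"math"
)

// validateWeights is a hypothetical helper that checks the four weighted_rand
// requirements. It returns an error instead of panicking so it is easy to test.
func validateWeights(placementType string, weights []float64, numStores int) error {
	// 1. weighted_rand and placement_type=weighted_rand must be used together.
	if (placementType == "weighted_rand") != (len(weights) > 0) {
		return fmt.Errorf("weighted_rand and placement_type=weighted_rand must be used together")
	}
	if len(weights) == 0 {
		return nil // No weights to validate for other placement types.
	}
	// 3. There cannot be more weights than stores.
	if len(weights) > numStores {
		return fmt.Errorf("got %d weights for %d stores", len(weights), numStores)
	}
	// 2. Every weight must lie in [0.0, 1.0].
	sum := 0.0
	for i, w := range weights {
		if math.IsNaN(w) || w < 0 || w > 1 {
			return fmt.Errorf("weight %d (%v) is outside [0.0, 1.0]", i, w)
		}
		sum += w
	}
	// 4. The weights must sum to 1 (tolerance is an assumption of this sketch).
	const epsilon = 1e-9
	if math.Abs(sum-1) > epsilon {
		return fmt.Errorf("weights sum to %v, expected 1", sum)
	}
	return nil
}

func main() {
	fmt.Println(validateWeights("weighted_rand", []float64{0.1, 0.2, 0.7}, 3))
	fmt.Println(validateWeights("weighted_rand", []float64{0.5, 0.6}, 3))
}
```

The tolerance matters because decimal weights such as 0.1 and 0.2 are not exactly representable as float64, so an exact comparison against 1 could reject valid input.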

3. "eval" [seed=<int64>] [num_iterations=<int>] [duration=<time.Duration>]
	[verbose=<bool>]
	e.g. eval seed=20 duration=30m2s verbose=true
	- eval: generates a simulation based on the configuration set with the
	  given commands.
	- seed (default value is int64(42)): used to create a new random number
	  generator, which is then used to create a new seed for each iteration.
	- num_iterations (default value is 3): specifies the number of simulations
	  to run.
	- duration (default value is 10m): defines the duration of each iteration.
	- verbose (default value is false): if set to true, plots the history of
	  all stats (as specified by defaultStat).
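
The seed behavior described for eval can be sketched as follows: one top-level generator is created from the user-supplied seed, and it hands out a fresh seed for each iteration. `iterationSeeds` is a hypothetical name, and the use of `math/rand` here is an assumption about how such a scheme might be written, not a copy of the framework.

```go
package main

import (
	"fmt"
	"math/rand"
)

// iterationSeeds derives one seed per iteration from a single top-level seed.
// The same top-level seed always yields the same sequence of iteration seeds,
// which keeps a randomized run reproducible.
func iterationSeeds(seed int64, numIterations int) []int64 {
	seedGen := rand.New(rand.NewSource(seed))
	seeds := make([]int64, numIterations)
	for i := range seeds {
		seeds[i] = seedGen.Int63()
	}
	return seeds
}

func main() {
	// Mirrors the documented defaults: seed=42, num_iterations=3.
	for i, s := range iterationSeeds(42, 3) {
		fmt.Printf("iteration %d: seed=%d\n", i+1, s)
	}
}
```

Deriving per-iteration seeds this way means a failing iteration can be reproduced from the single seed recorded in the test file.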

RandTestingFramework is initialized with specified testSetting and maintains
its state across all iterations. It repeats the test with different random
configurations. Each iteration in RandTestingFramework executes the following
steps:

  1. Generates a random configuration: based on whether randOption is on and
    the specific settings for randomized generation.
  2. Executes the simulation and checks the assertions on the final state.
  3. Stores any outputs and assertion failures in a buffer.
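
The three steps above can be sketched as a loop. Everything named here is hypothetical (`settings`, `runIterations`, `defaultNumRanges`, and the `simulate` callback), and the sketch only randomizes the number of ranges; it is meant to show the control flow, not the real RandTestingFramework.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"strings"
)

// defaultNumRanges is an assumed static default for this sketch; it is not
// the value defined in default_settings.go.
const defaultNumRanges = 10

// settings is a hypothetical, trimmed-down stand-in for testSetting.
type settings struct {
	seed          int64
	numIterations int
	randRanges    bool // true once a rand_ranges command has been issued
}

// runIterations repeats the test and collects every iteration's output and
// assertion result in a single buffer.
func runIterations(s settings, simulate func(numRanges int) error) string {
	var buf strings.Builder
	seedGen := rand.New(rand.NewSource(s.seed))
	for i := 1; i <= s.numIterations; i++ {
		// Step 1: generate a configuration, randomized only if enabled.
		rng := rand.New(rand.NewSource(seedGen.Int63()))
		numRanges := defaultNumRanges
		if s.randRanges {
			numRanges = 1 + rng.Intn(1000) // range ∈ [1, 1000]
		}
		fmt.Fprintf(&buf, "sample%d: ranges=%d\n", i, numRanges)
		// Step 2: run the simulation and check assertions on the final state.
		// Step 3: store the outcome in the buffer.
		if err := simulate(numRanges); err != nil {
			fmt.Fprintf(&buf, "  failed: %v\n", err)
		} else {
			buf.WriteString("  pass\n")
		}
	}
	return buf.String()
}

func main() {
	s := settings{seed: 42, numIterations: 3, randRanges: true}
	out := runIterations(s, func(numRanges int) error {
		if numRanges > 900 {
			return errors.New("toy assertion: too many ranges")
		}
		return nil
	})
	fmt.Print(out)
}
```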

Release note: None
Part Of: #106311


@wenyihu6 wenyihu6 force-pushed the new-datadriven branch 3 times, most recently from ea54f46 to 389fd14 Compare August 1, 2023 21:22
@wenyihu6 wenyihu6 self-assigned this Aug 1, 2023
@wenyihu6 wenyihu6 force-pushed the new-datadriven branch 18 times, most recently from 7415e96 to 25834f0 Compare August 2, 2023 20:58
@wenyihu6 wenyihu6 changed the title asim: convert randomized testing to data driven asim: better outputs for data-driven tests Aug 2, 2023
@wenyihu6 wenyihu6 changed the title asim: better outputs for data-driven tests asim: convert randomized testing to data-driven Aug 2, 2023
@wenyihu6 wenyihu6 force-pushed the new-datadriven branch 3 times, most recently from 6271d1b to c67d314 Compare August 3, 2023 09:10
@wenyihu6 wenyihu6 marked this pull request as ready for review August 17, 2023 14:54
@wenyihu6 wenyihu6 requested a review from a team as a code owner August 17, 2023 14:54
@wenyihu6 wenyihu6 requested a review from kvoli August 17, 2023 14:54
@wenyihu6 wenyihu6 force-pushed the new-datadriven branch 5 times, most recently from 0223bbd to ee74036 Compare August 18, 2023 06:44
@kvoli (Collaborator) left a comment

Flushing comments.

Reviewed 9 of 9 files at r2, 4 of 4 files at r3, 11 of 20 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)


-- commits line 27 at r4:
nit: "hardcoded in the code" seems like odd wording.


-- commits line 73 at r4:
nit: some of these bullet points end with a period, whilst others do not. I'd change them to be consistent, excluding the e.g.


pkg/kv/kvserver/asim/tests/rand_framework.go line 169 at r4 (raw file):

	for i := 0; i < numIterations; i++ {
		if i == 0 {
			f.recordBuf.WriteString(fmt.Sprintln("----------------------------------"))

nit: de-dupe these strings with a const.
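
A minimal sketch of what this nit asks for, assuming the separator is written in more than one place inside the loop. `separator` and `renderIterations` are hypothetical names and the per-iteration line is invented for illustration; only the separator string comes from the quoted code.

```go
package main

import (
	"fmt"
	"strings"
)

// separator is defined once so every call site prints the same line.
const separator = "----------------------------------"

// renderIterations writes a separator before the first iteration and after
// every iteration, reusing the shared constant.
func renderIterations(numIterations int) string {
	var buf strings.Builder
	for i := 0; i < numIterations; i++ {
		if i == 0 {
			fmt.Fprintln(&buf, separator)
		}
		fmt.Fprintf(&buf, "sample%d: start running\n", i+1)
		fmt.Fprintln(&buf, separator)
	}
	return buf.String()
}

func main() {
	fmt.Print(renderIterations(2))
}
```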

@kvoli (Collaborator) left a comment

Reviewed 9 of 20 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)


pkg/kv/kvserver/asim/tests/rand_gen.go line 205 at r4 (raw file):

}

func (c clusterConfigType) getClusterConfigType(s string) clusterConfigType {

Discussed in person, it would be less error prone if this were a function, rather than a method on a struct.
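
One way to read this suggestion is sketched below. As a method, the conversion needs a receiver value that it never uses, which invites calls on a zero or unrelated value; as a plain function, the input string is the only dependency. The constant names and the error return are assumptions of this sketch, not the merged code.

```go
package main

import "fmt"

// clusterConfigType mirrors the type name in the quoted code; the constant
// names below are hypothetical.
type clusterConfigType int

const (
	singleRegion clusterConfigType = iota
	multiRegion
	anyRegion
)

// getClusterConfigType is a free function, so callers do not need an existing
// clusterConfigType value just to parse a string.
func getClusterConfigType(s string) (clusterConfigType, error) {
	switch s {
	case "single_region":
		return singleRegion, nil
	case "multi_region":
		return multiRegion, nil
	case "any_region":
		return anyRegion, nil
	default:
		return 0, fmt.Errorf("unknown cluster config type: %q", s)
	}
}

func main() {
	t, err := getClusterConfigType("multi_region")
	fmt.Println(t == multiRegion, err)
}
```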


pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 11 at r4 (raw file):

eval duration=5m num_iterations=3 verbose=true
----
settings           num_iterations=3         duration=5m0s

Discussed offline, the table structure here doesn't make sense when there are multiple different column lengths, per row.

Instead, consider using a tree like approach, tabbing (or just 2 spaces) in child settings.
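
A small sketch of the tree-like layout suggested here, with child settings indented by two spaces under their parent. The `node` type and `renderTree` function are hypothetical, and the labels are taken from the quoted test output only as sample data.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical tree of settings: a label plus nested child settings.
type node struct {
	label    string
	children []node
}

// renderTree prints each node on its own line, indenting children by two
// spaces per level, so rows with different numbers of settings stay readable.
func renderTree(root node) string {
	var sb strings.Builder
	var walk func(n node, depth int)
	walk = func(n node, depth int) {
		sb.WriteString(strings.Repeat("  ", depth))
		sb.WriteString(n.label)
		sb.WriteString("\n")
		for _, c := range n.children {
			walk(c, depth+1)
		}
	}
	walk(root, 0)
	return sb.String()
}

func main() {
	fmt.Print(renderTree(node{
		label: "settings",
		children: []node{
			{label: "num_iterations=3"},
			{label: "duration=5m0s"},
		},
	}))
}
```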


pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 261 at r4 (raw file):

  actual unavailable=0 under=0, over=9 violating=0
over replicated:
  r120:000001{8921-9080} [(n8,s8):2, (n15,s15):3] applying ttl_seconds=0 num_replicas=1 num_voters=1

Nice! Is this the same as the thrashing bug you found earlier?

@wenyihu6 wenyihu6 force-pushed the new-datadriven branch 5 times, most recently from 3c0239c to 3718c34 Compare August 21, 2023 16:04
@wenyihu6 (Contributor, Author) commented

-- commits line 27 at r4:

Previously, kvoli (Austen) wrote…

nit: "hardcoded in the code" seems like odd wording.

Done.

@wenyihu6 (Contributor, Author) commented

pkg/kv/kvserver/asim/tests/rand_gen.go line 205 at r4 (raw file):

Previously, kvoli (Austen) wrote…

Discussed in person, it would be less error prone if this were a function, rather than a method on a struct.

Done.

@wenyihu6 (Contributor, Author) commented

pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 11 at r4 (raw file):

Previously, kvoli (Austen) wrote…

Discussed offline, the table structure here doesn't make sense when there are multiple different column lengths, per row.

Instead, consider using a tree like approach, tabbing (or just 2 spaces) in child settings.

Done.

@wenyihu6 (Contributor, Author) commented

pkg/kv/kvserver/asim/tests/testdata/rand/rand_ranges line 261 at r4 (raw file):

Previously, kvoli (Austen) wrote…

Nice! Is this the same as the thrashing bug you found earlier?

Yup.

@wenyihu6 wenyihu6 requested a review from kvoli August 21, 2023 16:06
@kvoli (Collaborator) left a comment

:lgtm:

Nice! A minor comment nit.

Reviewed 7 of 7 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)


pkg/kv/kvserver/asim/tests/rand_framework.go line 47 at r5 (raw file):

}

// String converts the test setting to string for output.

nit: to a string

@wenyihu6 (Contributor, Author) commented

TFTR!

bors r=kvoli

@craig craig bot merged commit 604a90a into cockroachdb:master Aug 21, 2023
craig bot (Contributor) commented Aug 21, 2023

Build succeeded:

@wenyihu6 wenyihu6 deleted the new-datadriven branch August 21, 2023 21:53