Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlsmith: make order-dependent aggregation functions deterministic #84324

Merged
merged 2 commits into from
Jul 15, 2022

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Jul 13, 2022

cmd: add smith command

Add a new top-level command smith which dumps randomly-generated
sqlsmith queries. This is useful for testing modifications to sqlsmith.

Assists: #83024

Release note: None

sqlsmith: make order-dependent aggregation functions deterministic

Some aggregation functions (e.g. string_agg) have results that depend
on the order of input rows. To make sqlsmith more deterministic, add
ORDER BY clauses to these aggregation functions whenever their argument
is a column reference. (When their argument is a constant, ordering
doesn't matter.)

Fixes: #83024

Release note: None

@michae2 michae2 requested review from mgartner, msirek and a team July 13, 2022 00:11
@michae2 michae2 requested a review from a team as a code owner July 13, 2022 00:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2 michae2 changed the title Smith agg order sqlsmith: make order-dependent aggregation functions deterministic Jul 13, 2022
Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the new smith command! The fix looks good.

Reviewed 6 of 6 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @michae2)


pkg/cmd/smith/main.go line 35 at r2 (raw file):

	for {
		fmt.Print("\n", smither.Generate(), ";\n")

Maybe in the future we can add a command-line option to print a set number of statements.

@michae2 michae2 force-pushed the smith_agg_order branch 2 times, most recently from d1afa9a to a17fc21 Compare July 13, 2022 17:21
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFYR!

bors r=msirek

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @msirek)


pkg/cmd/smith/main.go line 35 at r2 (raw file):

Previously, msirek (Mark Sirek) wrote…

Maybe in the future we can add a command-line option to print a set number of statements.

Done. Also added some help text.

@craig
Copy link
Contributor

craig bot commented Jul 13, 2022

Build failed:

@mgartner
Copy link
Collaborator

pkg/cmd/smith/main.go line 65 at r3 (raw file):

	// TODO(michae2): Set smither options from command-line options.
	smithOpts := []sqlsmith.SmitherOption{}
	smither, err := sqlsmith.NewSmither(nil, rng, smithOpts...)

IIRC sqlsmith looks at the existing table schema to generate queries. What schema is used ins this case?

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: The new smith command will be very useful!

I am slightly worried that the second commit (along with other recent changes) are reducing the efficacy of sqlsmith's in achieving its original goal - to find tests cases that cause panics. The more we constrain the output of sqlsmith to better support randomized correctness testing, the less effective it will become at randomized panic testing. Maybe should consider implementing two SQLSmith modes - one with almost no restrictions that can be used to catch panics, and one that is as restrictive as necessary to get deterministic results in randomized correctness tests.

Reviewed 4 of 6 files at r1, 3 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @msirek)

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can set a flag when running any of the tests which compare results, and only use the ORDER BY if that flag is true.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @msirek)

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

bors r-

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @msirek)


pkg/cmd/smith/main.go line 65 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

IIRC sqlsmith looks at the existing table schema to generate queries. What schema is used ins this case?

public

@michae2
Copy link
Collaborator Author

michae2 commented Jul 13, 2022

@mgartner @msirek how would you feel if I extended DisableImpureFns() to also cover this?

@mgartner
Copy link
Collaborator

public

Sorry, my question was not specific. I'm curious what tables SQLSmith is generating statements for when running this command. It doesn't seem like any tables would exist, unless they are being created somewhere here that I'm missing.

@mgartner @msirek how would you feel if I extended DisableImpureFns() to also cover this?

Sounds good to me. Maybe rename it to reflect it's increase influence? DisableNonDeterminism?

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @msirek)

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I went hog-wild and added to smith:

  • passthrough of simple sqlsmith options
  • option to connect to a database for schema info
  • new MutatingMode for sqlsmith based on mutating smither from TLP / costfuzz / unoptimized-query-oracle
  • ability to generate expressions instead of statements

To answer your question, @mgartner, without a db connection sqlsmith can only generate expressions, SELECT statements (over VALUES tables), CREATE TABLE statements, and probably a few other things. But no mutations.

As for the original issue, I extended DisableImpureFns to cover adding the ORDER BY and renamed it to DisableNondeterministicFns.

Since I changed so much, I'll wait for another look whenever you have time (no rush).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @msirek)

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good

michae2 added 2 commits July 14, 2022 16:30
Add a new top-level command `smith` which dumps randomly-generated
sqlsmith queries. This is useful for testing modifications to sqlsmith.

Assists: cockroachdb#83024

Release note: None
Some aggregation functions (e.g. string_agg) have results that depend
on the order of input rows. To make sqlsmith more deterministic, add
ORDER BY clauses to these aggregation functions whenever their argument
is a column reference. (When their argument is a constant, ordering
doesn't matter.)

Fixes: cockroachdb#83024

Release note: None
@michae2
Copy link
Collaborator Author

michae2 commented Jul 14, 2022

Thanks!

bors r=msirek,mgartner

@craig craig bot merged commit e8818d5 into cockroachdb:master Jul 15, 2022
@craig
Copy link
Contributor

craig bot commented Jul 15, 2022

Build succeeded:

@michae2 michae2 deleted the smith_agg_order branch July 15, 2022 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

costfuzz failure: difference in concat_agg window function results
4 participants