Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: remove testing cluster settings in favor of metamorphic build tag #57483

Merged
merged 6 commits into from
Dec 15, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Dec 3, 2020

coldata: fix tests for randomized batch size

This commit makes all tests to work when the batch size is randomized.
Additionally, it removes nulls.swap function that is not used (we can
always look up the implementation in the history if the need arises).

Release note: None

util: fix metamorphic helper in an edge case

Previously, if the specified range contained exactly one integer (i.e.
min == max), we would hit a division by zero error. This commit fixes
the issue.

Release note: None

sql: fix all failures when mutations.MaxBatchSize is randomized

This commit fixes all the test failures that occur when running with
metamorphic tag and setting mutations.MaxBatchSize to 1 explicitly -
a couple of unit tests need to be skipped as well as some of the logic
tests.

The logic tests now support !metamorphic blocklist directive that
skips the whole file under the metamorphic build. This is only necessary
for very few cases (mostly around tracing of the KV operations).

Release note: None

sql: remove testing.vectorize.batch_size setting

With the recent addition of the metamorphic build tag we no longer need
to expose the sql.testing.vectorize.batch_size cluster setting in
order to randomize coldata.BatchSize() for testing purposes, so this
commit retires the setting. The only testing coverage we're losing is
tpchvec/smallbatchsize test, but I don't think it ever found any
issues, so it doesn't seem important to figure out how to use the
metamorphic tag for it.

Fixes: #51156.
Fixes: #57088.
Fixes: #57198.
Addresses: #55348.

Release note: None (removing non-public setting)

sql: remove testing.mutations.max_batch_size setting

This commit removes no longer used
sql.testing.mutations.max_batch_size cluster setting since the
metamorphic build provides us the coverage the setting was intended to
(it was currently skipped anyway in all cases).

Additionally it moves a single logic test file from execbuilder tests
(to undo the move when we introduced the cluster setting).

Addresses: #55348.

Release note: None (removing non-public setting)

util,build: use crdb_test tag instead of metamorphic

This commit switches the builds to become "metamorphic" based on the
recently introduced crdb_test tag (which is appended to all test
targets) instead of metamorphic. The corresponding teamcity config is
removed because we will get the metamorphic runs with the regular test
runs now.

Note that since some logic tests cannot be run when some constants
initialized to non-default value, in order to not lose the test coverage
for those tests, even if crdb_test build is specified, with 20%
probability the build will not be metamorphic.

Additionally, this commit switches the logic of
ConstantWithMetamorphic* methods to also choose the default value (in
25% cases). The reasoning for this change is that we want to make sure
to run the tests with the default values relatively often.

To summarize, the current behavior is:

  • crdb_test build tag is specified (all test targets):
    • with 20% probability the build is not metamorphic, so all magic
      constants get the default value.
    • with 80% probability the build is metamorphic. Every value being
      metamorphized has 75% of being initialized to non-default value.
  • crdb_test build is not specified:
    • the build is not metamorphic, so all magic constants get the
      produciton value.

Another minor change is a fix to
row.TestingSetDatumRowConverterBatchSize to properly reset a value to
its old one in the returned closure.

TestMysqldumpDataReader is currently skipped when the metamorphic
build occurs pending further investigation.

Release note: None

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Dec 3, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the meta-test branch 3 times, most recently from 38caf41 to 4db5220 Compare December 3, 2020 23:42
@yuzefovich yuzefovich requested a review from a team as a code owner December 3, 2020 23:42
@yuzefovich yuzefovich force-pushed the meta-test branch 2 times, most recently from bb320e2 to fd896d6 Compare December 4, 2020 00:39
@yuzefovich yuzefovich changed the title [DNM] test of metamorphic build sql: remove testing cluster settings in favor of metamorphic build tag Dec 4, 2020
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Dec 4, 2020
@yuzefovich yuzefovich requested review from jordanlewis and a team December 4, 2020 16:42
@yuzefovich
Copy link
Member Author

Ok, I think I fixed all of the current failures of the metamorphic CI config. I also confirmed that using value 1 for mutations.MaxBatchSize would have caught the upsert bug from a couple of months ago.

I updated the metamorphic config to run nightly on master, the only remaining thing is making sure that github issues are filed.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this, thanks!

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 13 of 13 files at r3, 8 of 8 files at r4, 9 of 9 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)


pkg/sql/logictest/logic.go, line 1622 at r3 (raw file):

		for c := range blocklist {
			if c == "metamorphic" {
				return configs

I don't understand this logic, could you explain it to me? Basically if there is a !metamorphic in the file we'll run with no configs?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)


pkg/sql/logictest/logic.go, line 1622 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I don't understand this logic, could you explain it to me? Basically if there is a !metamorphic in the file we'll run with no configs?

Yeah, exactly, added a comment.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @yuzefovich)


pkg/sql/logictest/logic.go, line 1622 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Yeah, exactly, added a comment.

But is that right? Don't we want to run with the configs that weren't blocklist directives or the default set if all the configs were blocklist directives?

e.g.

# LogicTest: local !metamorphic

Should still run with local

# LogicTest: !3node-tenant !metamorphic

Should run with all default configs minus 3node-tenant and metamorphic

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)


pkg/sql/logictest/logic.go, line 1622 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

But is that right? Don't we want to run with the configs that weren't blocklist directives or the default set if all the configs were blocklist directives?

e.g.

# LogicTest: local !metamorphic

Should still run with local

# LogicTest: !3node-tenant !metamorphic

Should run with all default configs minus 3node-tenant and metamorphic

Extended the comment, PTAL.

@yuzefovich
Copy link
Member Author

We had a report of the metamorphic build failure here which I triggered manually. I removed the old metamorphic build config and created a new one under Nightlies project which will run nightly. I think the only remaining question is how to filter the branches on which it'll run (I think as is it'll run on all release branches, including master), but running on branches older than the master is useless since they don't support the metamorphic tag.

@RaduBerinde
Copy link
Member

Why not enable the metamorphic stuff on all test targets? (the crdb_test tag I just added in #57532).

@yuzefovich
Copy link
Member Author

Hm, I think we want to run the tests both with and without the randomizations, and currently crdb_test tag is appended to all test configs, so we would only run with the randomizations if we were to treat it as "metamorphic=true".

I guess we could adjust the randomizer so that in 50% cases it used the default value, and in other 50% it would use the randomized value. That sounds good to me. @asubiotto @jordanlewis thoughts?

@mgartner
Copy link
Collaborator

mgartner commented Dec 9, 2020

If you make it random, then any test that fails with metamorphic configs will be a flaky failure rather than deterministic. Will that cause confusion?

@yuzefovich
Copy link
Member Author

The chosen random values are logged at the test startup, so ideally all failures would be reproducible (at least under stress). I guess we'll have to see.

I think overall we do have many magic constants throughout the code base, and it'd be very beneficial to randomize them during some test runs. It's possible that multiple randomizations combined could trigger something unexpected, but I hope the signal won't be that noisy.

@RaduBerinde
Copy link
Member

IMO it's preferable for as much randomization as reasonably possible to happen during regular test runs. If it only happens in a nightly run, whoever sets up that nightly run is stuck monitoring it and debugging/triaging failures.

@yuzefovich
Copy link
Member Author

That's a good point. I've been debating in my mind whether the metamorphic tests should be run as part of CI or not.

A minor complication here is that some tests rely on a particular default value of the constants for its expected output, so we'll need to be able to tell whether the randomizer chose the default or random value.

@yuzefovich
Copy link
Member Author

Will rebase on top of the tracing fix.

bors r-

@craig
Copy link
Contributor

craig bot commented Dec 15, 2020

Canceled.

This commit makes all tests to work when the batch size is randomized.
Additionally, it removes `nulls.swap` function that is not used (we can
always look up the implementation in the history if the need arises).

Release note: None
Previously, if the specified range contained exactly one integer (i.e.
`min == max`), we would hit a division by zero error. This commit fixes
the issue.

Release note: None
This commit fixes all the test failures that occur when running with
metamorphic tag and setting `mutations.MaxBatchSize` to 1 explicitly -
a couple of unit tests need to be skipped as well as some of the logic
tests.

The logic tests now support `!metamorphic` blocklist directive that
skips the whole file under the metamorphic build. This is only necessary
for very few cases (mostly around tracing of the KV operations).

Release note: None
With the recent addition of the metamorphic build tag we no longer need
to expose the `sql.testing.vectorize.batch_size` cluster setting in
order to randomize `coldata.BatchSize()` for testing purposes, so this
commit retires the setting. The only testing coverage we're losing is
`tpchvec/smallbatchsize` test, but I don't think it ever found any
issues, so it doesn't seem important to figure out how to use the
metamorphic tag for it.

Release note: None (removing non-public setting)
This commit removes no longer used
`sql.testing.mutations.max_batch_size` cluster setting since the
metamorphic build provides us the coverage the setting was intended to
(it was currently skipped anyway in all cases).

Additionally it moves a single logic test file from execbuilder tests
(to undo the move when we introduced the cluster setting).

Release note: None (removing non-public setting)
@yuzefovich
Copy link
Member Author

bors r+

@yuzefovich
Copy link
Member Author

Hm, I've seen importccl package fail several times on this PR, and it failed again, so I'll need to take a closer look before merging it.

bors r-

@craig
Copy link
Contributor

craig bot commented Dec 15, 2020

Canceled.

@yuzefovich
Copy link
Member Author

There was a valid timeout that I can easily reproduce, and it's possible that there is a bug with mysql dump import, I filed #57955 and skipped the test under metamorphic build for now.

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 15, 2020

🕐 Waiting for PR status (Github check) to be set, probably by CI. Bors will automatically try to run when all required PR statuses are set.

This commit switches the builds to become "metamorphic" based on the
recently introduced `crdb_test` tag (which is appended to all test
targets) instead of `metamorphic`. The corresponding teamcity config is
removed because we will get the metamorphic runs with the regular test
runs now.

Note that since some logic tests cannot be run when some constants
initialized to non-default value, in order to not lose the test coverage
for those tests, even if `crdb_test` build is specified, with 20%
probability the build will not be metamorphic.

Additionally, this commit switches the logic of
`ConstantWithMetamorphic*` methods to also choose the default value (in
25% cases). The reasoning for this change is that we want to make sure
to run the tests with the default values relatively often.

To summarize, the current behavior is:
- `crdb_test` build tag is specified (all test targets):
  - with 20% probability the build is not metamorphic, so all magic
constants get the default value.
  - with 80% probability the build is metamorphic. Every value being
metamorphized has 75% of being initialized to non-default value.
- `crdb_test` build is not specified:
  - the build is not metamorphic, so all magic constants get the
produciton value.

Another minor change is a fix to
`row.TestingSetDatumRowConverterBatchSize` to properly reset a value to
its old one in the returned closure.

`TestMysqldumpDataReader` is currently skipped when the metamorphic
build occurs pending further investigation.

Release note: None
@yuzefovich
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 15, 2020

Build succeeded:

@craig craig bot merged commit d300362 into cockroachdb:master Dec 15, 2020
@yuzefovich yuzefovich deleted the meta-test branch December 15, 2020 18:56
@RaduBerinde
Copy link
Member

A particularly annoying side-effect is that when you run execbuilder tests with -rewrite, some of the files don't actually get rewritten. This is very annoying for optimizer work, where a very common workflow is to run -rewrite on all opt tests and study the diffs.

Is it possible to at least add a flag (or build tag?) that disables the metamorphic stuff? Or maybe the right solution, at least for the optimizer, is to move out all the tests that involve traces somewhere else.

@yuzefovich
Copy link
Member Author

Yeah, good point. I don't think adding a flag will work (I'm assuming you're thinking about TESTFLAGS for logic tests) since they are processed too late, but we could add a build tag. What do you think about crdb_test_off flag that will need to be added when -rewrite option is used? The example invocation would be make testoptlogic TESTFLAGS='-rewrite' TAGS=crdb_test_off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants