Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: async validation of foreign keys in ADD CONSTRAINT #37433

Merged
merged 1 commit into from
May 16, 2019

Conversation

thoszhang
Copy link
Contributor

Currently, validating an unvalidated foreign key constraint requires running
VALIDATE CONSTRAINT, which executes a potentially long-running query in the
user transaction. In this PR, ADD CONSTRAINT for foreign keys is now
processed by the schema changer, with the validation for existing rows running
in a separate transaction from the original user transaction.

The foreign key mutation (represented by the new FOREIGN_KEY enum value in
ConstraintToUpdate) is processed in the same way as check constraint
mutations: the foreign key is in the mutations list while other columns and
indexes with the same mutation ID are backfilled, then added to the appropriate
index in the Validating state before being validated, and is finalized when
the validation query for existing rows completes successfully. If validation
fails, the transaction is rolled back, with the foreign key (and backreference)
removed from the table descriptor(s) as part of the rollback.

Adding foreign keys to columns or indexes being added is still not supported
and is left for later work. Also unsupported is adding a foreign key constraint
in the same transaction as CREATE TABLE that is either validated or that
rolls back the entire transaction on failure. In this PR, the constraint is
just left unvalidated; This needs a follow-up PR to queue a separate mutation
for validating the constraint after it's been added.

Release note (sql change): Foreign keys that are added to an existing table
using ALTER TABLE will now be validated for existing rows, with improved
performance compared to running ADD CONSTRAINT followed by VALIDATE CONSTRAINT previously.

@thoszhang thoszhang requested review from dt, vivekmenezes and a team May 9, 2019 20:48
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@vivekmenezes vivekmenezes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @vivekmenezes)


pkg/sql/backfill.go, line 273 at r1 (raw file):

							log.VEventf(
								ctx, 2,
								"backfiller tried to add constraint %+v but found existing constraint %+v, presumably due to a retry",

How can a retry cause the constraint to be seen. We usually go through the pain of cleaning up before a retry. Perhaps an error is warranted here?


pkg/sql/schema_changer.go, line 1470 at r1 (raw file):

				log.Infof(
					ctx,
					"attempted to drop constraint %s, but it hadn't been added to the table descriptor yet",

should this be just an error?


pkg/sql/logictest/testdata/logic_test/fk, line 2154 at r1 (raw file):

statement error foreign key violation: "b" row a_z='z2', a_y='y2', a_x='x2' has no match in "a"
ALTER TABLE b VALIDATE CONSTRAINT fk_ref

i see you have taken out a lot of tests above. Can you keep these tests while adding an FK constraint rather than through the validation step?

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 16 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @lucy-zhang, and @vivekmenezes)


pkg/sql/schema_changer.go, line 1220 at r1 (raw file):

// applying a schema change. If a column being added is reversed and droped,
// all new indexes referencing the column will also be dropped.
func (sc *SchemaChanger) reverseMutations(ctx context.Context, causingError error) error {

do we need a drop on failure bool on ConstraintToUpdate?
I'm just thinking that if you add constraint ... not valid and later run validate constraint we should not drop on failure.

Right now I don't think we support not valid and I if I'm reading this correctly, I think VALIDATE CONSTRAINT is still just run in the alter txn, not via schema changer, right? SO in that case, this isn't currently an issue, but when we do add NOT VALID and make VALIDATE CONSTRAINT also a schema change, then thinking about the backwards compatibility of an explicit bool and its zero value, it seems like we won't have any way to turn off automatic dropping by older nodes unless we build it in now?

Copy link
Contributor Author

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @vivekmenezes)


pkg/sql/backfill.go, line 273 at r1 (raw file):

Previously, vivekmenezes wrote…

How can a retry cause the constraint to be seen. We usually go through the pain of cleaning up before a retry. Perhaps an error is warranted here?

It can happen if the cluster goes down in the state after the constraint has been added to the table descriptor in the Validating state, but before it becomes Validated and is removed from the list of mutations. I can manually reproduce this on a single-node cluster by calling Sleep at https://github.com/cockroachdb/cockroach/pull/37433/files#diff-008234c9a29752f8b897974c95348fa4R228, doing a hard shutdown at that time, and then restarting the cluster. Not sure if there's a way to write a test for this.


pkg/sql/schema_changer.go, line 1220 at r1 (raw file):

Previously, dt (David Taylor) wrote…

do we need a drop on failure bool on ConstraintToUpdate?
I'm just thinking that if you add constraint ... not valid and later run validate constraint we should not drop on failure.

Right now I don't think we support not valid and I if I'm reading this correctly, I think VALIDATE CONSTRAINT is still just run in the alter txn, not via schema changer, right? SO in that case, this isn't currently an issue, but when we do add NOT VALID and make VALIDATE CONSTRAINT also a schema change, then thinking about the backwards compatibility of an explicit bool and its zero value, it seems like we won't have any way to turn off automatic dropping by older nodes unless we build it in now?

Yeah, VALIDATE CONSTRAINT is still in the ALTER TABLE transaction now.

I've been thinking that Validating should basically mean "drop on failure," so it only occurs as part of ALTER TABLE without NOT VALID, and a constraint added using NOT VALID and then validated using VALIDATE CONSTRAINT will never be Validating. So, NOT VALID just corresponds to having an Unvalidated constraint in the mutation, and results in skipping the extra step where the descriptor is published with the constraint and the subsequent validation query.

It's not ideal that the names of the states imply a Unvalidated -> Validating -> Validated sequence of states, when the real possibilities are actually either Validating -> Validated or Unvalidated -> Validated. But I think the "drop on failure" state can be pretty much captured by Validating and an extra boolean would just be duplicating that state (even if, arguably, the state should be on the mutation instead of the constraint itself). I'm open to also putting in a boolean instead, though I'm not sure Validating would mean anything separate in that case.

We do also need to add a boolean flag for this mutation type to also support VALIDATE CONSTRAINT for backward compatibility, to only run the validation without adding anything. But I think I'd rather add one or both flags in a separate PR.


pkg/sql/schema_changer.go, line 1470 at r1 (raw file):

Previously, vivekmenezes wrote…

should this be just an error?

This can happen if we have to roll back a schema change before it has reached the backfill stage.


pkg/sql/logictest/testdata/logic_test/fk, line 2154 at r1 (raw file):

Previously, vivekmenezes wrote…

i see you have taken out a lot of tests above. Can you keep these tests while adding an FK constraint rather than through the validation step?

The cases where validation fails on ADD CONSTRAINT are already covered. I added a few more tests for the cases where ADD CONSTRAINT succeeds.

Copy link
Contributor

@vivekmenezes vivekmenezes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt)

@thoszhang thoszhang force-pushed the fk-mutation-1 branch 3 times, most recently from ad61b38 to ef8bd9f Compare May 15, 2019 18:35
@thoszhang thoszhang requested a review from a team as a code owner May 15, 2019 18:35
@thoszhang
Copy link
Contributor Author

I updated this with some minor fixes/tests:

  • VALIDATE CONSTRAINT was broken: the query would get stuck waiting for the lease if it followed another schema change in the same transaction. I updated the existing logic test to actually catch this case. The fix is to modify the InternalExecutor used for the query to give it an artificial uncommitted table descriptor to use instead, which is what we do for the validation step in ADD CONSTRAINT.
  • The logic tests that verify that the optimizer doesn't use unvalidated FKs for planning have been fixed/restored. (Since unvalidated FKs can't be added, the only way to get unvalidated FKs in logic tests is to use the fact that, currently, FKs added using ALTER TABLE in the same transaction as when the table was created are not validated. So the tests are somewhat awkward.)
  • Got rid of runWithDistSQL in planner since it's no longer used.

@RaduBerinde
Copy link
Member

RaduBerinde commented May 15, 2019

Don't we want to support ADD CONSTRAINT ... NOT VALID (especially since this has been our default behavior)?

@thoszhang
Copy link
Contributor Author

Yeah, we'll want to support NOT VALID eventually. I filed #33551 to track it. Everything should basically be in place to support it - we just have to skip the validation step at the end, so it shouldn't be difficult.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 16 files at r1, 8 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @lucy-zhang)


pkg/sql/backfill.go, line 1255 at r2 (raw file):

// cluster version, it will be used in the InternalExecutor that performs the
// validation query.
// validateFkInTxn validates foreign key constraints within the provided

looks like this comment got copy/pasted from below.


pkg/sql/backfill.go, line 1308 at r2 (raw file):

	fkName string,
) error {
	ie := evalCtx.InternalExecutor.(*SessionBoundInternalExecutor)

Are we doing this pattern often enough that it'd make sense to have a scheme-changer executor that DRYs it? (doesn't need to happen here of course)

Copy link
Contributor Author

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt)


pkg/sql/backfill.go, line 1255 at r2 (raw file):

Previously, dt (David Taylor) wrote…

looks like this comment got copy/pasted from below.

Done.


pkg/sql/backfill.go, line 1308 at r2 (raw file):

Previously, dt (David Taylor) wrote…

Are we doing this pattern often enough that it'd make sense to have a scheme-changer executor that DRYs it? (doesn't need to happen here of course)

Yeah, I think so. The existing interface for modifying the TableCollection with tcModifier is also kind of cumbersome - I think all of this could use some refactoring now that we have several different uses for this pattern. The issue for this was intentionally left open: #34304

Currently, validating an unvalidated foreign key constraint requires running
`VALIDATE CONSTRAINT`, which executes a potentially long-running query in the
user transaction. In this PR, `ADD CONSTRAINT` for foreign keys is now
processed by the schema changer, with the validation for existing rows running
in a separate transaction from the original user transaction.

The foreign key mutation (represented by the new `FOREIGN_KEY` enum value in
`ConstraintToUpdate`) is processed in the same way as check constraint
mutations: the foreign key is in the mutations list while other columns and
indexes with the same mutation ID are backfilled, then added to the appropriate
index in the `Validating` state before being validated, and is finalized when
the validation query for existing rows completes successfully. If validation
fails, the transaction is rolled back, with the foreign key (and backreference)
removed from the table descriptor(s) as part of the rollback.

Adding foreign keys to columns or indexes being added is still not supported
and is left for later work. Also unsupported is adding a foreign key constraint
in the same transaction as `CREATE TABLE` that is either validated or that
rolls back the entire transaction on failure. In this PR, the constraint is
just left unvalidated; This needs a follow-up PR to queue a separate mutation
for validating the constraint after it's been added.

Release note (sql change): Foreign keys that are added to an existing table
using `ALTER TABLE` will now be validated for existing rows, with improved
performance compared to running `ADD CONSTRAINT` followed by `VALIDATE
CONSTRAINT` previously.
@thoszhang
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request May 16, 2019
37433: sql: async validation of foreign keys in ADD CONSTRAINT r=lucy-zhang a=lucy-zhang

Currently, validating an unvalidated foreign key constraint requires running
`VALIDATE CONSTRAINT`, which executes a potentially long-running query in the
user transaction. In this PR, `ADD CONSTRAINT` for foreign keys is now
processed by the schema changer, with the validation for existing rows running
in a separate transaction from the original user transaction.

The foreign key mutation (represented by the new `FOREIGN_KEY` enum value in
`ConstraintToUpdate`) is processed in the same way as check constraint
mutations: the foreign key is in the mutations list while other columns and
indexes with the same mutation ID are backfilled, then added to the appropriate
index in the `Validating` state before being validated, and is finalized when
the validation query for existing rows completes successfully. If validation
fails, the transaction is rolled back, with the foreign key (and backreference)
removed from the table descriptor(s) as part of the rollback.

Adding foreign keys to columns or indexes being added is still not supported
and is left for later work. Also unsupported is adding a foreign key constraint
in the same transaction as `CREATE TABLE` that is either validated or that
rolls back the entire transaction on failure. In this PR, the constraint is
just left unvalidated; This needs a follow-up PR to queue a separate mutation
for validating the constraint after it's been added.

Release note (sql change): Foreign keys that are added to an existing table
using `ALTER TABLE` will now be validated for existing rows, with improved
performance compared to running `ADD CONSTRAINT` followed by `VALIDATE
CONSTRAINT` previously.

Co-authored-by: Lucy Zhang <[email protected]>
@craig
Copy link
Contributor

craig bot commented May 16, 2019

Build succeeded

@craig craig bot merged commit 3c49a5f into cockroachdb:master May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants