Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: run check constraint mutations and validation by schema changer #32504

Merged
merged 1 commit into from
Jan 10, 2019

Conversation

eriktrinh
Copy link

@eriktrinh eriktrinh commented Nov 20, 2018

This change moves check constraint adds and drops through the schema
changer when the transaction commits, moving the constraint through the
same intermediate states as when index/columns are added or dropped. The
only small differences are:

  • Constraint adds can start in write-only and immediately start being
    enforced if all used columns are in write-only or public.
  • Constraint drops can move immediately from public to absent if they
    have not yet been validated.

Therefore, the check constraint is no longer immediately public when it
is added, but allows data validation of the constraint to be performed
when the constraint is added (and is now the default behaviour).
Constraints can now also be added on columns in the process of being
added.

This change also ensures that there are no data anomalies in either
versions of the schema when dropping a validated check constraint, as
previously the transition moved the constraint from public -> absent.
Writes on the new version were not being checked even though nodes on an
older version expect all rows to conform to the constraint.

Release note (sql change): Check constraint adds by default will
validate table data with the added constraint asynchronously after the
transaction commits.

@eriktrinh eriktrinh requested review from dt, vivekmenezes and a team November 20, 2018 18:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Stuff! A couple really minor nits. The schema_changer.go stuff is in code with which I have a little less familiarity but overall approach looks great (though as commented below, I still get confused about how txn semantics work with schema changes that "fail").

Reviewed 19 of 21 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/backfill.go, line 47 at r1 (raw file):

	// checkConstraintBackfillChunkSize is the maximum number of rows
	// processed per chunk during check constraint validation.
	checkConstraintBackfillChunkSize = 800

How did we pick this number?


pkg/sql/backfill/backfill.go, line 111 at r1 (raw file):

	); err != nil {
		return err
	} else {

this really should be allowed -- minimizing the scope in which typed has to be bound is nice and clean -- but sadly the linter doesn't like it since first block ends in return. dumb linter.


pkg/sql/backfill/backfill.go, line 183 at r1 (raw file):

			}
			if val == tree.DBoolFalse {
				return roachpb.Key{}, errors.Errorf("validation of CHECK %q failed on row: %s",

I think there's a proper pgerror code we should be returning -- though I don't know if returning it here will make it back to the client.


pkg/sql/logictest/testdata/logic_test/schema_change_in_txn, line 611 at r1 (raw file):


statement ok
ALTER TABLE forlater ADD CONSTRAINT e_0 CHECK (e > 0)

Is it sort of weird that you get a success reply here, but if you then ran SELECT ... WHERE e < 0 in the txn, you could observe a violation?


pkg/sql/logictest/testdata/logic_test/schema_change_in_txn, line 613 at r1 (raw file):

ALTER TABLE forlater ADD CONSTRAINT e_0 CHECK (e > 0)

statement error pq: validation of CHECK "e > 0" failed on row: k='c', v='d', d=NULL, e=0

So you get an error from commit but the txn does commit, right? just the CHECK doesn't get added? I realize some of our other schema changes have similar behavior, but it really seems like a strange violation of the all-or-nothing aspect of committing a txn. Maybe I just need to get over that expectation though.

@eriktrinh eriktrinh force-pushed the check branch 3 times, most recently from 3fe0f30 to 91eb4a8 Compare December 4, 2018 23:31
Copy link
Author

@eriktrinh eriktrinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/backfill.go, line 47 at r1 (raw file):

Previously, dt (David Taylor) wrote…

How did we pick this number?

I chose a number larger than any of the other backfill chunk sizes because check backfills don't actually write any data. Testing and benchmarking could be done to tune this.


pkg/sql/backfill/backfill.go, line 183 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I think there's a proper pgerror code we should be returning -- though I don't know if returning it here will make it back to the client.

You're right, there exists 23514 aka check_validation data exception. However error codes from the schema changer get replaced with a statement_completion_unknown code.


pkg/sql/logictest/testdata/logic_test/schema_change_in_txn, line 611 at r1 (raw file):

Previously, dt (David Taylor) wrote…

Is it sort of weird that you get a success reply here, but if you then ran SELECT ... WHERE e < 0 in the txn, you could observe a violation?

Yea this is kinda weird, a similar result is observed with unique constraints as well.


pkg/sql/logictest/testdata/logic_test/schema_change_in_txn, line 613 at r1 (raw file):

Previously, dt (David Taylor) wrote…

So you get an error from commit but the txn does commit, right? just the CHECK doesn't get added? I realize some of our other schema changes have similar behavior, but it really seems like a strange violation of the all-or-nothing aspect of committing a txn. Maybe I just need to get over that expectation though.

Right, the txn which added the mutation does commit. The violation is strange but it's communicated to the user through the error code and name.

Copy link
Author

@eriktrinh eriktrinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sqlbase/structured.proto, line 480 at r2 (raw file):

}

message CheckConstraint {

Not sure if pulling out this message type is safe or not.

@eriktrinh eriktrinh changed the title [WIP] sql: run check constraint adds and validation through schema changer sql: run check constraint mutations and validation by schema changer Dec 17, 2018
@eriktrinh eriktrinh force-pushed the check branch 4 times, most recently from da57c52 to 77a92c4 Compare December 18, 2018 22:11
@eriktrinh
Copy link
Author

Gave this PR another pass: updating the commit message, adding tests, adding a validating state, and also dropping check constraints through the schema changer if the constraint has been validated. PTAL.

This change moves check constraint adds and drops through the schema
changer when the transaction commits, moving the constraint through the
same intermediate states as when index/columns are added or dropped. The
only small differences are:
 - Constraint adds can start in write-only and immediately start being
   enforced if all used columns are in write-only or public.
 - Constraint drops can move immediately from public to absent if they
   have not yet been validated.

Therefore, the check constraint is no longer immediately public when it
is added, but allows data validation of the constraint to be performed
when the constraint is added (and is now the default behaviour).
Constraints can now also be added on columns in the process of being
added.

This change also ensures that there are no data anomalies in either
versions of the schema when dropping a validated check constraint, as
previously the transition moved the constraint from public -> absent.
Writes on the new version were not being checked even though nodes on an
older version expect all rows to conform to the constraint.

Fixes cockroachdb#33593.
See also cockroachdb#33550.

Release note (sql change): Check constraint adds by default will
validate table data with the added constraint asynchronously after the
transaction commits.
Copy link
Contributor

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sqlbase/structured.go, line 2261 at r4 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

Updated AddCheckMutation to match the implementation for indexes and columns.

Also removed a test in schema_changer_test.go that was testing this special case.

@thoszhang
Copy link
Contributor

bors r+

craig bot pushed a commit that referenced this pull request Jan 10, 2019
32504: sql: run check constraint mutations and validation by schema changer r=lucy-zhang a=eriktrinh

This change moves check constraint adds and drops through the schema
changer when the transaction commits, moving the constraint through the
same intermediate states as when index/columns are added or dropped. The
only small differences are:
 - Constraint adds can start in write-only and immediately start being
   enforced if all used columns are in write-only or public.
 - Constraint drops can move immediately from public to absent if they
   have not yet been validated.

Therefore, the check constraint is no longer immediately public when it
is added, but allows data validation of the constraint to be performed
when the constraint is added (and is now the default behaviour).
Constraints can now also be added on columns in the process of being
added.

This change also ensures that there are no data anomalies in either
versions of the schema when dropping a validated check constraint, as
previously the transition moved the constraint from public -> absent.
Writes on the new version were not being checked even though nodes on an
older version expect all rows to conform to the constraint.

Release note (sql change): Check constraint adds by default will
validate table data with the added constraint asynchronously after the
transaction commits.

Co-authored-by: Erik Trinh <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 10, 2019

Build succeeded

@craig craig bot merged commit 1762913 into cockroachdb:master Jan 10, 2019
thoszhang pushed a commit to thoszhang/cockroach that referenced this pull request Jan 28, 2019
Currently, constraints are added in the `Unvalidated` state, and are not
validated for existing rows until ALTER TABLE ... VALIDATE CONSTRAINT is run.
With this change, check constraints will be validated asynchronously after they
are added by default (and similar changes to FKs are to follow). This addresses
the problematic long-running transactions caused by the current implementation
of VALIDATE CONSTRAINT. This PR is a rework of cockroachdb#32504, and has the same tests.

With this change, check constraints will be added to the table descriptor in
the new `Validating` state, visible to CRUD operations, and a mutation is
queued indicating that the constraint is to be validated. During the backfill
step, the constraint is validated for existing rows. If validation succeeds,
then the constraint moves to the `Validated` state; otherwise, it is dropped.

The behavior when dropping constraints (either via DROP CONSTRAINT or
indirectly when a column is dropped) is unchanged: no mutation is enqueued.

As part of this change, check constraints can be added to non-public columns in
the process of being added, including columns that were created earlier in the
same transaction.

The main difference between this PR and cockroachdb#32504 is that cockroachdb#32504 does not add the
constraint to the table descriptor until it has been validated.

Release note (sql change): Check constraint adds by default will validate table
data with the added constraint asynchronously after the transaction commits.
thoszhang pushed a commit to thoszhang/cockroach that referenced this pull request Feb 12, 2019
Currently, constraints are added in the `Unvalidated` state, and are not
validated for existing rows until ALTER TABLE ... VALIDATE CONSTRAINT is run.
With this change, check constraints will be validated asynchronously after they
are added by default (and similar changes to FKs are to follow). This addresses
the problematic long-running transactions caused by the current implementation
of VALIDATE CONSTRAINT. This PR is a rework of cockroachdb#32504, and has the same tests.

With this change, check constraints will be added to the table descriptor in
the new `Validating` state, visible to CRUD operations, and a mutation is
queued indicating that the constraint is to be validated. During the backfill
step, the constraint is validated for existing rows. If validation succeeds,
then the constraint moves to the `Validated` state; otherwise, it is dropped.

The behavior when dropping constraints (either via DROP CONSTRAINT or
indirectly when a column is dropped) is unchanged: no mutation is enqueued.

As part of this change, check constraints can be added to non-public columns in
the process of being added, including columns that were created earlier in the
same transaction.

The main difference between this PR and cockroachdb#32504 is that cockroachdb#32504 does not add the
constraint to the table descriptor until it has been validated.

Release note (sql change): Check constraint adds by default will validate table
data with the added constraint asynchronously after the transaction commits.
thoszhang pushed a commit to thoszhang/cockroach that referenced this pull request Feb 12, 2019
Currently, constraints are added in the `Unvalidated` state, and are not
validated for existing rows until ALTER TABLE ... VALIDATE CONSTRAINT is run.
With this change, check constraints will be validated asynchronously after they
are added by default (and similar changes to FKs are to follow). This addresses
the problematic long-running transactions caused by the current implementation
of VALIDATE CONSTRAINT. This PR is a rework of cockroachdb#32504, and has the same tests.

With this change, check constraints will be added to the table descriptor in
the new `Validating` state, visible to CRUD operations, and a mutation is
queued indicating that the constraint is to be validated. During the backfill
step, the constraint is validated for existing rows. If validation succeeds,
then the constraint moves to the `Validated` state; otherwise, it is dropped.

The behavior when dropping constraints (either via DROP CONSTRAINT or
indirectly when a column is dropped) is unchanged: no mutation is enqueued.

As part of this change, check constraints can be added to non-public columns in
the process of being added, including columns that were created earlier in the
same transaction.

The main difference between this PR and cockroachdb#32504 is that cockroachdb#32504 does not add the
constraint to the table descriptor until it has been validated.

Release note (sql change): Check constraint adds by default will validate table
data with the added constraint asynchronously after the transaction commits.
craig bot pushed a commit that referenced this pull request Feb 12, 2019
34301: sql: validate check constraints with the schema changer r=lucy-zhang a=lucy-zhang

Currently, constraints are added in the `Unvalidated` state, and are not
validated for existing rows until ALTER TABLE ... VALIDATE CONSTRAINT is run.
With this change, check constraints will be validated asynchronously after they
are added by default (and similar changes to FKs are to follow). This addresses
the problematic long-running transactions caused by the current implementation
of VALIDATE CONSTRAINT. This PR is a rework of #32504 and has the same tests.

With this change, check constraints will be added to the table descriptor in
the new `Validating` state, visible to CRUD operations, and a mutation is
queued indicating that the constraint is to be validated. During the backfill
step, the constraint is validated for existing rows. If validation succeeds,
then the constraint moves to the `Validated` state; otherwise, it is dropped.

The behavior when dropping constraints (either via DROP CONSTRAINT or
indirectly when a column is dropped) is unchanged: no mutation is enqueued.

As part of this change, check constraints can be added to non-public columns in
the process of being added, including columns that were created earlier in the
same transaction.

The main difference between this PR and #32504 is that #32504 does not add the
constraint to the table descriptor until it has been validated.

See #34238 for more context.

Release note (sql change): Check constraint adds by default will validate table
data with the added constraint asynchronously after the transaction commits.

34720: rpc: always trace incoming RPCs if tracing is enabled r=andreimatei a=andreimatei

Before this patch, an incoming RPC would not be traced if the caller
wasn't traced. Usually this was inconsequential, as the caller is
generally traced if tracing is enabled in various ways. However, for the
status server (AdminUI calls) this was not true - the caller (the
browser, through a HTTP->gRPC gateway) was never tracing a call.
This patch makes the server create a span regardless of the caller if
tracing is enabled.

Fixes #34310

Release note: None

34798: roachtest: add two more indexes to bulk index creation test r=vivekmenezes a=vivekmenezes

Release note: None

34829: jobs: only include running and very recently finished jobs in SHOW JOBS r=dt a=dt

On a cluster that has been running for a long time or with frequent periodic jobs, SHOW JOBS can output an unbounded, massive wall of text.
This makes it hard to find the jobs you are likely interested in -- those that are running or have very recently finished.

This changes SHOW JOBS to only include running jobs or those that finished in the last 12h.
The full listing of jobs is still available via crdb_internal.jobs.

Release note (general change): SHOW JOBS only returns running and recently finished jobs. Older jobs can still be inspected via the crdb_internal.jobs table.

Co-authored-by: Lucy Zhang <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Vivek Menezes <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants