sql/schemachanger: model IndexColumn, avoid index backfill when unneeded #83222

ajwerner · 2022-06-22T20:18:43Z

sql/schemachanger/rel: add support for Neq (!=)

sql/schemachanger: model IndexColumn explicitly
This change normalizes the columns in an index. This lays the foundation to
handle cases where we add a NULL-able new column without a default value. In
these cases, we should just modify the existing index in-place. That work will
come in a follow-up commit. This will also prove valuable for more complex
add column scenarios.

sql/schemachanger: add a special case for simple add column
Before this change, when we added a column, we'd always create a new primary
index and swap into it. In the case where we're adding a nullable column with
no default value or computed expression, we have no reason to do this whole
dance. Indeed, backfilling that new index was a major regression for this
common case when compared to the legacy schema changer.

In this commit, we detect this special case and just add the new column to
the existing primary index.

Release note: None
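
As a rough illustration of the special case described above, here is a minimal sketch in Go. The `columnSpec` type and `canAddInPlace` function are hypothetical simplifications introduced for this example; they are not the builder's real types, and the actual check in the PR lives in `alter_table_add_column.go`.

```go
package main

import "fmt"

// columnSpec is a hypothetical, simplified stand-in for the information the
// builder has about a column being added. It is not the real builder type.
type columnSpec struct {
	Name           string
	Nullable       bool // column accepts NULL
	HasDefault     bool // column has a DEFAULT expression
	HasComputeExpr bool // column is a computed column
}

// canAddInPlace reports whether the new column can simply be added to the
// existing primary index. With no default and no computed expression, every
// existing row implicitly holds NULL, so a backfilled copy of the index would
// contain exactly the same data.
func canAddInPlace(c columnSpec) bool {
	return c.Nullable && !c.HasDefault && !c.HasComputeExpr
}

func main() {
	cols := []columnSpec{
		{Name: "a", Nullable: true},
		{Name: "b", Nullable: true, HasDefault: true},
		{Name: "c", Nullable: true, HasComputeExpr: true},
	}
	for _, c := range cols {
		fmt.Printf("%s: add in place = %t\n", c.Name, canAddInPlace(c))
	}
}
```

Only column `a` qualifies here; the other two need values materialized for existing rows, which is what the new-index-and-swap path provides.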

chengxiong-ruan

LGTM on everything outside of rel, which honestly I forgot how it works (I swear I knew how it worked 3 months ago).

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @chengxiong-ruan)

pkg/sql/catalog/descpb/index_fetch.proto line 47 at r1 (raw file):

  message KeyColumn {
    optional Column column = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false];
    optional catalog.catpb.IndexColumn.Direction direction = 2 [(gogoproto.nullable) = false];

I assume there's no deserialization problem with this field type change, because the underlying type is actually the same and protobuf uses the field number to identify a value on disk?

pkg/sql/schemachanger/scexec/scmutationexec/index.go line 346 at r1 (raw file):

		insertName(&id.StoreColumnNames)
	}
	// If this is a composite column, note that. Because we don't en

to be continued (sounds like you wanted to say more heh)

ajwerner · 2022-06-23T14:40:08Z

That this last commit works in the rollback case was surprising to me given we don't explicitly remove the column with an operation from the primary index. It turns out that this gets covered by

cockroach/pkg/sql/schemachanger/scexec/scmutationexec/column.go

Line 197 in cbfa501

tbl.RemoveColumnFromFamilyAndPrimaryIndex(col.ID)

I'm not sure I love this.

Xiang-Gu · 2022-06-23T16:25:28Z

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go

+	for _, ec := range append(existingColumns, ic) {
+		cloned := protoutil.Clone(ec).(*scpb.IndexColumn)
+		cloned.IndexID = temp.IndexID
+		b.Add(cloned)
+	}


Wow... Now we would have 3 sets of index columns for the old primary index, new primary index, and temp index; Or 5 sets if this newly added column is UNIQUE and hence another secondary index (and its temp index) needs to be created.

The stage graph is going to get BIG!

Xiang-Gu · 2022-06-23T16:25:33Z

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go

+func getNextStoredIndexColumnOrdinal(
+	allTargets ElementResultSet, idx *scpb.PrimaryIndex,
+) (ord uint32) {
+	var foundAny bool


nit: if we just initialize the return value ord = -1, we can get rid of foundAny and just return ord++

Xiang-Gu · 2022-06-23T16:25:37Z

pkg/sql/schemachanger/scdecomp/decomp.go

+				TableID:   tbl.GetID(),
+				IndexID:   idx.GetID(),
+				ColumnID:  c,
+				Ordinal:   uint32(i),


Okay, so the ordinal here is 0-indexed, and each KIND of index column has its ordinals relative to its own KIND.

pkg/sql/schemachanger/scexec/scmutationexec/index.go

Xiang-Gu · 2022-06-23T16:25:48Z

pkg/sql/schemachanger/scexec/scmutationexec/index.go

+	// and sort here.
+	id := index.IndexDesc()
+	n := int(op.Ordinal + 1)
+	insertName := func(s *[]string) {


nit: similarly, I suggest renaming those three lambda functions as insertNameTo, because I initially thought we were going to insert the input string array somewhere else. Then I realized that we're inserting the col name/id/dir into the input array.

Xiang-Gu · 2022-06-23T16:25:52Z

pkg/sql/schemachanger/scop/mutation.go

+// The column should already exist on the table and so should
+// the index.


Side note: one confusion I've had while learning the new schema changer is what it means to say a column/index "exists on a table".

When I was learning the old schema changer, I learned that if a table descriptor is undergoing a schema change, it has mutations queued up in it. This means that, when using the table descriptor to interpret rows, we need to consult both the table's indexes/columns and any queued mutations. For example, if we find a column 'k' in a table descriptor, but also find a drop_column_k mutation that is write_and_delete_only, we say "column k does not exist as far as reading is concerned".

Now, with the new schema changer, we additionally introduce a "status" on elements. For example, when a column is added, its associated elements are marked as "absent" and "toPublic". But I think a lot of operations in the new schema changer will just treat this column as "already existing" and proceed to operate on its elements accordingly.

Maybe my approach to understanding this is off and it is actually a lot simpler. Can you try to clear it up a bit for me? What's your mental model (or, how do you think to yourself) for when something (e.g. a table/index/column) "already exists" and when it "already no longer exists"?

Xiang-Gu · 2022-06-23T16:25:56Z

pkg/sql/schemachanger/scplan/internal/opgen/opgen_index_column.go

+			scpb.Status_PUBLIC,
+			to(scpb.Status_ABSENT,
+				revertible(false)),
+		),


Why didn't we need something like scop.RemoveColumnFromIndex here for when we need to drop a column? I guess we do need it, and we will add it in DROP COLUMN?

Xiang-Gu · 2022-06-23T16:26:06Z

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go


 	registerDepRule(
-		"partitioning set right after temp index existence",
+		"partitioning and columns set right after temp index existence",


Setting aside IndexPartitioning, how is this rule different from the one above, where we state "temporary index must exist right before its dependents (really just its index columns) become public"?

Xiang-Gu · 2022-06-23T16:26:14Z

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go

+					(*scpb.PrimaryIndex)(nil),
+					(*scpb.SecondaryIndex)(nil),


The rule name says "temp index existence preceded index dependents" but why isn't the from type "scpb.TemporaryIndex"?

Good catch :)

Xiang-Gu · 2022-06-23T16:26:18Z

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go

+					(*scpb.IndexColumn)(nil),
+				),
+				joinOnIndexID(from.el, to.el, "table-id", "index-id"),
+				targetStatus(from.target, scpb.Transient),


Shouldn't this be scpb.Transient_Absent?

I think it is what you mean : http://github.com/cockroachdb/cockroach/blob/8601bedf62a7e0534a99c33087cdafcc6d1015fd/pkg/sql/schemachanger/scpb/constants.go#L41-L41

though good call that it is confusing.

Xiang-Gu

Thank you so much for this @ajwerner -- it will unblock a lot of PRs from me and generally make manipulating indexes and columns easier in the new schema changer.

I left some nit comments and questions. LGTM overall

Xiang-Gu · 2022-06-23T16:30:38Z

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go

+	// expression and no default value, then we can just add it to the
+	// current primary index; there's no need to build a new index as
+	// it would have exactly the same data as the current index.
+	if spec.def == nil && spec.colType.ComputeExpr == nil {


What if we add a column with an ON UPDATE expression, immediately followed by an UPDATE stmt in the same transaction? It's a scary realm, with DDL stmts mixed with DML stmts in one transaction, that I never thought about before.

postamar

I did a somewhat superficial pass and have only one substantial concern related to the drop-path of the new index-column element.

Reviewed 6 of 106 files at r1, 102 of 102 files at r2, 95 of 98 files at r3, 11 of 11 files at r4, 4 of 4 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @Xiang-Gu)

pkg/sql/catalog/descpb/index_fetch.proto line 47 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

I assume there's no deserialization problem with this field type change, because the underlying type is actually the same and protobuf uses the field number to identify a value on disk?

Yes, that's correct.

pkg/sql/schemachanger/rel/query_lang.go line 44 at r2 (raw file):

//// attribute takes on a value of a different type than the passed value, a
//// contradiction will be found; the entity's attribute must be bound, and it
//// must be bound to a value of the same type as the provided value.

nit: s,////,//,

pkg/sql/schemachanger/scdecomp/decomp.go line 422 at r4 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Okay, so the ordinal here is 0-indexed, and each KIND of index column has its ordinals relative to its own KIND.

I was also a bit confused by this ordinal numbering scheme. Perhaps rename Ordinal to OrdinalInKind or something?

pkg/sql/schemachanger/scplan/internal/opgen/opgen_index_column.go line 36 at r4 (raw file):

			scpb.Status_PUBLIC,
			to(scpb.Status_ABSENT,
				revertible(false)),

I'm concerned that without a notImplemented op here we may be passing the rollback tests even when we shouldn't be. Also, this may hide some dep & op rules necessary for correct DROP behaviour.

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go line 97 at r4 (raw file):

		})
	registerDepRule(
		"index existence precedes index dependents",

Adjust rule name. Perhaps "index exists right before index column"? Same comment applies elsewhere.

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go line 520 at r4 (raw file):

					index, column, indexColumn, tableID, columnID, indexID,
				),
				index.AttrNeq(screl.SourceIndexID, catid.IndexID(0)),

Would it be worth introducing some kind of shorthand-rule for "ID is not set"? Just an idea. Perhaps premature.

ajwerner

TFTR, RFAL

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @fqazi, @postamar, and @Xiang-Gu)

pkg/sql/catalog/descpb/index_fetch.proto line 47 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

I assume there's no deserialization problem with this field type change, because the underlying type is actually the same and protobuf uses the field number to identify a value on disk?

Correct, it should be the same. Serialized protobufs identify each field by its field number (plus a wire type), not by the name of its type.
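
To make the wire-format point concrete, here is a small standalone sketch using only the Go standard library. It hand-encodes a single varint field the way the protobuf wire format does; `encodeVarintField` is a helper written for this example, and the enum value `1` is an arbitrary illustrative choice rather than anything taken from the PR.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVarintField encodes one varint-typed field (wire type 0) following
// the protobuf wire format: a tag of (fieldNumber<<3 | wireType), then the
// value, both written as base-128 varints.
func encodeVarintField(fieldNumber int, value uint64) []byte {
	const wireTypeVarint = 0
	tmp := make([]byte, binary.MaxVarintLen64)
	var out []byte
	n := binary.PutUvarint(tmp, uint64(fieldNumber)<<3|wireTypeVarint)
	out = append(out, tmp[:n]...)
	n = binary.PutUvarint(tmp, value)
	out = append(out, tmp[:n]...)
	return out
}

func main() {
	// Field number 2 (the "direction" field) holding enum value 1. The bytes
	// contain the field number, wire type and value only; the name of the
	// enum type never appears, so moving the enum definition to another
	// package does not change what is on disk.
	fmt.Printf("% x\n", encodeVarintField(2, 1))
}
```

The same bytes decode identically whichever Go type the generated code uses for field 2, as long as the field number and wire type are unchanged.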

pkg/sql/schemachanger/rel/query_lang.go line 44 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

nit: s,////,//,

Done.

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go line 426 at r4 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Wow... Now we would have 3 sets of index columns for the old primary index, new primary index, and temp index; Or 5 sets if this newly added column is UNIQUE and hence another secondary index (and its temp index) needs to be created.

The stage graph is going to get BIG!

Indeed. @fqazi is adding partition subzones, so even bigger! We're starting to out-grow our visualization technologies.

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go line 433 at r4 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

nit: if we just initialize the return value ord = -1, we can get rid of foundAny and just return ord++

Done.
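
For readers following along, one way to compute a "next ordinal" without a `foundAny` flag is sketched below. This is not the code from the PR: `nextOrdinal` and its slice argument are hypothetical. Because the ordinal is a `uint32`, it cannot literally start at -1, so the sketch tracks "largest ordinal seen plus one" directly, which starts at 0 for the empty case.

```go
package main

import "fmt"

// nextOrdinal returns the ordinal the next stored column should receive,
// given the ordinals already in use. It returns 0 when none are in use.
// Hypothetical helper for illustration only.
func nextOrdinal(existing []uint32) uint32 {
	var next uint32 // "max seen + 1"; zero means nothing has been seen yet
	for _, ord := range existing {
		if ord+1 > next {
			next = ord + 1
		}
	}
	return next
}

func main() {
	fmt.Println(nextOrdinal(nil))
	fmt.Println(nextOrdinal([]uint32{0, 2, 1}))
}
```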

pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go line 357 at r5 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

What if we add a column with an ON UPDATE expression, immediately followed by an UPDATE stmt in the same transaction? It's a scary realm, with DDL stmts mixed with DML stmts in one transaction, that I never thought about before.

It won't be populated, because the column is not writable during the user transaction. I do understand that this may violate the user's expectations, but that doesn't change in this PR, nor does it change with the declarative schema changer. It will get the default value.

pkg/sql/schemachanger/scdecomp/decomp.go line 422 at r4 (raw file):

Previously, postamar (Marius Posta) wrote…

I was also a bit confused by this ordinal numbering scheme. Perhaps rename Ordinal to OrdinalInKind or something?

Done.
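
The per-kind numbering discussed in this thread can be sketched as follows. The `indexColumn` struct, the kind names, and `decompose` are assumptions made for this example and are simplified relative to the real `scpb.IndexColumn` element.

```go
package main

import "fmt"

// indexColumn is a simplified, hypothetical model of an index column element.
type indexColumn struct {
	ColumnID      int
	Kind          string // assumed kind names: "KEY", "KEY_SUFFIX", "STORED"
	OrdinalInKind uint32 // 0-based position among columns of the same kind
}

// decompose turns the three column lists of an index into one flat list of
// elements. Ordinals restart at 0 for each kind.
func decompose(keyCols, suffixCols, storedCols []int) []indexColumn {
	var out []indexColumn
	add := func(kind string, ids []int) {
		for i, id := range ids {
			out = append(out, indexColumn{
				ColumnID:      id,
				Kind:          kind,
				OrdinalInKind: uint32(i),
			})
		}
	}
	add("KEY", keyCols)
	add("KEY_SUFFIX", suffixCols)
	add("STORED", storedCols)
	return out
}

func main() {
	for _, ic := range decompose([]int{1}, nil, []int{2, 3}) {
		fmt.Printf("%+v\n", ic)
	}
}
```

Note that column 1 (a key column) and column 2 (the first stored column) both get ordinal 0, which is why a name like `OrdinalInKind` is less ambiguous than `Ordinal`.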

pkg/sql/schemachanger/scexec/scmutationexec/index.go line 346 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

to be continued (sounds like you wanted to say more heh)

Fixed.

pkg/sql/schemachanger/scexec/scmutationexec/index.go line 315 at r4 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

nit: I think idx is a better name, because when I first read this block, my mind kept mistaking it for some sort of ID

Done.

pkg/sql/schemachanger/scexec/scmutationexec/index.go line 317 at r4 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

nit: similarly, I suggest renaming those three lambda functions as insertNameTo, because I initially thought we were going to insert the input string array somewhere else. Then I realized that we're inserting the col name/id/dir into the input array.

Done.
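
A sketch of the naming point: a helper that writes a value into the slice passed to it reads more naturally with a `To` suffix. The `insertNameTo` below is a guess at the general shape, not the PR's code; it assumes the slice may need to grow so that position `ordinal` exists, which also lets operations arrive out of ordinal order.

```go
package main

import "fmt"

// insertNameTo places name at position ordinal in *s, growing the slice with
// empty strings if it is currently too short. Hypothetical reconstruction.
func insertNameTo(s *[]string, ordinal int, name string) {
	if n := ordinal + 1; len(*s) < n {
		*s = append(*s, make([]string, n-len(*s))...)
	}
	(*s)[ordinal] = name
}

func main() {
	var storeColumnNames []string
	// Insertions may arrive in any order.
	insertNameTo(&storeColumnNames, 1, "b")
	insertNameTo(&storeColumnNames, 0, "a")
	fmt.Println(storeColumnNames)
}
```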

pkg/sql/schemachanger/scop/mutation.go line 547 at r4 (raw file):

"already exists" and when it "already no longer exists".

Here, when I say "exists", I mean that it exists in the key-value store somewhere beyond the element itself, generally as part of the descriptor, as a mutation or otherwise. Our mental models seem mostly aligned.
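
The distinction drawn in this exchange can be sketched with a toy state type. The state names below loosely follow the ones used in the question; `columnState`, `existsInDescriptor`, and `visibleToReads` are invented for illustration and do not correspond to real CockroachDB APIs.

```go
package main

import "fmt"

// columnState is a toy model of where a column is in a schema change.
type columnState int

const (
	absent             columnState = iota // not in the descriptor at all
	deleteOnly                            // present only as a mutation
	writeAndDeleteOnly                    // present only as a mutation
	public                                // fully part of the table
)

// existsInDescriptor matches the "exists" sense used above: the column is
// recorded in the descriptor, as a mutation or otherwise.
func existsInDescriptor(s columnState) bool { return s != absent }

// visibleToReads matches the narrower sense from the question: only public
// columns exist "as far as reading is concerned".
func visibleToReads(s columnState) bool { return s == public }

func main() {
	names := []string{"absent", "deleteOnly", "writeAndDeleteOnly", "public"}
	for s := absent; s <= public; s++ {
		fmt.Printf("%-18s exists=%-5t readable=%t\n",
			names[s], existsInDescriptor(s), visibleToReads(s))
	}
}
```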

pkg/sql/schemachanger/scplan/internal/opgen/opgen_index_column.go line 36 at r4 (raw file):

Previously, postamar (Marius Posta) wrote…

I'm concerned that without a notImplemented op here we may be passing the rollback tests even when we shouldn't be. Also, this may hide some dep & op rules necessary for correct DROP behaviour.

I'm instead going to implement this. The top level comment on the PR explains, to some extent, why I didn't need to do this.

pkg/sql/schemachanger/scplan/internal/opgen/opgen_index_column.go line 37 at r4 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Why didn't we need something like scop.RemoveColumnFromIndex here for when we need to drop a column? I guess we do need it, and we will add it in DROP COLUMN?

cockroach/pkg/sql/schemachanger/scexec/scmutationexec/column.go

Line 197 in cbfa501

tbl.RemoveColumnFromFamilyAndPrimaryIndex(col.ID)

was the reason. I've now removed this. It required some fresh rules.

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go line 97 at r4 (raw file):

Previously, postamar (Marius Posta) wrote…

Adjust rule name. Perhaps "index exists right before index column"? Same comment applies elsewhere.

Done.

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go line 123 at r4 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Good catch :)

copy-pasta, fixed

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go line 129 at r4 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

I think it is what you mean : http://github.com/cockroachdb/cockroach/blob/8601bedf62a7e0534a99c33087cdafcc6d1015fd/pkg/sql/schemachanger/scpb/constants.go#L41-L41

though good call that it is confusing.

We can rename that value in a different PR if you desire

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go line 137 at r4 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Setting aside IndexPartitioning, how is this rule different from the one above, where we state "temporary index must exist right before its dependents (really just its index columns) become public"?

I've reworked the rules. Please do give them a read as they have changed.

pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go line 520 at r4 (raw file):

Previously, postamar (Marius Posta) wrote…

Would it be worth introducing some kind of shorthand-rule for "ID is not set"? Just an idea. Perhaps premature.

Done.

ajwerner · 2022-06-28T12:18:35Z

TFTR!

bors r+

ajwerner · 2022-06-28T14:34:14Z

bors r+

?

craig · 2022-06-28T15:41:32Z

Build succeeded:

GitHub CI (Cockroach)

ajwerner force-pushed the ajwerner/index-column branch 3 times, most recently from d13866c to 149f0cc Compare June 23, 2022 02:59

chengxiong-ruan reviewed Jun 23, 2022

ajwerner force-pushed the ajwerner/index-column branch 2 times, most recently from be7666d to b9efa9f Compare June 23, 2022 03:59

ajwerner changed the title ~~sql/schemachanger: model IndexColumn explicitly~~ sql/schemachanger: model IndexColumn, avoid index backfill when unneeded Jun 23, 2022

Xiang-Gu reviewed Jun 23, 2022

Xiang-Gu reviewed Jun 23, 2022

Xiang-Gu approved these changes Jun 23, 2022

Xiang-Gu reviewed Jun 23, 2022

postamar reviewed Jun 23, 2022

ajwerner force-pushed the ajwerner/index-column branch 5 times, most recently from 7dd07fd to 723e582 Compare June 27, 2022 20:40

ajwerner commented Jun 27, 2022

ajwerner marked this pull request as ready for review June 27, 2022 20:41

ajwerner requested a review from a team June 27, 2022 20:41

ajwerner requested review from a team as code owners June 27, 2022 20:41

ajwerner requested a review from a team June 27, 2022 20:41

ajwerner requested a review from a team as a code owner June 27, 2022 20:41

ajwerner force-pushed the ajwerner/index-column branch from 723e582 to b88b8e5 Compare June 27, 2022 21:45

ajwerner added 2 commits June 27, 2022 22:46

sql/schemachanger/rel: add support for Neq (!=)

d6742ed

This ends up being useful. Release note: None

ajwerner force-pushed the ajwerner/index-column branch from b88b8e5 to 1e229d9 Compare June 28, 2022 03:00

ajwerner requested a review from a team as a code owner June 28, 2022 03:00

ajwerner requested review from livlobo and removed request for a team June 28, 2022 03:00

otan removed the request for review from a team June 28, 2022 03:06

ajwerner force-pushed the ajwerner/index-column branch from 1e229d9 to 0c3f215 Compare June 28, 2022 04:06

postamar approved these changes Jun 28, 2022

livlobo removed their request for review June 28, 2022 13:51

craig bot merged commit e19d98d into cockroachdb:master Jun 28, 2022

ajwerner mentioned this pull request Jun 30, 2022

roachtest: cdc/schemareg failed #81421

Closed

Xiang-Gu mentioned this pull request Jul 4, 2022

sql/schemachanger: cleaned some dep rules in declarative schema changer #82907

Closed
