-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql/schemachanger: model IndexColumn, avoid index backfill when unneeded #83222
Conversation
d13866c
to
149f0cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on everything outside of the rel
which honestly I forgot how it works (I swear I knew it how it works 3 months ago).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @chengxiong-ruan)
pkg/sql/catalog/descpb/index_fetch.proto
line 47 at r1 (raw file):
message KeyColumn { optional Column column = 1 [(gogoproto.embed) = true, (gogoproto.nullable) = false]; optional catalog.catpb.IndexColumn.Direction direction = 2 [(gogoproto.nullable) = false];
I assume that there's no deserialization problem with this field type change because the essential type is actually the same and protobuf use field number to identify a value from disk?
pkg/sql/schemachanger/scexec/scmutationexec/index.go
line 346 at r1 (raw file):
insertName(&id.StoreColumnNames) } // If this is a composite column, note that. Because we don't en
to be continued (sound like you wanted to say more heh)
be7666d
to
b9efa9f
Compare
That this last commit works in the rollback case was surprising to me given we don't explicitly remove the column with an operation from the primary index. It turns out that this gets covered by
I'm not sure I love this. |
for _, ec := range append(existingColumns, ic) { | ||
cloned := protoutil.Clone(ec).(*scpb.IndexColumn) | ||
cloned.IndexID = temp.IndexID | ||
b.Add(cloned) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow... Now we would have 3 sets of index columns for the old primary index, new primary index, and temp index; Or 5 sets if this newly added column is UNIQUE and hence another secondary index (and its temp index) needs to be created.
The stage graph is going to get BIG!
func getNextStoredIndexColumnOrdinal( | ||
allTargets ElementResultSet, idx *scpb.PrimaryIndex, | ||
) (ord uint32) { | ||
var foundAny bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just initialize return ord = -1
we can get rid of foundAny
and just return ord ++
TableID: tbl.GetID(), | ||
IndexID: idx.GetID(), | ||
ColumnID: c, | ||
Ordinal: uint32(i), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, the ordinal
here is 0-indexing and different KIND of index columns have their ordinal
relative in their own KIND
// and sort here. | ||
id := index.IndexDesc() | ||
n := int(op.Ordinal + 1) | ||
insertName := func(s *[]string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: similarly, I suggest renaming those three lambda functions as insertNameTo
bc I initially thought we are going to insert the input string array into somewhere else. Then I realize that we're inserting the col name/id/dir into the input string array.
// The column should already exist on the table and so should | ||
// the index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: One confusion I start to get as I learned about the new schema changer is "what does it mean to say a column/index exists on a table".
When I'm learning the old schema changer, I learned that if a table descriptor is undergoing a schema change, it will have mutations queued up in them. This means, when using the table descriptor to interpret the rows, we need to consult both the table indexes/columns as well as any queuing mutations. For example, in a table descriptor, if we can find a column 'k', but we also find a drop_column_k as its mutation that is write_and_delete_only, we say "column k does not exist as far as reading is concerned".
Now, with the new schema changer, we further introduce a place to state "status" in elements. For example, when a column is added, its associated elements are marked as "absent" and "toPublic". But I think a lot of operations in the new schema changer world will just treat this column as "already exists" and proceed to operate on their elements accordingly.
Maybe the way I am approaching to understanding this is off and is actually a lot simpler. Can you try to clear it up a bit for me? What's your mental model (or, how do you think to yourself) when something (e.g. a table/index/column) "already exists" and when it "already no longer exists".
scpb.Status_PUBLIC, | ||
to(scpb.Status_ABSENT, | ||
revertible(false)), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why didn't we need something like scop.RemoveColumnFromIndex
here when we need to drop a column? I guess we do need, and we will do it in DROP COLUMN
?
|
||
registerDepRule( | ||
"partitioning set right after temp index existence", | ||
"partitioning and columns set right after temp index existence", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set aside IndexParitioning, how is this rule different from the above one where we state "temporary index must exist right before its dependents (really just its index columns) becomes public"?
(*scpb.PrimaryIndex)(nil), | ||
(*scpb.SecondaryIndex)(nil), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rule name says "temp index existence preceded index dependents" but why isn't the from type "scpb.TemporaryIndex"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch :)
(*scpb.IndexColumn)(nil), | ||
), | ||
joinOnIndexID(from.el, to.el, "table-id", "index-id"), | ||
targetStatus(from.target, scpb.Transient), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be scpb.Transient_Absent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is what you mean : http://github.com/cockroachdb/cockroach/blob/8601bedf62a7e0534a99c33087cdafcc6d1015fd/pkg/sql/schemachanger/scpb/constants.go#L41-L41
though good call that it is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this @ajwerner -- it will unblock a lot of PRs from me and generally make manipulating index and columns easier in the new schema changer.
I left some nit comments and questions. LGTM overall
// expression and no default value, then we can just add it to the | ||
// current primary index; there's no need to build a new index as | ||
// it would have exactly the same data as the current index. | ||
if spec.def == nil && spec.colType.ComputeExpr == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we added a column with ON-UPDATE expression and immediately followed by an UPDATE
stmt in the same transaction? It's a scary realm where we have DDL stmts mixed up with DML stmts in a transaction that I never thought about before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a somewhat superficial pass and have only one substantial concern related to the drop-path of the new index-column element.
Reviewed 6 of 106 files at r1, 102 of 102 files at r2, 95 of 98 files at r3, 11 of 11 files at r4, 4 of 4 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @Xiang-Gu)
pkg/sql/catalog/descpb/index_fetch.proto
line 47 at r1 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
I assume that there's no deserialization problem with this field type change because the essential type is actually the same and protobuf use field number to identify a value from disk?
Yes, that's correct.
pkg/sql/schemachanger/rel/query_lang.go
line 44 at r2 (raw file):
//// attribute takes on a value of a different type than the passed value, a //// contradiction will be found; the entity's attribute must be bound, and it //// must be bound to a value of the same type as the provided value.
nit: s,////,//,
pkg/sql/schemachanger/scdecomp/decomp.go
line 422 at r4 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
okay, the
ordinal
here is 0-indexing and different KIND of index columns have theirordinal
relative in their own KIND
I was also a bit confused by this ordinal numbering scheme. Perhaps rename Ordinal
to OrdinalInKind
or something?
pkg/sql/schemachanger/scplan/internal/opgen/opgen_index_column.go
line 36 at r4 (raw file):
scpb.Status_PUBLIC, to(scpb.Status_ABSENT, revertible(false)),
I'm concerned that without a notImplemented
op here we may be passing the rollback tests even when we shouldn't be. Also, this may hide some dep & op rules necessary for correct DROP behaviour.
pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go
line 97 at r4 (raw file):
}) registerDepRule( "index existence precedes index dependents",
Adjust rule name. Perhaps "index exists right before index column"? Same comment applies elsewhere.
pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go
line 520 at r4 (raw file):
index, column, indexColumn, tableID, columnID, indexID, ), index.AttrNeq(screl.SourceIndexID, catid.IndexID(0)),
Would it be worth introducing some kind of shorthand-rule for "ID is not set"? Just an idea. Perhaps premature.
7dd07fd
to
723e582
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR, RFAL
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @fqazi, @postamar, and @Xiang-Gu)
pkg/sql/catalog/descpb/index_fetch.proto
line 47 at r1 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
I assume that there's no deserialization problem with this field type change because the essential type is actually the same and protobuf use field number to identify a value from disk?
Correct, should be the same. Protobufs when serialized are just field numbers.
pkg/sql/schemachanger/rel/query_lang.go
line 44 at r2 (raw file):
Previously, postamar (Marius Posta) wrote…
nit: s,////,//,
Done.
pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go
line 426 at r4 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
Wow... Now we would have 3 sets of index columns for the old primary index, new primary index, and temp index; Or 5 sets if this newly added column is UNIQUE and hence another secondary index (and its temp index) needs to be created.
The stage graph is going to get BIG!
Indeed. @fqazi is adding partition subzones, so even bigger! We're starting to out-grow our visualization technologies.
pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go
line 433 at r4 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
nit: just initialize return
ord = -1
we can get rid offoundAny
and justreturn ord ++
Done.
pkg/sql/schemachanger/scbuild/internal/scbuildstmt/alter_table_add_column.go
line 357 at r5 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
What if we added a column with ON-UPDATE expression and immediately followed by an
UPDATE
stmt in the same transaction? It's a scary realm where we have DDL stmts mixed up with DML stmts in a transaction that I never thought about before.
it won't be populated because the column is not writable during the user transaction. I do understand that that may violate the user's expectations, but that doesn't change in this PR nor does it change with the declarative schema changer. It will get the default value.
pkg/sql/schemachanger/scdecomp/decomp.go
line 422 at r4 (raw file):
Previously, postamar (Marius Posta) wrote…
I was also a bit confused by this ordinal numbering scheme. Perhaps rename
Ordinal
toOrdinalInKind
or something?
Done.
pkg/sql/schemachanger/scexec/scmutationexec/index.go
line 346 at r1 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
to be continued (sound like you wanted to say more heh)
Fixed.
pkg/sql/schemachanger/scexec/scmutationexec/index.go
line 315 at r4 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
nit: I think
idx
is a better name bc when I first read this block, my mind constantly mistakenly think it as some sort of ID
Done.
pkg/sql/schemachanger/scexec/scmutationexec/index.go
line 317 at r4 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
nit: similarly, I suggest renaming those three lambda functions as
insertNameTo
bc I initially thought we are going to insert the input string array into somewhere else. Then I realize that we're inserting the col name/id/dir into the input string array.
Done.
pkg/sql/schemachanger/scop/mutation.go
line 547 at r4 (raw file):
"already exists" and when it "already no longer exists".
Here when I say exists in the key-value store somewhere beyond the element itself, generally as part of the descriptor, mutation or otherwise. Our mental models seem mostly aligned.
pkg/sql/schemachanger/scplan/internal/opgen/opgen_index_column.go
line 36 at r4 (raw file):
Previously, postamar (Marius Posta) wrote…
I'm concerned that without a
notImplemented
op here we may be passing the rollback tests even when we shouldn't be. Also, this may hide some dep & op rules necessary for correct DROP behaviour.
I'm instead going to implement this. The top level comment on the PR explains, to some extent, why I didn't need to do this.
pkg/sql/schemachanger/scplan/internal/opgen/opgen_index_column.go
line 37 at r4 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
Why didn't we need something like
scop.RemoveColumnFromIndex
here when we need to drop a column? I guess we do need, and we will do it inDROP COLUMN
?
tbl.RemoveColumnFromFamilyAndPrimaryIndex(col.ID) |
pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go
line 97 at r4 (raw file):
Previously, postamar (Marius Posta) wrote…
Adjust rule name. Perhaps "index exists right before index column"? Same comment applies elsewhere.
Done.
pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go
line 123 at r4 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
Good catch :)
copy-pasta, fixed
pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go
line 129 at r4 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
I think it is what you mean : http://github.com/cockroachdb/cockroach/blob/8601bedf62a7e0534a99c33087cdafcc6d1015fd/pkg/sql/schemachanger/scpb/constants.go#L41-L41
though good call that it is confusing.
We can rename that value in a different PR if you desire
pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go
line 137 at r4 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
set aside IndexParitioning, how is this rule different from the above one where we state "temporary index must exist right before its dependents (really just its index columns) becomes public"?
I've reworked the rules. Please do give them a read as they have changed.
pkg/sql/schemachanger/scplan/internal/rules/dep_index_and_column.go
line 520 at r4 (raw file):
Previously, postamar (Marius Posta) wrote…
Would it be worth introducing some kind of shorthand-rule for "ID is not set"? Just an idea. Perhaps premature.
Done.
723e582
to
b88b8e5
Compare
This ends up being useful. Release note: None
This change normalizes the columns in an index. This lays the foundation to handle cases where we add a NULL-able new column without a default value. In these cases, we should just modify the existing index in-place. That work will come in a follow-up commit. This will also prove valuable for more complex add column scenarios. Release note: None
b88b8e5
to
1e229d9
Compare
Before this change, when we added a column, we'd always create a new primary index and swap into it. In the case where we're adding a nullable column with no default value or computed expression, we have no reason to do this whole dance. Indeed, backfilling that new index was a major regression for this common case when compared to the legacy schema changer. In this commit, we detect this special case and just add the new column to the existing primary index. Release note: None
1e229d9
to
0c3f215
Compare
TFTR! bors r+ |
bors r+ ? |
Build succeeded: |
sql/schemachanger/rel: add support for Neq (!=)
sql/schemachanger: model IndexColumn explicitly
This change normalizes the columns in an index. This lays the foundation to
handle cases where we add a NULL-able new column without a default value. In
these cases, we should just modify the existing index in-place. That work will
come in a follow-up commit. This will also prove valuable for more complex
add column scenarios.
sql/schemachanger: add a special case for simple add column
Before this change, when we added a column, we'd always create a new primary
index and swap into it. In the case where we're adding a nullable column with
no default value or computed expression, we have no reason to do this whole
dance. Indeed, backfilling that new index was a major regression for this
common case when compared to the legacy schema changer.
In this commit, we detect this special case and just add the new column to
the existing primary index.
Release note: None