sql: added Hidden to index descriptor and SHOW INDEX #83388

Closed


wenyihu6
Contributor

@wenyihu6 wenyihu6 commented Jun 26, 2022

This PR takes the first step for the invisible index feature. It added a boolean
field Hidden to the struct IndexDescriptor. Since primary indexes cannot be
hidden, this PR also added a check in pkg/sql/catalog/tabledesc/validate.go
for that. In addition, this PR added a new column visible to
crdb_internal.table_indexes and to the output of the following SQL statements:

SHOW INDEX FROM (table_name)
SHOW INDEXES FROM (table_name)
SHOW KEYS FROM (table_name)
SHOW INDEX FROM DATABASE (database_name)
SHOW INDEXES FROM DATABASE (database_name)
SHOW KEYS FROM DATABASE (database_name)

Since the invisible index feature has not been introduced yet, all indexes
created should be visible. All test cases are therefore expected to output true
in the visible column.

See also: next PR for invisible index feature: #83471

Assists: #72576
Note that this issue has not been resolved yet; this PR only takes the first step.

Release note (sql change): A new column visible has been added to the table
crdb_internal.table_indexes and to the output of SQL statements related to
SHOW INDEX, SHOW INDEXES, and SHOW KEYS. The visible column indicates
whether the index is visible to the optimizer. An invisible index is an index
that is up-to-date but is ignored by the optimizer unless explicitly specified
with index hinting.

@wenyihu6 wenyihu6 requested review from a team June 26, 2022 17:06
@wenyihu6 wenyihu6 requested a review from a team as a code owner June 26, 2022 17:06
@cockroach-teamcity
Member

This change is Reviewable

@wenyihu6 wenyihu6 force-pushed the 1-add-invisible-to-descpb branch 6 times, most recently from f4240b4 to b659287 Compare June 27, 2022 21:32
@wenyihu6 wenyihu6 removed request for a team June 28, 2022 01:47
Contributor

@postamar postamar left a comment

Reviewing for schema. Most of my comments are about the names of things; other than that, this looks like a reasonable change, done very thoroughly. Thank you for that.

SELECT * FROM crdb_internal.table_indexes WHERE descriptor_name = ''
----
-descriptor_id descriptor_name index_id index_name index_type is_unique is_inverted is_sharded shard_bucket_count created_at
+descriptor_id descriptor_name index_id index_name index_type is_unique is_inverted is_sharded is_invisible shard_bucket_count created_at
Contributor

We already have a concept of hidden columns, which is somewhat similar to what you're trying to introduce for indexes. I feel it would make sense to re-use the existing terminology for the sake of coherence. This renaming should be straightforward. In this instance, consider renaming is_invisible to is_hidden.

Contributor

I posted a few other comments in the same vein, but they aren't exhaustive.

@@ -183,6 +183,7 @@ func (p *planner) AlterPrimaryKey(
Version: descpb.StrictIndexColumnIDGuaranteesVersion,
ConstraintID: tableDesc.GetNextConstraintID(),
CreatedAtNanos: p.EvalContext().GetTxnTimestamp(time.Microsecond).UnixNano(),
Invisible: false,
Contributor

Naming consistency: consider renaming to Hidden.

@@ -280,6 +280,7 @@ func (n *alterTableNode) startExec(params runParams) error {
idx := descpb.IndexDescriptor{
Name: string(d.Name),
Unique: true,
Invisible: false,
Contributor

Naming consistency: consider renaming the field in the index descriptor proto to hidden.

@@ -174,6 +174,10 @@ func indexForDisplay(
}
}

if index.Invisible {
f.WriteString(" INVISIBLE")
Contributor

Naming consistency: consider replacing the INVISIBLE SQL keyword with NOT VISIBLE.

// IsInvisible returns whether the index is invisible or not.
func (desc *IndexDescriptor) IsInvisible() bool {
return desc.Invisible
}
Contributor

Is this method necessary? This isn't super obvious but we try to keep the generated protobuf structs lightweight and instead move as much business logic as possible to the catalog.Index interface and its implementation.
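
The layering described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual CockroachDB code: `IndexDescriptor`, `Index`, `index`, and `NewIndex` below are simplified, hypothetical stand-ins for the generated protobuf struct and for the `catalog.Index` interface and its implementation.

```go
package main

import "fmt"

// IndexDescriptor is a simplified stand-in for a generated protobuf struct.
// It only carries data; no accessor logic is attached to it.
type IndexDescriptor struct {
	Name      string
	Invisible bool
}

// Index is a simplified stand-in for a catalog-level interface.
// Callers depend on this interface rather than on the raw descriptor.
type Index interface {
	GetName() string
	IsInvisible() bool
}

// index wraps a descriptor and implements Index.
type index struct {
	desc *IndexDescriptor
}

func (i index) GetName() string {
	return i.desc.Name
}

func (i index) IsInvisible() bool {
	return i.desc.Invisible
}

// NewIndex is a hypothetical constructor that hides the descriptor
// behind the Index interface.
func NewIndex(desc *IndexDescriptor) Index {
	return index{desc: desc}
}

func main() {
	idx := NewIndex(&IndexDescriptor{Name: "t_a_idx"})
	// prints: t_a_idx invisible=false
	fmt.Printf("%s invisible=%t\n", idx.GetName(), idx.IsInvisible())
}
```

With this shape, the accessor lives on the wrapper type, so the generated struct needs no hand-written `IsInvisible` method of its own.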

NextColumnID: 2,
NextFamilyID: 1,
NextIndexID: 2,
}},
Contributor

Thanks for adding this test 👍

Descriptor validation helps us prevent bugs which otherwise are time-consuming to fix, by catching inconsistencies before they are committed to the cluster.
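
As a rough illustration of the kind of check being discussed (a primary index must not be hidden), here is a minimal sketch. The types `indexDesc` and `tableDesc`, the function `validateTable`, and the error text are all hypothetical and much simpler than the real validation code in `pkg/sql/catalog/tabledesc/validate.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// indexDesc and tableDesc are hypothetical, simplified descriptors.
type indexDesc struct {
	Name   string
	Hidden bool
}

type tableDesc struct {
	Primary     indexDesc
	Secondaries []indexDesc
}

// errHiddenPrimary is an assumed error value for this sketch only.
var errHiddenPrimary = errors.New("primary index cannot be hidden")

// validateTable rejects a table whose primary index is marked hidden.
// Secondary indexes are allowed to be hidden.
func validateTable(t tableDesc) error {
	if t.Primary.Hidden {
		return fmt.Errorf("index %q: %w", t.Primary.Name, errHiddenPrimary)
	}
	return nil
}

func main() {
	bad := tableDesc{Primary: indexDesc{Name: "t_pkey", Hidden: true}}
	// prints: index "t_pkey": primary index cannot be hidden
	fmt.Println(validateTable(bad))
}
```

Running such a check before a descriptor is written is what lets an inconsistency surface as an error instead of as corrupted metadata in the cluster.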

non_unique::BOOL,
seq_in_index,
column_name,
direction,
storing::BOOL,
-implicit::BOOL`
+implicit::BOOL,
+idx.is_invisible::BOOL`
Contributor

Consider idx.is_invisible::BOOL AS hidden instead to keep the column names consistent, not just with the invisible -> hidden rename I'm advocating, but also for consistency within this table. The other boolean columns don't have an is_ prefix here.

@@ -111,7 +113,8 @@ SELECT
column_name,
direction,
storing::BOOL,
-implicit::BOOL`
+implicit::BOOL,
+idx.is_invisible::BOOL`
Contributor

Same as previous comment: idx.is_invisible::BOOL AS hidden

@@ -56,6 +56,9 @@ type Index interface {
// IsInverted returns true if this is an inverted index.
IsInverted() bool

// IsInvisible returns true if this is an invisible index.
IsInvisible() bool
Contributor

Naming consistency: consider renaming to IsHidden().


// is_invisible specifies whether this index is invisible. Note that primary
// index cannot be invisible.
bool is_invisible = 23;
Contributor

@postamar postamar Jun 28, 2022

In addition to the suggested renaming for consistency, you'll need to update the logic in scdecomp to populate this field based on what it finds in the index descriptor. It should be straightforward. I bring it up because I just saw that some of the scdecomp tests failed in your latest CI run.

@wenyihu6 wenyihu6 marked this pull request as draft June 28, 2022 14:18
@wenyihu6 wenyihu6 force-pushed the 1-add-invisible-to-descpb branch 6 times, most recently from 6f1c80d to 01eb55d Compare July 6, 2022 03:40
@wenyihu6 wenyihu6 changed the title sql: added invisible to index descriptor and SHOW INDEX sql: added Hidden to index descriptor and SHOW INDEX Jul 10, 2022
@wenyihu6 wenyihu6 force-pushed the 1-add-invisible-to-descpb branch 4 times, most recently from 96b2e62 to 77e6e66 Compare July 11, 2022 20:15
@wenyihu6 wenyihu6 force-pushed the 1-add-invisible-to-descpb branch 2 times, most recently from d51efc7 to 7de4965 Compare July 15, 2022 11:42
@wenyihu6
Contributor Author

pkg/ccl/logictestccl/testdata/logic_test/partitioning_implicit line 517 at r6 (raw file):

query TTB
SELECT index_name, column_name, implicit FROM [SHOW INDEXES FROM t]
ORDER BY index_name, seq_in_index

I removed ORDER BY index_name, seq_in_index from some test cases because it's now using ORDER BY table_name, index_name, seq_in_index in SHOW INDEXES. Let me know if you want me to add them back.

@wenyihu6
Contributor Author

pkg/ccl/logictestccl/testdata/logic_test/partitioning_implicit line 517 at r6 (raw file):

Previously, wenyihu6 (Wenyi Hu) wrote…

I removed ORDER BY index_name, seq_in_index from some test cases because it's now using ORDER BY table_name, index_name, seq_in_index in SHOW INDEXES. Let me know if you want me to add them back.

Hmm, I'm not sure why the changes I made are not showing here in Reviewable. But they are under Files changed in my PR.

@mgartner
Collaborator

@postamar apologies for re-opening this discussion, but I'm worried that using the word "hidden" for parts of the feature when all of the user-facing interactions with the feature use the term "visible" makes this unnecessarily confusing. Do you feel that naming consistent with existing "hidden" features is more important than consistency with the "not visible" feature being added?

@wenyihu6
Contributor Author

wenyihu6 commented Jul 15, 2022

@postamar @mgartner @michae2 @knz @vy-ton @otan Tagging people who have commented on the syntax discussion so far. Feel free to ignore me if this is not in your zone : )

It’s been a long time since we last talked about the syntax and naming conventions for this feature. Sorry about this! This paragraph summarizes our discussion about this topic so far.

The invisible index feature introduces three new user-facing syntaxes.
First, when users create a new invisible index with CREATE TABLE, CREATE INDEX, or ALTER INDEX, they need to specify whether the index is invisible. The two options that we have discussed are NOT VISIBLE and INVISIBLE. MySQL and Oracle both support INVISIBLE. The invisible column feature uses NOT VISIBLE, and we have decided that being consistent with the invisible column feature is more important.

The second user-facing syntax is related to SQL statements like SHOW INDEX. The three options are is_hidden, visible, or is_visible. MySQL uses visible. The invisible column feature uses is_hidden. We now need to decide whether it is more important to stay consistent with the invisible column feature and use is_hidden, or to stay consistent with the first user-facing syntax and use is_visible or visible.

The third user-facing syntax is with crdb_internal.table_indexes. Invisible column feature is using hidden in table_columns. MySQL uses INFORMATION_SCHEMA.STATISTICS and has is_visible. Oracle uses all_indexes and has visibility.

We are also introducing another field in the index descriptor (just for internal use). The options are hidden, invisible. The invisible column feature is using hidden in column descriptor. If we decide to choose visible for all invisible index syntax mentioned above, we can use invisible here to be more consistent. My slight preference is to use either hidden or invisible (not visible) for the index descriptor since the default behavior for a new index is visible, and the default boolean value is false.
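
The zero-value argument above can be illustrated with a small sketch. The struct `indexDescriptor`, its `Invisible` field, and the helper `visibleColumn` are hypothetical names chosen for this example; they are not the actual descriptor definition.

```go
package main

import "fmt"

// indexDescriptor is a hypothetical, simplified descriptor. The field is
// phrased negatively so that the boolean zero value (false) means "visible".
type indexDescriptor struct {
	Name      string
	Invisible bool
}

// visibleColumn derives a user-facing "visible" value from the
// negatively phrased internal field.
func visibleColumn(d indexDescriptor) bool {
	return !d.Invisible
}

func main() {
	// A descriptor that never sets the field, for example one created
	// before the field existed, keeps the zero value.
	var d indexDescriptor
	// prints: true
	fmt.Println(visibleColumn(d))
}
```

Under this assumption, existing descriptors need no rewrite: an unset field already encodes the default behavior, while the user-facing column can still be phrased positively.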

Another thing to note is that MySQL introduces a new column in INFORMATION_SCHEMA.STATISTICS. We are now introducing a new column in crdb_internal.table_indexes, but we can add a new column to INFORMATION_SCHEMA.STATISTICS as well.

@michae2
Collaborator

michae2 commented Jul 15, 2022

Naming is hard! 🙂

Invisible index feature is introducing three new user facing syntaxes. When users are creating a new invisible index with CREATE TABLE or with CREATE INDEX, they will need to specify whether the index is invisible. The two options that we have discussed are NOT VISIBLE, INVISIBLE. MySQL and Oracle both support INVISIBLE instead. Invisible column feature is using NOT VISIBLE. But we have decided that being consistent with the invisible column feature is important.

Yes, I agree with the current usage of NOT VISIBLE in DDL statements. It matches what we do for columns and it avoids the problems with making INVISIBLE a keyword in the grammar. It's too bad it doesn't match MySQL and Oracle, but I've accepted that. 😛

The second user-facing syntax is related to SQL statements like SHOW INDEX. The three options are is_hidden, visible, or is_visible. Oracle is using is_visible. MySQL is using visible. Invisible column feature is using is_hidden. And now we want to decide if it is more important to stay consistent with the invisible column feature and use is_hidden or to stay consistent with the first user-facing syntax and use is_visible or visible.

The SHOW INDEX statements are not standard, and come from MySQL, so I propose we match MySQL and use visible in the output of these statements. (Oracle doesn't actually support SHOW INDEX, that link is also for MySQL.)

The third user-facing syntax is with crdb_internal.table_indexes. Invisible column feature is using is_hidden for table_indexes. Oracle, MySQL are both consistent with what they have for SHOW INDEX which are is_visible and visible.

I don't feel strongly about this one. I'm fine with is_hidden to match what we do for columns. Or anything else.

We are also introducing another field in the index descriptor (just for internal use). The options are hidden, invisible. The invisible column feature is using hidden in column descriptor. If we decide to choose visible for all invisible index syntax mentioned above, we can use invisible here to be more consistent. My slight preference is to use either hidden or invisible (not visible) for the index descriptor since the default behavior for a new index is visible, and the default boolean value is false.

Again, I don't feel strongly about this one. I'm fine with hidden or invisible. Anything is fine with me.

Another thing to note is that MySQL and Oracle both introduce a new column in INFORMATION_SCHEMA.STATISTICS. We are now introducing a new column in crdb_internal.table_indexes, but we can add a new column to INFORMATION_SCHEMA.STATISTICS as well.

MySQL uses is_visible in the STATISTICS table, as an extension to the standard. (Oracle doesn't actually support INFORMATION_SCHEMA.) Postgres doesn't seem to have information_schema.statistics. Again, I propose we match MySQL.

@wenyihu6
Contributor Author

wenyihu6 commented Jul 15, 2022

Naming is hard! 🙂

Invisible index feature is introducing three new user facing syntaxes. When users are creating a new invisible index with CREATE TABLE or with CREATE INDEX, they will need to specify whether the index is invisible. The two options that we have discussed are NOT VISIBLE, INVISIBLE. MySQL and Oracle both support INVISIBLE instead. Invisible column feature is using NOT VISIBLE. But we have decided that being consistent with the invisible column feature is important.

Yes, I agree with the current usage of NOT VISIBLE in DDL statements. It matches what we do for columns and it avoids the problems with making INVISIBLE a keyword in the grammar. It's too bad it doesn't match MySQL and Oracle, but I've accepted that. 😛

The second user-facing syntax is related to SQL statements like SHOW INDEX. The three options are is_hidden, visible, or is_visible. Oracle is using is_visible. MySQL is using visible. Invisible column feature is using is_hidden. And now we want to decide if it is more important to stay consistent with the invisible column feature and use is_hidden or to stay consistent with the first user-facing syntax and use is_visible. I slightly prefer to use is_visible over visible to be consistent with other columns in SHOW INDEX: is_unique, is_inverted.

The SHOW INDEX statements are not standard, and come from MySQL, so I propose we match MySQL and use visible in the output of these statements. (Oracle doesn't actually support SHOW INDEX, that link is also for MySQL.)

The third user-facing syntax is with crdb_internal.table_indexes. Invisible column feature is using is_hidden for table_indexes. Oracle, MySQL are both consistent with what they have for SHOW INDEX which are is_visible and visible.

I don't feel strongly about this one. I'm fine with is_hidden to match what we do for columns. Or anything else.

We are also introducing another field in the index descriptor (just for internal use). The options are hidden, invisible. The invisible column feature is using hidden in column descriptor. If we decide to choose visible for all invisible index syntax mentioned above, we can use invisible here to be more consistent. My slight preference is to use either hidden or invisible (not visible) for the index descriptor since the default behavior for a new index is visible, and the default boolean value is false.

Again, I don't feel strongly about this one. I'm fine with hidden or invisible. Anything is fine with me.

Another thing to note is that MySQL and Oracle both introduce a new column in INFORMATION_SCHEMA.STATISTICS. We are now introducing a new column in crdb_internal.table_indexes, but we can add a new column to INFORMATION_SCHEMA.STATISTICS as well.

MySQL uses is_visible in the STATISTICS table, as an extension to the standard. (Oracle doesn't actually support INFORMATION_SCHEMA.) Postgres doesn't seem to have information_schema.statistics. Again, I propose we match MySQL.

I see now. Sorry about the confusion. The link I got for Oracle was actually for MySQL😬. I will correct that paragraph now.

@vy-ton
Contributor

vy-ton commented Jul 19, 2022

The two options that we have discussed are NOT VISIBLE, INVISIBLE. MySQL and Oracle both support INVISIBLE instead. Invisible column feature is using NOT VISIBLE. But we have decided that being consistent with the invisible column feature is more important.

Can we support INVISIBLE as an alias? We've done this for other features where other dbs offer different syntax. The value being that an existing migration wouldn't need to change their SQL.

The second user-facing syntax is related to SQL statements like SHOW INDEX.

I prefer visible. I'm confused by "I slightly prefer to use is_visible over visible to be consistent with other columns in SHOW INDEX: is_unique, is_inverted", since I see non_unique on SHOW INDEX.

We should help users understand the difference between hidden columns vs indexes, which is why I prefer the different naming for SHOW. For columns, being hidden is a presentation decision. For indexes, being hidden impacts query behavior.

  • MySQL uses INFORMATION_SCHEMA.STATISTICS and has is_visible. - I prefer this. After Postgres syntax/semantics, I would favor mirroring MySQL.
  • Yes, we should add the matching information_schema columns

@wenyihu6
Contributor Author

wenyihu6 commented Jul 19, 2022

The two options that we have discussed are NOT VISIBLE, INVISIBLE. MySQL and Oracle both support INVISIBLE instead. Invisible column feature is using NOT VISIBLE. But we have decided that being consistent with the invisible column feature is more important.

Can we support INVISIBLE as an alias? We've done this for other features where other dbs offer different syntax. The value being that an existing migration wouldn't need to change their SQL.

The second user-facing syntax is related to SQL statements like SHOW INDEX.

I prefer visible. I'm confused by "I slightly prefer to use is_visible over visible to be consistent with other columns in SHOW INDEX: is_unique, is_inverted", since I see non_unique on SHOW INDEX.

We should help users understand the difference between hidden columns vs indexes, which is why I prefer the different naming for SHOW. For columns, being hidden is a presentation decision. For indexes, being hidden impacts query behavior.

  • MySQL uses INFORMATION_SCHEMA.STATISTICS and has is_visible. - I prefer this. After Postgres syntax/semantics, I would favor mirroring MySQL.
  • Yes, we should add the matching information_schema columns

Thanks for looking into this! We can choose visible for SHOW INDEX and is_visible for both crdb_internal.table_indexes and information_schema. Sorry, I mixed up SHOW INDEX and crdb_internal.table_indexes; I've updated the paragraph. For the alias part, I will try looking into it.

@knz
Contributor

knz commented Jul 19, 2022

Can we support INVISIBLE as an alias? We've done this for other features where other dbs offer different syntax. The value being that an existing migration wouldn't need to change their SQL.

I don't think that's a good idea. There are two different arguments, one that's user-facing and one that's technical.

  1. For one, the reason "An existing migration wouldn't need to change their SQL" is not very good. If the user is coming from MySQL or Oracle, they have to rewrite their SQL anyway. So having different keywords is par for the course.

  2. The technical reason is that we should remain consistent on UX. We already support NOT VISIBLE for another feature (hidden columns). If we introduce INVISIBLE as an alias for NOT VISIBLE in the new feature, we'd need to do it for hidden columns too for consistency. However, we cannot do this for hidden columns (for a technical reason that has to do with the structure of our grammar). Because we cannot do it for hidden columns, we should not do it here either.

@mgartner
Collaborator

I agree with @knz that adding INVISIBLE would only add confusion when users try to create non-visible columns, much like the confusion with FOR and FROM that we are already guilty of creating.

I agree with the others on the columns for introspection tables. Avoiding "invisible" and favoring column names with the word "visible" has the benefit of matching the syntax. We may consider referring to the feature publicly (i.e. in docs) as "index visibility" rather than "invisible indexes" to avoid inconsistency with the syntax.

@vy-ton
Contributor

vy-ton commented Jul 19, 2022

For migrations, it would be better UX if users could focus SQL rewrites on unsupported syntax or different db requirements, e.g. primary key design. Regardless, I understand the rationale for no alias. @otan Can you open an issue for creating a schema migration tool rule to convert the syntax?

@wenyihu6
Contributor Author

wenyihu6 commented Jul 19, 2022

Thanks everyone for the comments and sorry for the email spam!

Conclusion on invisible index syntax discussion: Michael, Vy, and Marcus's opinions on the second and third user-facing syntax are the same. SQL statements related to SHOW INDEX will use visible. crdb_internal.table_indexes will use is_visible. No one in our previous discussion has disagreed with this either, so I will treat it as the final decision unless someone has a different opinion. Please let me know if you do : )

For the first user-facing syntax, Vy wants to support INVISIBLE as an alias. I will leave this for discussion later. If we do want to support INVISIBLE as an alias later, I will make another PR in the end after I complete this invisible index feature.

I will also make another PR to add a new column in information_schema.statistics with is_visible after this PR.

@vy-ton
Contributor

vy-ton commented Jul 19, 2022

For the first user-facing syntax, Vy wants to support INVISIBLE as an alias. I will leave this for discussion later. If we do want to support INVISIBLE as an alias later, I will make another PR in the end after I complete this invisible index feature.

Sorry I should have clarified that I'm ok with no support for INVISIBLE as alias

@postamar
Contributor

@postamar apologies for re-opening this discussion, but I'm worried that using the word "hidden" for parts of the feature when all of the user-facing interactions with the feature use the term "visible" makes this unnecessarily confusing. Do you feel that naming consistent with existing "hidden" features is more important than consistency with the "not visible" feature being added?

I don't feel too strongly about that. I'm biased towards maintaining existing patterns, and addressing the naming inconsistencies already present with columns falls way out of scope of this PR. Furthermore, that's neither urgent nor difficult, so it doesn't worry me.

@wenyihu6 wenyihu6 marked this pull request as draft July 20, 2022 17:25
This PR added a new field `NotVisible` to the struct `IndexDescriptor`. Since
primary indexes must always be visible, it also added another check in
`pkg/sql/catalog/tabledesc/validate.go`. Since the invisible index feature has
not been introduced yet, all indexes created should be visible.

See also: cockroachdb#83471

Assists: cockroachdb#72576

Release note: none
@wenyihu6 wenyihu6 force-pushed the 1-add-invisible-to-descpb branch from 7de4965 to ad9b92c Compare July 20, 2022 21:12
@wenyihu6 wenyihu6 closed this Jul 20, 2022
@wenyihu6 wenyihu6 deleted the 1-add-invisible-to-descpb branch July 20, 2022 21:13
wenyihu6 added a commit to wenyihu6/cockroach that referenced this pull request Jul 21, 2022
This PR takes the second step for the invisible index feature. The previous PR,
cockroachdb#83388, should be merged before
this PR is merged. The first commit of this PR comes from the previous PR. This
PR added parsing support for CREATE INDEX … NOT VISIBLE and CREATE TABLE (...
INDEX() NOT VISIBLE). Note that this PR does not add any logic to the optimizer,
and executing it returns an “unimplemented” error immediately.

Assists: cockroachdb#72576

See also: cockroachdb#83388

Release note: None
@wenyihu6
Contributor Author

Opened new PRs for a cleaner thread without the syntax discussion (new PRs: #84763, #84776).
