-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
builtins: make pg_get_indexdef handle expression indexes #95413
Conversation
9536a74
to
3c1a57a
Compare
Since this PR is going to close the issue, can we add a test for |
hm, i didn't actually change SHOW INDEXES in this PR, but let me try doing that |
605b573
to
b9ff5a6
Compare
8607dc8
to
2b7f9d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice. Thanks.
I assume the string "N/A" is also what postgres uses. I was surprised about that. It's perhaps useful to highlight in a comment that this is intentional.
I have a few curiosity driven questions below but they need not block this PR.
non_unique::BOOL, | ||
seq_in_index, | ||
column_name, | ||
CASE | ||
WHEN seq_in_index <= 0 OR seq_in_index > array_length(i.indkey, 1) THEN 'N/A' | ||
WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you perhaps add an explanatory comment with an example. I find this expression hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding
@@ -137,26 +137,29 @@ SELECT | |||
non_unique::BOOL, | |||
seq_in_index, | |||
column_name, | |||
CASE | |||
WHEN seq_in_index <= 0 OR seq_in_index > array_length(i.indkey, 1) THEN 'N/A' | |||
WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding
@@ -2259,7 +2259,7 @@ b | |||
query T | |||
SELECT pg_catalog.pg_get_indexdef((SELECT oid from pg_class WHERE relname='pg_indexdef_cols_idx'), 3, false) | |||
---- | |||
rowid | |||
· |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what I'm seeing here. Could you explain in a review comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
is what the test uses to represent the empty string. showing rowid
in the result here was actually a bug. I've added that to the release note, and left a comment on the test to explain what is being tested.
1 id | ||
2 (json->>'bar'::STRING) | ||
3 (length(id)) | ||
4 · |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again i don't understand the 4th row
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
is the empty string. pg_get_indexdef is supposed to return the empty string for any column index that is not an index key. (confusing that the word index is overloaded in that sentence)
Body: `SELECT CASE | ||
WHEN $2 = 0 THEN defs.indexdef | ||
WHEN $2 < 0 OR $2 > array_length(i.indkey, 1) THEN '' | ||
WHEN i.indkey[$2-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), $2)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like before this would benefit from an explanatory comment nearby.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition. I just have a couple non-blocking comments.
Reviewed 5 of 5 files at r1, 2 of 2 files at r2, 23 of 23 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss, @Xiang-Gu, and @ZhouXing19)
pkg/sql/delegate/show_database_indexes.go
line 59 at r3 (raw file):
JOIN %[1]s.pg_catalog.pg_class c_table ON c_table.relname = s.table_name JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND c_table.relnamespace = n.oid AND n.nspname = s.index_schema JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid AND i.indrelid = c_table.oid
It seems like there's an unnecessary join to pg_class here. It could be rewritten as:
FROM
%[1]s.information_schema.statistics AS s
JOIN %[1]s.pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid
With this rewrite and a test case I tried with 9 indexes, average runtime dropped from about 50 ms to 35 ms.
pkg/sql/delegate/show_table.go
line 161 at r3 (raw file):
JOIN %[4]s.pg_catalog.pg_class c_table ON c_table.relname = s.table_name JOIN %[4]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND c_table.relnamespace = n.oid AND n.nspname = s.index_schema JOIN %[4]s.pg_catalog.pg_index i ON i.indexrelid = c.oid AND i.indrelid = c_table.oid
Same as previous comment. The extra join to pg_class could be removed.
Release note (bug fix): The pg_get_indexdef was fixed so that it shows the expression used to define an expression-based index. In addition, the function was previously including columns stored by the index, which was incorrect and has now been fixed.
Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @Xiang-Gu, and @ZhouXing19)
pkg/sql/delegate/show_database_indexes.go
line 59 at r3 (raw file):
Previously, msirek (Mark Sirek) wrote…
It seems like there's an unnecessary join to pg_class here. It could be rewritten as:
FROM %[1]s.information_schema.statistics AS s JOIN %[1]s.pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid
With this rewrite and a test case I tried with 9 indexes, average runtime dropped from about 50 ms to 35 ms.
those don't seem equivalent. it can lead to double-counting if there are two tables that have indexes with the same names. example:
root@localhost:26257/defaultdb> create table t (a int, b int, index idx (b));
CREATE TABLE
root@localhost:26257/defaultdb> create table t2 (a int, b int, index idx (a));
CREATE TABLE
root@localhost:26257/defaultdb> select relname, oid from pg_class where relname = 'idx';
relname | oid
----------+-------------
idx | 737994182
idx | 3428148196
root@localhost:26257/defaultdb> SELECT
-> table_name,
-> index_name,
-> index_schema,
-> non_unique::BOOL,
-> seq_in_index,
-> column_name,
-> direction,
-> storing::BOOL,
-> implicit::BOOL,
-> is_visible::BOOL AS visible
-> FROM
-> information_schema.statistics AS s
-> JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
-> JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
-> JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
-> WHERE
-> table_catalog='defaultdb'
-> AND table_schema='public'
-> AND table_name='t'
-> ORDER BY
-> 1, 2, 4;
table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
t | idx | public | t | 1 | b | ASC | f | f | t
t | idx | public | t | 2 | rowid | ASC | f | t | t
t | idx | public | t | 1 | b | ASC | f | f | t
t | idx | public | t | 2 | rowid | ASC | f | t | t
t | t_pkey | public | f | 1 | rowid | ASC | f | f | t
t | t_pkey | public | f | 3 | b | N/A | t | f | t
t | t_pkey | public | f | 2 | a | N/A | t | f | t
root@localhost:26257/defaultdb> drop table t2;
DROP TABLE
-- after dropping t2, there's no more double counting
root@localhost:26257/defaultdb> SELECT
-> table_name,
-> index_name,
-> index_schema,
-> non_unique::BOOL,
-> seq_in_index,
-> column_name,
-> direction,
-> storing::BOOL,
-> implicit::BOOL,
-> is_visible::BOOL AS visible
-> FROM
-> information_schema.statistics AS s
-> JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
-> JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
-> JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
-> WHERE
-> table_catalog='defaultdb'
-> AND table_schema='public'
-> AND table_name='t'
-> ORDER BY
-> 1, 2, 4;
table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
t | idx | public | t | 1 | b | ASC | f | f | t
t | idx | public | t | 2 | rowid | ASC | f | t | t
t | t_pkey | public | f | 1 | rowid | ASC | f | f | t
t | t_pkey | public | f | 3 | b | N/A | t | f | t
t | t_pkey | public | f | 2 | a | N/A | t | f | t
i'm open to hearing other ways of avoiding a join though. i added it because i needed it to properly filter the join to pg_index (ON i.indexrelid = c.oid AND i.indrelid = c_table.oid
).
I used "N/A" since that's what cockroach/pkg/sql/information_schema.go Line 1234 in 0775fcc
Neither
So now I think it would be nicer if we also use the term "definition" and just show the column name if there is no expression, like what |
2b7f9d6
to
85dfba8
Compare
Indeed my question about N/A was driven by my work on #88061 so we get an unsurprising output on \d |
Release note (sql change): SHOW INDEXES will now show the expression used to define an index, if one was used.
85dfba8
to
b80c3d7
Compare
tftr! bors r=knz |
Build succeeded: |
Awesome!!! Thank you for dealing with this @rafiss :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/delegate/show_database_indexes.go
line 59 at r3 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
those don't seem equivalent. it can lead to double-counting if there are two tables that have indexes with the same names. example:
root@localhost:26257/defaultdb> create table t (a int, b int, index idx (b)); CREATE TABLE root@localhost:26257/defaultdb> create table t2 (a int, b int, index idx (a)); CREATE TABLE root@localhost:26257/defaultdb> select relname, oid from pg_class where relname = 'idx'; relname | oid ----------+------------- idx | 737994182 idx | 3428148196 root@localhost:26257/defaultdb> SELECT -> table_name, -> index_name, -> index_schema, -> non_unique::BOOL, -> seq_in_index, -> column_name, -> direction, -> storing::BOOL, -> implicit::BOOL, -> is_visible::BOOL AS visible -> FROM -> information_schema.statistics AS s -> JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name -> JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema -> JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid -> WHERE -> table_catalog='defaultdb' -> AND table_schema='public' -> AND table_name='t' -> ORDER BY -> 1, 2, 4; table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible -------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+---------- t | idx | public | t | 1 | b | ASC | f | f | t t | idx | public | t | 2 | rowid | ASC | f | t | t t | idx | public | t | 1 | b | ASC | f | f | t t | idx | public | t | 2 | rowid | ASC | f | t | t t | t_pkey | public | f | 1 | rowid | ASC | f | f | t t | t_pkey | public | f | 3 | b | N/A | t | f | t t | t_pkey | public | f | 2 | a | N/A | t | f | t root@localhost:26257/defaultdb> drop table t2; DROP TABLE -- after dropping t2, there's no more double counting root@localhost:26257/defaultdb> SELECT -> table_name, -> index_name, -> index_schema, -> non_unique::BOOL, -> seq_in_index, -> column_name, -> direction, -> storing::BOOL, -> implicit::BOOL, -> is_visible::BOOL AS visible -> FROM -> information_schema.statistics AS s -> JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name -> JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema -> JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid -> WHERE -> table_catalog='defaultdb' -> AND table_schema='public' -> AND table_name='t' -> ORDER BY -> 1, 2, 4; table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible -------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+---------- t | idx | public | t | 1 | b | ASC | f | f | t t | idx | public | t | 2 | rowid | ASC | f | t | t t | t_pkey | public | f | 1 | rowid | ASC | f | f | t t | t_pkey | public | f | 3 | b | N/A | t | f | t t | t_pkey | public | f | 2 | a | N/A | t | f | t
i'm open to hearing other ways of avoiding a join though. i added it because i needed it to properly filter the join to pg_index (
ON i.indexrelid = c.oid AND i.indrelid = c_table.oid
).
I see. Maybe an EXISTS clause could be used to prevent duplicates:
SELECT
table_name,
index_name,
index_schema,
non_unique::BOOL,
seq_in_index,
column_name,
direction,
storing::BOOL,
implicit::BOOL,
is_visible::BOOL AS visible
FROM
information_schema.statistics AS s
JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema
WHERE
table_catalog='defaultdb'
AND table_schema='public'
AND table_name='t'
AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname = s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid)
ORDER BY
1, 2, 4;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/delegate/show_database_indexes.go
line 59 at r3 (raw file):
Previously, msirek (Mark Sirek) wrote…
I see. Maybe an EXISTS clause could be used to prevent duplicates:
SELECT table_name, index_name, index_schema, non_unique::BOOL, seq_in_index, column_name, direction, storing::BOOL, implicit::BOOL, is_visible::BOOL AS visible FROM information_schema.statistics AS s JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema WHERE table_catalog='defaultdb' AND table_schema='public' AND table_name='t' AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname = s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid) ORDER BY 1, 2, 4;
ah, that works for the old query, but for the new one, we need to select from pg_index
, so pg_index
can't only be in an inner subquery
root@localhost:26257/defaultdb> SELECT
-> table_name,
-> index_name,
-> index_schema,
-> non_unique::BOOL,
-> seq_in_index,
-> column_name,
-> direction,
-> storing::BOOL,
-> implicit::BOOL,
-> is_visible::BOOL AS visible,
-> CASE
-> WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)]
-> ELSE column_name
-> END AS definition
-> FROM
-> information_schema.statistics AS s
-> JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema
-> WHERE
-> table_catalog='defaultdb'
-> AND table_schema='public'
-> AND table_name='t'
-> AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname =
-> s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid)
-> ORDER BY
-> 1, 2, 4;
ERROR: no data source matches prefix: i in this context
SQLSTATE: 42P01
The 1st commit can be backported.
fixes #94690
Release note (bug fix): The pg_get_indexdef was fixed so that it shows
the expression used to define an expression-based index. In addition,
the function was previously including columns stored by the index,
which was incorrect and has now been fixed.
builtins: implement pg_get_indexdef as UDF
sql: add index expression to SHOW INDEXES
Release note (sql change): SHOW INDEXES will now show the expression
used to define an index, if one was used.