Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtins: make pg_get_indexdef handle expression indexes #95413

Merged
merged 3 commits into from
Jan 20, 2023

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Jan 18, 2023

The 1st commit can be backported.


fixes #94690

Release note (bug fix): The pg_get_indexdef was fixed so that it shows
the expression used to define an expression-based index. In addition,
the function was previously including columns stored by the index,
which was incorrect and has now been fixed.


builtins: implement pg_get_indexdef as UDF


sql: add index expression to SHOW INDEXES

Release note (sql change): SHOW INDEXES will now show the expression
used to define an index, if one was used.

@rafiss rafiss requested a review from a team January 18, 2023 02:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss rafiss force-pushed the fix-pg_get_indexdef branch 2 times, most recently from 9536a74 to 3c1a57a Compare January 18, 2023 15:55
@Xiang-Gu
Copy link
Contributor

Since this PR is going to close the issue, can we add a test for SHOW INDEXES FROM t?

@rafiss
Copy link
Collaborator Author

rafiss commented Jan 18, 2023

hm, i didn't actually change SHOW INDEXES in this PR, but let me try doing that

@rafiss rafiss force-pushed the fix-pg_get_indexdef branch 2 times, most recently from 605b573 to b9ff5a6 Compare January 19, 2023 18:44
@rafiss rafiss requested a review from a team as a code owner January 19, 2023 18:44
@rafiss rafiss force-pushed the fix-pg_get_indexdef branch 2 times, most recently from 8607dc8 to 2b7f9d6 Compare January 19, 2023 20:03
@rafiss rafiss requested a review from a team as a code owner January 19, 2023 20:03
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice. Thanks.
I assume the string "N/A" is also what postgres uses. I was surprised about that. It's perhaps useful to highlight in a comment that this is intentional.

I have a few curiosity driven questions below but they need not block this PR.

non_unique::BOOL,
seq_in_index,
column_name,
CASE
WHEN seq_in_index <= 0 OR seq_in_index > array_length(i.indkey, 1) THEN 'N/A'
WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perhaps add an explanatory comment with an example. I find this expression hard to read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding

@@ -137,26 +137,29 @@ SELECT
non_unique::BOOL,
seq_in_index,
column_name,
CASE
WHEN seq_in_index <= 0 OR seq_in_index > array_length(i.indkey, 1) THEN 'N/A'
WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding

@@ -2259,7 +2259,7 @@ b
query T
SELECT pg_catalog.pg_get_indexdef((SELECT oid from pg_class WHERE relname='pg_indexdef_cols_idx'), 3, false)
----
rowid
·
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what I'm seeing here. Could you explain in a review comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. is what the test uses to represent the empty string. showing rowid in the result here was actually a bug. I've added that to the release note, and left a comment on the test to explain what is being tested.

1 id
2 (json->>'bar'::STRING)
3 (length(id))
4 ·
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again i don't understand the 4th row

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. is the empty string. pg_get_indexdef is supposed to return the empty string for any column index that is not an index key. (confusing that the word index is overloaded in that sentence)

Body: `SELECT CASE
WHEN $2 = 0 THEN defs.indexdef
WHEN $2 < 0 OR $2 > array_length(i.indkey, 1) THEN ''
WHEN i.indkey[$2-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), $2)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like before this would benefit from an explanatory comment nearby.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition. I just have a couple non-blocking comments.

Reviewed 5 of 5 files at r1, 2 of 2 files at r2, 23 of 23 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss, @Xiang-Gu, and @ZhouXing19)


pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

    JOIN %[1]s.pg_catalog.pg_class c_table ON c_table.relname = s.table_name
    JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND c_table.relnamespace = n.oid AND n.nspname = s.index_schema
    JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid AND i.indrelid = c_table.oid

It seems like there's an unnecessary join to pg_class here. It could be rewritten as:

FROM
    %[1]s.information_schema.statistics AS s
    JOIN %[1]s.pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
    JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
    JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid

With this rewrite and a test case I tried with 9 indexes, average runtime dropped from about 50 ms to 35 ms.


pkg/sql/delegate/show_table.go line 161 at r3 (raw file):

    JOIN %[4]s.pg_catalog.pg_class c_table ON c_table.relname = s.table_name
    JOIN %[4]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND c_table.relnamespace = n.oid AND n.nspname = s.index_schema
    JOIN %[4]s.pg_catalog.pg_index i ON i.indexrelid = c.oid AND i.indrelid = c_table.oid

Same as previous comment. The extra join to pg_class could be removed.

Release note (bug fix): The pg_get_indexdef was fixed so that it shows
the expression used to define an expression-based index. In addition,
the function was previously including columns stored by the index,
which was incorrect and has now been fixed.
Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek, @Xiang-Gu, and @ZhouXing19)


pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

Previously, msirek (Mark Sirek) wrote…

It seems like there's an unnecessary join to pg_class here. It could be rewritten as:

FROM
    %[1]s.information_schema.statistics AS s
    JOIN %[1]s.pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
    JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
    JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid

With this rewrite and a test case I tried with 9 indexes, average runtime dropped from about 50 ms to 35 ms.

those don't seem equivalent. it can lead to double-counting if there are two tables that have indexes with the same names. example:

root@localhost:26257/defaultdb> create table t (a int, b int, index idx (b));
CREATE TABLE

root@localhost:26257/defaultdb> create table t2 (a int, b int, index idx (a));
CREATE TABLE

root@localhost:26257/defaultdb> select relname, oid from pg_class where relname = 'idx';
  relname |    oid
----------+-------------
  idx     |  737994182
  idx     | 3428148196

root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
                             ->     JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
                             ->     JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             -> ORDER BY
                             ->     1, 2, 4;
  table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | t_pkey     | public       |     f      |            1 | rowid       | ASC       |    f    |    f     |    t
  t          | t_pkey     | public       |     f      |            3 | b           | N/A       |    t    |    f     |    t
  t          | t_pkey     | public       |     f      |            2 | a           | N/A       |    t    |    f     |    t

root@localhost:26257/defaultdb> drop table t2;
DROP TABLE

-- after dropping t2, there's no more double counting
root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
                             ->     JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
                             ->     JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             -> ORDER BY
                             ->     1, 2, 4;
  table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | t_pkey     | public       |     f      |            1 | rowid       | ASC       |    f    |    f     |    t
  t          | t_pkey     | public       |     f      |            3 | b           | N/A       |    t    |    f     |    t
  t          | t_pkey     | public       |     f      |            2 | a           | N/A       |    t    |    f     |    t

i'm open to hearing other ways of avoiding a join though. i added it because i needed it to properly filter the join to pg_index (ON i.indexrelid = c.oid AND i.indrelid = c_table.oid).

@rafiss
Copy link
Collaborator Author

rafiss commented Jan 20, 2023

I used "N/A" since that's what SHOW INDEXES ... uses for the direction when showing columns that are stored in the index. That comes directly from information_schema.statistics:

indexDirectionNA, true, false); err != nil {

Neither SHOW INDEXES nor information_schema.statistics exists in PostgreSQL. (MySQL does have information_schema.statistics.) However, your comment reminded me that PostgreSQL does have:

postgres=# \dS+ i3
                       Index "public.i3"
 Column |  Type   | Key? | Definition | Storage | Stats target
--------+---------+------+------------+---------+--------------
 b      | integer | yes  | b          | plain   |
 expr   | integer | yes  | (a + c)    | plain   |

So now I think it would be nicer if we also use the term "definition" and just show the column name if there is no expression, like what \dS+ does.

@rafiss rafiss force-pushed the fix-pg_get_indexdef branch from 2b7f9d6 to 85dfba8 Compare January 20, 2023 06:29
@knz
Copy link
Contributor

knz commented Jan 20, 2023

Indeed my question about N/A was driven by my work on #88061 so we get an unsurprising output on \d

Release note (sql change): SHOW INDEXES will now show the expression
used to define an index, if one was used.
@rafiss rafiss force-pushed the fix-pg_get_indexdef branch from 85dfba8 to b80c3d7 Compare January 20, 2023 14:40
@rafiss
Copy link
Collaborator Author

rafiss commented Jan 20, 2023

tftr!

bors r=knz

@craig
Copy link
Contributor

craig bot commented Jan 20, 2023

Build succeeded:

@craig craig bot merged commit b861696 into cockroachdb:master Jan 20, 2023
@rafiss rafiss deleted the fix-pg_get_indexdef branch January 20, 2023 15:45
@lopezator
Copy link
Contributor

Awesome!!! Thank you for dealing with this @rafiss :)

Copy link
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

those don't seem equivalent. it can lead to double-counting if there are two tables that have indexes with the same names. example:

root@localhost:26257/defaultdb> create table t (a int, b int, index idx (b));
CREATE TABLE

root@localhost:26257/defaultdb> create table t2 (a int, b int, index idx (a));
CREATE TABLE

root@localhost:26257/defaultdb> select relname, oid from pg_class where relname = 'idx';
  relname |    oid
----------+-------------
  idx     |  737994182
  idx     | 3428148196

root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
                             ->     JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
                             ->     JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             -> ORDER BY
                             ->     1, 2, 4;
  table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | t_pkey     | public       |     f      |            1 | rowid       | ASC       |    f    |    f     |    t
  t          | t_pkey     | public       |     f      |            3 | b           | N/A       |    t    |    f     |    t
  t          | t_pkey     | public       |     f      |            2 | a           | N/A       |    t    |    f     |    t

root@localhost:26257/defaultdb> drop table t2;
DROP TABLE

-- after dropping t2, there's no more double counting
root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
                             ->     JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
                             ->     JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             -> ORDER BY
                             ->     1, 2, 4;
  table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | t_pkey     | public       |     f      |            1 | rowid       | ASC       |    f    |    f     |    t
  t          | t_pkey     | public       |     f      |            3 | b           | N/A       |    t    |    f     |    t
  t          | t_pkey     | public       |     f      |            2 | a           | N/A       |    t    |    f     |    t

i'm open to hearing other ways of avoiding a join though. i added it because i needed it to properly filter the join to pg_index (ON i.indexrelid = c.oid AND i.indrelid = c_table.oid).

I see. Maybe an EXISTS clause could be used to prevent duplicates:

SELECT
table_name,
index_name,
index_schema,
non_unique::BOOL,
seq_in_index,
column_name,
direction,
storing::BOOL,
implicit::BOOL,
is_visible::BOOL AS visible
FROM
    information_schema.statistics AS s
    JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema
WHERE
    table_catalog='defaultdb'
    AND table_schema='public'
    AND table_name='t'
    AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname = s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid)
ORDER BY
    1, 2, 4;  

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

Previously, msirek (Mark Sirek) wrote…

I see. Maybe an EXISTS clause could be used to prevent duplicates:

SELECT
table_name,
index_name,
index_schema,
non_unique::BOOL,
seq_in_index,
column_name,
direction,
storing::BOOL,
implicit::BOOL,
is_visible::BOOL AS visible
FROM
    information_schema.statistics AS s
    JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema
WHERE
    table_catalog='defaultdb'
    AND table_schema='public'
    AND table_name='t'
    AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname = s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid)
ORDER BY
    1, 2, 4;  

ah, that works for the old query, but for the new one, we need to select from pg_index, so pg_index can't only be in an inner subquery

root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible,
                             -> CASE
                             ->   WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)]
                             ->   ELSE column_name
                             -> END AS definition
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             ->     AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname =
                             -> s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid)
                             -> ORDER BY
                             ->     1, 2, 4;
ERROR: no data source matches prefix: i in this context
SQLSTATE: 42P01

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SHOW INDEX/ES stmts information about computed indexes is misleading
6 participants