builtins: make pg_get_indexdef handle expression indexes #95413

rafiss · 2023-01-18T02:30:13Z

The 1st commit can be backported.

Release note (bug fix): The pg_get_indexdef function was fixed so that it
shows the expression used to define an expression-based index. In
addition, the function previously included columns stored by the index,
which was incorrect and has now been fixed.

builtins: implement pg_get_indexdef as UDF

sql: add index expression to SHOW INDEXES

Release note (sql change): SHOW INDEXES will now show the expression
used to define an index, if one was used.

Xiang-Gu · 2023-01-18T18:37:04Z

Since this PR is going to close the issue, can we add a test for SHOW INDEXES FROM t?

rafiss · 2023-01-18T22:12:37Z

hm, i didn't actually change SHOW INDEXES in this PR, but let me try doing that

knz

This is very nice. Thanks.
I assume the string "N/A" is also what postgres uses. I was surprised about that. It's perhaps useful to highlight in a comment that this is intentional.

I have a few curiosity driven questions below but they need not block this PR.

knz · 2023-01-19T23:56:06Z

pkg/sql/delegate/show_database_indexes.go

 	non_unique::BOOL,
 	seq_in_index,
 	column_name,
+    CASE
+      WHEN seq_in_index <= 0 OR seq_in_index > array_length(i.indkey, 1) THEN 'N/A'
+      WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)]


Could you perhaps add an explanatory comment with an example. I find this expression hard to read.
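As a reading aid for the expression above, here is a minimal Go sketch that mirrors the SHOW INDEXES CASE branch by branch. It is not CockroachDB code. The function and parameter names are hypothetical. The final `ELSE column_name` branch is taken from the later version of this query in the thread. The sketch also assumes, as the SQL implies, that `indkey` is subscripted from 0 while `array_positions` reports 1-based positions. The sample index on `(id, json->>'bar', length(id))` and its column names are made up for illustration.

```go
package main

import "fmt"

// showIndexesDefinition mirrors the CASE expression that computes the
// definition column in SHOW INDEXES. All names are hypothetical.
//
//	indkey     one attnum per index key column (pg_index.indkey);
//	           0 marks an expression column
//	indexprs   expression text for each expression column, in key order
//	seq        1-based seq_in_index from information_schema.statistics
//	columnName column_name reported for that statistics row
func showIndexesDefinition(indkey []int, indexprs []string, seq int, columnName string) string {
	// WHEN seq_in_index <= 0 OR seq_in_index > array_length(i.indkey, 1) THEN 'N/A'
	if seq <= 0 || seq > len(indkey) {
		return "N/A"
	}
	// WHEN i.indkey[seq_in_index-1] = 0 ...: indkey is subscripted from 0.
	if indkey[seq-1] == 0 {
		// array_positions(i.indkey, 0) lists the 1-based positions of every
		// expression column; array_position(..., seq_in_index) is this
		// column's rank among them, i.e. its 1-based index into indexprs.
		rank := 0
		for pos := 1; pos <= seq; pos++ {
			if indkey[pos-1] == 0 {
				rank++
			}
		}
		return indexprs[rank-1]
	}
	// ELSE column_name
	return columnName
}

func main() {
	// Hypothetical index on (id, (json->>'bar'), length(id)).
	indkey := []int{1, 0, 0}
	indexprs := []string{"(json->>'bar'::STRING)", "(length(id))"}
	names := []string{"id", "expr_a", "expr_b", "rowid"} // hypothetical names
	for seq := 1; seq <= 4; seq++ {
		fmt.Println(seq, showIndexesDefinition(indkey, indexprs, seq, names[seq-1]))
	}
}
```

For example, with `indkey = [1, 0, 0]`, position 3 is the second zero, so it maps to the second entry of `indexprs`. Any position past the end of `indkey` falls into the `'N/A'` branch.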

knz · 2023-01-19T23:56:50Z

pkg/sql/delegate/show_table.go

@@ -137,26 +137,29 @@ SELECT
    non_unique::BOOL,
    seq_in_index,
    column_name,
+    CASE
+      WHEN seq_in_index <= 0 OR seq_in_index > array_length(i.indkey, 1) THEN 'N/A'
+      WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)]


knz · 2023-01-19T23:58:35Z

pkg/sql/logictest/testdata/logic_test/builtin_function

@@ -2259,7 +2259,7 @@ b
 query T
 SELECT pg_catalog.pg_get_indexdef((SELECT oid from pg_class WHERE relname='pg_indexdef_cols_idx'), 3, false)
 ----
-rowid
+·


I don't understand what I'm seeing here. Could you explain in a review comment?

· is what the logic test uses to represent the empty string. Showing rowid in the result here was actually a bug. I've added that to the release note and left a comment on the test to explain what is being tested.

knz · 2023-01-19T23:59:34Z

pkg/sql/logictest/testdata/logic_test/pg_builtins

+1  id
+2  (json->>'bar'::STRING)
+3  (length(id))
+4  ·


Again i don't understand the 4th row

· is the empty string. pg_get_indexdef is supposed to return the empty string for any column number that does not refer to one of the index's key columns.

knz · 2023-01-20T00:02:17Z

pkg/sql/sem/builtins/pg_builtins.go

+			Body: `SELECT CASE
+				WHEN $2 = 0 THEN defs.indexdef
+				WHEN $2 < 0 OR $2 > array_length(i.indkey, 1) THEN ''
+				WHEN i.indkey[$2-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), $2)]


Like before this would benefit from an explanatory comment nearby.
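A comment here could spell out how the UDF treats its column-number argument. The following minimal Go sketch models that handling. Everything in it is hypothetical: the `indexRow` type stands in for what the SQL body reads (`defs.indexdef` and the `pg_index` row), and the attnum-to-name map is an invented lookup. The quoted body stops before its ELSE branch, so returning the plain column name there is an assumption. That assumption matches the `1  id` row of the pg_builtins logic test.

```go
package main

import "fmt"

// indexRow is a hypothetical in-memory stand-in for what the UDF body reads.
type indexRow struct {
	indexdef string         // full index definition (defs.indexdef)
	indkey   []int          // one attnum per key column; 0 = expression
	indexprs []string       // one entry per 0 in indkey, in key order
	attnames map[int]string // attnum -> column name (hypothetical lookup)
}

// indexdefColumn sketches the column-number handling of pg_get_indexdef.
func indexdefColumn(ix indexRow, col int) string {
	switch {
	case col == 0: // WHEN $2 = 0 THEN defs.indexdef
		return ix.indexdef
	case col < 0 || col > len(ix.indkey): // not a key column of the index
		return "" // logic tests render the empty string as ·
	case ix.indkey[col-1] == 0: // expression column
		// The rank of col among the expression columns is its 1-based
		// index into indexprs (the array_position/array_positions trick).
		rank := 0
		for _, attnum := range ix.indkey[:col] {
			if attnum == 0 {
				rank++
			}
		}
		return ix.indexprs[rank-1]
	default: // assumed ELSE branch: plain column name
		return ix.attnames[ix.indkey[col-1]]
	}
}

func main() {
	ix := indexRow{
		indexdef: "<full index definition>", // placeholder, not real output
		indkey:   []int{1, 0, 0},
		indexprs: []string{"(json->>'bar'::STRING)", "(length(id))"},
		attnames: map[int]string{1: "id"},
	}
	for col := 1; col <= 4; col++ {
		fmt.Printf("%d  %q\n", col, indexdefColumn(ix, col))
	}
}
```

With this hypothetical index, columns 1 to 3 yield `id`, `(json->>'bar'::STRING)` and `(length(id))`, and column 4 yields the empty string. This matches the shape of the logic-test rows quoted earlier in the review.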

msirek

Nice addition. I just have a couple non-blocking comments.

Reviewed 5 of 5 files at r1, 2 of 2 files at r2, 23 of 23 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss, @Xiang-Gu, and @ZhouXing19)

pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

    JOIN %[1]s.pg_catalog.pg_class c_table ON c_table.relname = s.table_name
    JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND c_table.relnamespace = n.oid AND n.nspname = s.index_schema
    JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid AND i.indrelid = c_table.oid

It seems like there's an unnecessary join to pg_class here. It could be rewritten as:

FROM
    %[1]s.information_schema.statistics AS s
    JOIN %[1]s.pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
    JOIN %[1]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
    JOIN %[1]s.pg_catalog.pg_index i ON i.indexrelid = c.oid

With this rewrite and a test case I tried with 9 indexes, average runtime dropped from about 50 ms to 35 ms.

pkg/sql/delegate/show_table.go line 161 at r3 (raw file):

    JOIN %[4]s.pg_catalog.pg_class c_table ON c_table.relname = s.table_name
    JOIN %[4]s.pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND c_table.relnamespace = n.oid AND n.nspname = s.index_schema
    JOIN %[4]s.pg_catalog.pg_index i ON i.indexrelid = c.oid AND i.indrelid = c_table.oid

Same as previous comment. The extra join to pg_class could be removed.

rafiss

pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

Previously, msirek (Mark Sirek) wrote…

It seems like there's an unnecessary join to pg_class here.

those don't seem equivalent. it can lead to double-counting if there are two tables that have indexes with the same names. example:

root@localhost:26257/defaultdb> create table t (a int, b int, index idx (b));
CREATE TABLE

root@localhost:26257/defaultdb> create table t2 (a int, b int, index idx (a));
CREATE TABLE

root@localhost:26257/defaultdb> select relname, oid from pg_class where relname = 'idx';
  relname |    oid
----------+-------------
  idx     |  737994182
  idx     | 3428148196

root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
                             ->     JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
                             ->     JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             -> ORDER BY
                             ->     1, 2, 4;
  table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | t_pkey     | public       |     f      |            1 | rowid       | ASC       |    f    |    f     |    t
  t          | t_pkey     | public       |     f      |            3 | b           | N/A       |    t    |    f     |    t
  t          | t_pkey     | public       |     f      |            2 | a           | N/A       |    t    |    f     |    t

root@localhost:26257/defaultdb> drop table t2;
DROP TABLE

-- after dropping t2, there's no more double counting
root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
                             ->     JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid AND n.nspname = s.index_schema
                             ->     JOIN pg_catalog.pg_index i ON i.indexrelid = c.oid
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             -> ORDER BY
                             ->     1, 2, 4;
  table_name | index_name | index_schema | non_unique | seq_in_index | column_name | direction | storing | implicit | visible
-------------+------------+--------------+------------+--------------+-------------+-----------+---------+----------+----------
  t          | idx        | public       |     t      |            1 | b           | ASC       |    f    |    f     |    t
  t          | idx        | public       |     t      |            2 | rowid       | ASC       |    f    |    t     |    t
  t          | t_pkey     | public       |     f      |            1 | rowid       | ASC       |    f    |    f     |    t
  t          | t_pkey     | public       |     f      |            3 | b           | N/A       |    t    |    f     |    t
  t          | t_pkey     | public       |     f      |            2 | a           | N/A       |    t    |    f     |    t

i'm open to hearing other ways of avoiding a join though. i added it because i needed it to properly filter the join to pg_index (ON i.indexrelid = c.oid AND i.indrelid = c_table.oid).
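The double-counting can also be reproduced without a cluster. This minimal Go sketch models the catalog rows from the transcript as slices and counts how many rows each join shape produces for a single statistics row. The OIDs, type names and function names are hypothetical. Namespaces are omitted because everything in the example lives in `public`.

```go
package main

import "fmt"

// Hypothetical, simplified catalog rows (namespaces omitted).
type pgClass struct {
	oid     int
	relname string
}

type pgIndex struct {
	indexrelid int // pg_class oid of the index
	indrelid   int // pg_class oid of the table the index belongs to
}

type statRow struct {
	tableName string
	indexName string
}

// Two tables, t and t2, each with an index named idx.
var classes = []pgClass{
	{1, "t"}, {2, "t2"},
	{10, "idx"}, {11, "t_pkey"},
	{20, "idx"}, {21, "t2_pkey"},
}

var indexes = []pgIndex{
	{10, 1}, {11, 1}, // indexes of t
	{20, 2}, {21, 2}, // indexes of t2
}

// nameOnlyJoin counts rows from:
// JOIN pg_class c ON c.relname = s.index_name OR c.relname = s.table_name
// JOIN pg_index i ON i.indexrelid = c.oid
func nameOnlyJoin(s statRow) int {
	n := 0
	for _, c := range classes {
		if c.relname != s.indexName && c.relname != s.tableName {
			continue
		}
		for _, i := range indexes {
			if i.indexrelid == c.oid {
				n++
			}
		}
	}
	return n
}

// tableFilteredJoin counts rows from:
// JOIN pg_class c ON c.relname = s.index_name
// JOIN pg_class c_table ON c_table.relname = s.table_name
// JOIN pg_index i ON i.indexrelid = c.oid AND i.indrelid = c_table.oid
func tableFilteredJoin(s statRow) int {
	n := 0
	for _, c := range classes {
		if c.relname != s.indexName {
			continue
		}
		for _, ct := range classes {
			if ct.relname != s.tableName {
				continue
			}
			for _, i := range indexes {
				if i.indexrelid == c.oid && i.indrelid == ct.oid {
					n++
				}
			}
		}
	}
	return n
}

func main() {
	for _, s := range []statRow{{"t", "idx"}, {"t", "t_pkey"}} {
		fmt.Printf("%s.%s: name-only=%d, table-filtered=%d\n",
			s.tableName, s.indexName, nameOnlyJoin(s), tableFilteredJoin(s))
	}
}
```

In this model, the `OR c.relname = s.table_name` branch never contributes, because a table's own pg_class row is never an `indexrelid`. The duplicate for `t.idx` comes entirely from the two pg_class rows named `idx`. Only the `indrelid` filter through `c_table` tells those two rows apart.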

rafiss · 2023-01-20T06:22:28Z

I used "N/A" since that's what SHOW INDEXES ... uses for the direction when showing columns that are stored in the index. That comes directly from information_schema.statistics:

cockroach/pkg/sql/information_schema.go

Line 1234 in 0775fcc

indexDirectionNA, true, false); err != nil {

Neither SHOW INDEXES nor information_schema.statistics exists in PostgreSQL. (MySQL does have information_schema.statistics.) However, your comment reminded me that PostgreSQL does have:

postgres=# \dS+ i3
                       Index "public.i3"
 Column |  Type   | Key? | Definition | Storage | Stats target
--------+---------+------+------------+---------+--------------
 b      | integer | yes  | b          | plain   |
 expr   | integer | yes  | (a + c)    | plain   |

So now I think it would be nicer if we also use the term "definition" and just show the column name if there is no expression, like what \dS+ does.

knz · 2023-01-20T07:36:55Z

Indeed, my question about N/A was driven by my work on #88061, so that we get unsurprising output from \d.

rafiss · 2023-01-20T14:40:38Z

tftr!

bors r=knz

craig · 2023-01-20T15:37:13Z

Build succeeded:

Bazel Essential CI (Cockroach)

lopezator · 2023-01-20T16:56:59Z

Awesome!!! Thank you for dealing with this @rafiss :)

msirek

pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

those don't seem equivalent. it can lead to double-counting if there are two tables that have indexes with the same names.

i'm open to hearing other ways of avoiding a join though. i added it because i needed it to properly filter the join to pg_index (ON i.indexrelid = c.oid AND i.indrelid = c_table.oid).

I see. Maybe an EXISTS clause could be used to prevent duplicates:

SELECT
    table_name,
    index_name,
    index_schema,
    non_unique::BOOL,
    seq_in_index,
    column_name,
    direction,
    storing::BOOL,
    implicit::BOOL,
    is_visible::BOOL AS visible
FROM
    information_schema.statistics AS s
    JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema
WHERE
    table_catalog='defaultdb'
    AND table_schema='public'
    AND table_name='t'
    AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname = s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid)
ORDER BY
    1, 2, 4;

rafiss

pkg/sql/delegate/show_database_indexes.go line 59 at r3 (raw file):

Previously, msirek (Mark Sirek) wrote…

I see. Maybe an EXISTS clause could be used to prevent duplicates.

ah, that works for the old query, but for the new one, we need to select from pg_index, so pg_index can't only be in an inner subquery

root@localhost:26257/defaultdb> SELECT
                             -> table_name,
                             -> index_name,
                             -> index_schema,
                             -> non_unique::BOOL,
                             -> seq_in_index,
                             -> column_name,
                             -> direction,
                             -> storing::BOOL,
                             -> implicit::BOOL,
                             -> is_visible::BOOL AS visible,
                             -> CASE
                             ->   WHEN i.indkey[seq_in_index-1] = 0 THEN (indexprs::STRING[])[array_position(array_positions(i.indkey, 0), seq_in_index)]
                             ->   ELSE column_name
                             -> END AS definition
                             -> FROM
                             ->     information_schema.statistics AS s
                             ->     JOIN pg_catalog.pg_namespace n ON n.nspname = s.index_schema
                             -> WHERE
                             ->     table_catalog='defaultdb'
                             ->     AND table_schema='public'
                             ->     AND table_name='t'
                             ->     AND EXISTS (SELECT 1 FROM pg_catalog.pg_class c, pg_catalog.pg_index i WHERE (c.relname = s.index_name OR c.relname =
                             -> s.table_name) AND c.relnamespace = n.oid AND i.indexrelid = c.oid)
                             -> ORDER BY
                             ->     1, 2, 4;
ERROR: no data source matches prefix: i in this context
SQLSTATE: 42P01

rafiss requested a review from a team January 18, 2023 02:30

rafiss force-pushed the fix-pg_get_indexdef branch 2 times, most recently from 9536a74 to 3c1a57a January 18, 2023 15:55

rafiss force-pushed the fix-pg_get_indexdef branch 2 times, most recently from 605b573 to b9ff5a6 January 19, 2023 18:44

rafiss requested a review from a team as a code owner January 19, 2023 18:44

rafiss force-pushed the fix-pg_get_indexdef branch 2 times, most recently from 8607dc8 to 2b7f9d6 January 19, 2023 20:03

rafiss requested a review from a team as a code owner January 19, 2023 20:03

rafiss requested review from msirek, Xiang-Gu and ZhouXing19 January 19, 2023 20:03

knz approved these changes Jan 20, 2023

msirek reviewed Jan 20, 2023

rafiss added 2 commits January 20, 2023 01:05

builtins: implement pg_get_indexdef as UDF

22b8d19

Release note: None

rafiss commented Jan 20, 2023

rafiss force-pushed the fix-pg_get_indexdef branch from 2b7f9d6 to 85dfba8 January 20, 2023 06:29

sql: add index expression to SHOW INDEXES

b80c3d7

Release note (sql change): SHOW INDEXES will now show the expression used to define an index, if one was used.

rafiss force-pushed the fix-pg_get_indexdef branch from 85dfba8 to b80c3d7 January 20, 2023 14:40

craig bot merged commit b861696 into cockroachdb:master Jan 20, 2023

rafiss deleted the fix-pg_get_indexdef branch January 20, 2023 15:45

rafiss mentioned this pull request Jan 20, 2023

release-22.2: builtins: make pg_get_indexdef handle expression indexes #95584

Merged

rafiss mentioned this pull request Jan 20, 2023

release-22.1: builtins: make pg_get_indexdef handle expression indexes #95585

Merged

msirek reviewed Jan 20, 2023

rafiss commented Jan 20, 2023

cockroach-teamcity mentioned this pull request Jan 21, 2023

PR #95413 - sql: add index expression to SHOW INDEXES cockroachdb/docs#16062

Open
