Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add the ability to access a column from the results of an SRF #24832

Merged
merged 1 commit into from
Apr 24, 2018

Conversation

BramGruneir
Copy link
Member

@BramGruneir BramGruneir commented Apr 16, 2018

sql: add the ability to access a column from the results of an SRF

To be compatible with postgres, specifically to be able to handle #16971
correctly. There is a need to be able to access the results of a Set Returning
Function (SRF) using a . accessor.

This commit does 3 things.
Firstly, it adds the appropriate grammer.

Secondly, it rewrites the rewriteSRF function to handle named column accessors
correctly.

Before this change, the only way to decompose an SRF into columns was to set it
it as the data source in the FROM clause, like so:

SELECT * FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+---+
| x | n |
+---+---+
| c | 1 |
| b | 2 |
| a | 3 |
+---+---+

SELECT x FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+
| x |
+---+
| c |
| b |
| a |
+---+

SELECT n FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
+---+

With this commit, the following are now valid syntax:

SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).*;
+---+---+
| x | n |
+---+---+
| c | 1 |
| b | 2 |
| a | 3 |
+---+---+

SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).x;
+---+
| x |
+---+
| c |
| b |
| a |
+---+

SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).n;
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
+---+

Thirdly, this changes the behaviour of the resulting columns from SRFs. If
there is more than one named element in the resulting tuple, the column for the
tuple will now be named for all elements:

SELECT information_schema._pg_expandarray(ARRAY[3, 2, 1])

Will produce a tuple column like named
("information_schema._pg_expandarray".x, "information_schema._pg_expandarray".n).

The bahaviour does deviate from postgres, which simply names the column after
the name of the function, in this case _pg_expandarray.

Here's another example of this difference:

SELECT 'a', pg_get_keywords(), 'c' LIMIT 1;

In Postgres:
 ?column? |   pg_get_keywords    | ?column?
----------+----------------------+----------
 a        | (abort,U,unreserved) | c

In Cockroach:
+-----+--------------------------------+-----+
| 'a' |     (pg_get_keywords.word,     | 'c' |
|     |    pg_get_keywords.catcode,    |     |
|     |    pg_get_keywords.catdesc)    |     |
+-----+--------------------------------+-----+
| a   | ('abort','U','unreserved')     | c   |
+-----+--------------------------------+-----+

Release note (sql change): Set Returning Functions (SRF) can now be accessed
using (SRF).x where x is the name of a column returned form the SRF or a
*. For example,
SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).x and
SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).* are both
now valid syntax.
Also, the naming of the resulting columns from SRFs has been updated to provide
more information about the resulting tuple.

Co-authored-by: Raphael 'kena' Poss [email protected]

@BramGruneir BramGruneir requested review from a team April 16, 2018 17:50
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@BramGruneir BramGruneir added the do-not-merge bors won't merge a PR with this label. label Apr 16, 2018
@BramGruneir
Copy link
Member Author

@knz Here is my WIP I was talking about. Take a look at the final test case the new logic tests. The one that is commented out. This is what I still need to address.

@knz
Copy link
Contributor

knz commented Apr 17, 2018

This is good work and a clear step forward.

Also this PR has helped us (at least me) understand/uncover the additional semantics described in #24866, which suggests there is more work needed before fully solving #16971. I recommend not going there for the time being, and focusing on one step at a time (this one).


Reviewed 10 of 10 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved.


pkg/sql/render.go, line 554 at r1 (raw file):

func (v *srfExtractionVisitor) VisitPost(expr tree.Expr) tree.Expr {
	switch t := expr.(type) {
	case *tree.ColumnAccessExpr:

I fear that this logic is incomplete. I have experimented and found an extension of this code that works better. Let's chat about it.


pkg/sql/logictest/testdata/logic_test/generators, line 492 at r1 (raw file):

# This test case should work, but right now it doesn't.
# query I colnames
# SELECT (i.keys).n FROM (SELECT information_schema._pg_expandarray(ARRAY[3,2,1]) AS keys) AS i;

This test is exercising a totally unrelated feature, unfortunately. Please add a TODO and refer to this: #24866


Comments from Reviewable

@knz knz force-pushed the rewriteSRF branch 3 times, most recently from b33daac to 2412bcb Compare April 17, 2018 17:55
@knz
Copy link
Contributor

knz commented Apr 17, 2018

Reviewed 6 of 7 files at r2, 2 of 2 files at r3.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


pkg/sql/logictest/testdata/logic_test/generators, line 109 at r3 (raw file):

generator  ·  ·

query error unimplemented: cannot use multiple set-generating functions in a single SELECT clause

maybe add a TODO here with a reference to #20511.


pkg/sql/logictest/testdata/logic_test/generators, line 235 at r3 (raw file):

SELECT 'a', pg_get_keywords(), 'c' LIMIT 1
----
'a'  (pg_get_keywords.word, pg_get_keywords.catcode, pg_get_keywords.catdesc)              'c'

we may or may not want to keep the change in the column label. I haven't yet figured out how to restore the previous behavior.


Comments from Reviewable

@BramGruneir
Copy link
Member Author

pkg/sql/logictest/testdata/logic_test/generators, line 235 at r3 (raw file):

Previously, knz (kena) wrote…

we may or may not want to keep the change in the column label. I haven't yet figured out how to restore the previous behavior.

I like this new behaviour.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Apr 17, 2018

The commit message (and the release note) need to be extended to showcase some examples.


Review status: all files reviewed at latest revision, 4 unresolved discussions.


Comments from Reviewable

@BramGruneir
Copy link
Member Author

Ok, this is ready for a proper review. There one leftover failing logic test that I'm looking into.

@jordanlewis you've touched this code in the past, can you take a quick look? I think it would be a good idea to get another set of eyes on it.

@rytaft there is a sql language change here, so I wanted someone on the optimizer team to take note.

@BramGruneir BramGruneir removed the do-not-merge bors won't merge a PR with this label. label Apr 18, 2018
@BramGruneir BramGruneir changed the title [wip] sql: add the ability to access a column from the results of an SRF sql: add the ability to access a column from the results of an SRF Apr 18, 2018
@BramGruneir
Copy link
Member Author

I also updated the PR message with the updated commit message.

@BramGruneir BramGruneir added this to the 2.1 milestone Apr 18, 2018
@BramGruneir
Copy link
Member Author

Found and fixed that failing logic test. This PR is ready for a final review.


Review status: 6 of 12 files reviewed at latest revision, 4 unresolved discussions.


pkg/sql/logictest/testdata/logic_test/generators, line 492 at r1 (raw file):

Previously, knz (kena) wrote…

This test is exercising a totally unrelated feature, unfortunately. Please add a TODO and refer to this: #24866

Done.


pkg/sql/logictest/testdata/logic_test/generators, line 109 at r3 (raw file):

Previously, knz (kena) wrote…

maybe add a TODO here with a reference to #20511.

Done.


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Apr 18, 2018

@rytaft there is a sql language change here, so I wanted someone on the optimizer team to take note.

👍 thanks! I don't feel like I have enough background on this to give you a good quality review, but I've made a note of this on our spreadsheet of TODOs for the optimizer.


Review status: 6 of 12 files reviewed at latest revision, 4 unresolved discussions.


Comments from Reviewable

@BramGruneir
Copy link
Member Author

@jordanlewis Soft ping to please take a look when you have a few mins.

@jordanlewis
Copy link
Member

:lgtm: this is awesome! happy to see this go in


Reviewed 2 of 10 files at r1, 4 of 7 files at r2, 4 of 5 files at r4, 2 of 2 files at r5.
Review status: all files reviewed at latest revision, 4 unresolved discussions.


pkg/sql/sem/tree/expr.go, line 1446 at r5 (raw file):

}

// ColumnAccessExpr *** more here

^^^ needs attention


Comments from Reviewable

To be compatible with postgres, specifically to be able to handle cockroachdb#16971
correctly. There is a need to be able to access the results of a Set Returning
Function (SRF) using a `.` accessor.

This commit does 3 things.
Firstly, it adds the appropriate grammer.

Secondly, it rewrites the rewriteSRF function to handle named column accessors
correctly.

Before this change, the only way to decompose an SRF into columns was to set it
it as the data source in the FROM clause, like so:
```sql
SELECT * FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+---+
| x | n |
+---+---+
| c | 1 |
| b | 2 |
| a | 3 |
+---+---+

SELECT x FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+
| x |
+---+
| c |
| b |
| a |
+---+

SELECT n FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
+---+
```

With this commit, the following are now valid syntax:

```sql
SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).*;
+---+---+
| x | n |
+---+---+
| c | 1 |
| b | 2 |
| a | 3 |
+---+---+

SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).x;
+---+
| x |
+---+
| c |
| b |
| a |
+---+

SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).n;
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
+---+
```

Thirdly, this changes the behaviour of the resulting columns from SRFs. If
there is more than one named element in the resulting tuple, the column for the
tuple will now be named for all elements:
```sql
SELECT information_schema._pg_expandarray(ARRAY[3, 2, 1])
```

Will produce a tuple column like named
`("information_schema._pg_expandarray".x, "information_schema._pg_expandarray".n)`.

The bahaviour does deviate from postgres, which simply names the column after
the name of the function, in this case `_pg_expandarray`.

Here's another example of this difference:

```sql
SELECT 'a', pg_get_keywords(), 'c' LIMIT 1;

In Postgres:
 ?column? |   pg_get_keywords    | ?column?
----------+----------------------+----------
 a        | (abort,U,unreserved) | c

In Cockroach:
+-----+--------------------------------+-----+
| 'a' |     (pg_get_keywords.word,     | 'c' |
|     |    pg_get_keywords.catcode,    |     |
|     |    pg_get_keywords.catdesc)    |     |
+-----+--------------------------------+-----+
| a   | ('abort','U','unreserved')     | c   |
+-----+--------------------------------+-----+
```

Release note (sql change): Set Returning Functions (SRF) can now be accessed
using `(SRF).x` where `x` is the name of a column returned form the SRF or a
 `*`. For example,
`SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).x` and
`SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).*` are both
now valid syntax.
Also, the naming of the resulting columns from SRFs has been updated to provide
more information about the resulting tuple.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@BramGruneir
Copy link
Member Author

Review status: 4 of 12 files reviewed at latest revision, 5 unresolved discussions.


pkg/sql/sem/tree/expr.go, line 1446 at r5 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

^^^ needs attention

Done.


Comments from Reviewable

@BramGruneir
Copy link
Member Author

bors +r

@craig
Copy link
Contributor

craig bot commented Apr 24, 2018

Did you mean "r+"?

@BramGruneir
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Apr 24, 2018
24832: sql: add the ability to access a column from the results of an SRF r=BramGruneir a=BramGruneir

 sql: add the ability to access a column from the results of an SRF

To be compatible with postgres, specifically to be able to handle #16971
correctly. There is a need to be able to access the results of a Set Returning
Function (SRF) using a `.` accessor.

This commit does 3 things.
Firstly, it adds the appropriate grammer.

Secondly, it rewrites the rewriteSRF function to handle named column accessors
correctly.

Before this change, the only way to decompose an SRF into columns was to set it
it as the data source in the FROM clause, like so:
```sql
SELECT * FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+---+
| x | n |
+---+---+
| c | 1 |
| b | 2 |
| a | 3 |
+---+---+

SELECT x FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+
| x |
+---+
| c |
| b |
| a |
+---+

SELECT n FROM information_schema._pg_expandarray(ARRAY['c', 'b', 'a']);
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
+---+
```

With this commit, the following are now valid syntax:

```sql
SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).*;
+---+---+
| x | n |
+---+---+
| c | 1 |
| b | 2 |
| a | 3 |
+---+---+

SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).x;
+---+
| x |
+---+
| c |
| b |
| a |
+---+

SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).n;
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
+---+
```

Thirdly, this changes the behaviour of the resulting columns from SRFs. If
there is more than one named element in the resulting tuple, the column for the
tuple will now be named for all elements:
```sql
SELECT information_schema._pg_expandarray(ARRAY[3, 2, 1])
```

Will produce a tuple column like named
`("information_schema._pg_expandarray".x, "information_schema._pg_expandarray".n)`.

The bahaviour does deviate from postgres, which simply names the column after
the name of the function, in this case `_pg_expandarray`.

Here's another example of this difference:

```sql
SELECT 'a', pg_get_keywords(), 'c' LIMIT 1;

In Postgres:
 ?column? |   pg_get_keywords    | ?column?
----------+----------------------+----------
 a        | (abort,U,unreserved) | c

In Cockroach:
+-----+--------------------------------+-----+
| 'a' |     (pg_get_keywords.word,     | 'c' |
|     |    pg_get_keywords.catcode,    |     |
|     |    pg_get_keywords.catdesc)    |     |
+-----+--------------------------------+-----+
| a   | ('abort','U','unreserved')     | c   |
+-----+--------------------------------+-----+
```

Release note (sql change): Set Returning Functions (SRF) can now be accessed
using `(SRF).x` where `x` is the name of a column returned form the SRF or a
 `*`. For example,
`SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).x` and
`SELECT (information_schema._pg_expandarray(ARRAY['c', 'b', 'a'])).*` are both
now valid syntax.
Also, the naming of the resulting columns from SRFs has been updated to provide
more information about the resulting tuple.

Co-authored-by: Raphael 'kena' Poss <[email protected]>

Co-authored-by: Bram Gruneir <[email protected]>
@craig
Copy link
Contributor

craig bot commented Apr 24, 2018

Build succeeded

@craig craig bot merged commit fef4d74 into cockroachdb:master Apr 24, 2018
@BramGruneir BramGruneir deleted the rewriteSRF branch April 25, 2018 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants