-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support PostgreSQL array subquery constructor #566
Conversation
c6edf5f
to
7ed261e
Compare
Pull Request Test Coverage Report for Build 2848516561
💛 - Coveralls |
@@ -2518,7 +2518,7 @@ fn parse_bad_constraint() { | |||
|
|||
#[test] | |||
fn parse_scalar_function_in_projection() { | |||
let names = vec!["sqrt", "array", "foo"]; | |||
let names = vec!["sqrt", "foo"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a controversial change for this test. PostgreSQL doesn't have a function-based array constructor (array(1, 2, 3)
), nor can I find information on other databases using such a syntax. Some, like BigQuery, use the ARRAY(subquery)
syntax implemented there. It'd be great to hear some thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was added in #432
Can you review the description on that PR and let me know what you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, postgres doesn't allow the syntax used in the DataFusion test (and referred to in #432)
alamb=# SELECT array(c1, cast(c2 as varchar)) FROM test;
ERROR: syntax error at or near "c1"
LINE 1: SELECT array(c1, cast(c2 as varchar)) FROM test;
^
https://github.com/apache/arrow-datafusion/blame/834924f274d9e04444030b8eb3e07e1035fcd3cf/datafusion/core/tests/sql/functions.rs#L134 doesn't really help me figure out from who the test originated from as it was moved (maybe some more poking around could find the original author)
@ovr do you remember any of the previous history here?
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I poked around and found this commit: apache/datafusion@98b710a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I poked around and found this commit: apache/datafusion@98b710a
Nice -- that came in from apache/arrow#8102 / https://issues.apache.org/jira/browse/ARROW-9902 by @jorgecarleitao and approved by @andygrove
Can either if you remember if/how using array
as a function (rather than the more specialized SQL support offered by postgres) was important? We are triyng to sort out how important backwards compatibility is in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support this as long as DataFusion team's okay with that. I haven't been able to find a dialect with built-in array
function, but at least two of them with built-in array subquery constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb do you feel it's a good idea to remove the GenericDialiect | PostgreSqlDialect
check now so later someone can make negated checks for dialects that shouldn't go the subquery route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been able to find a dialect with built-in array function, but at least two of them with built-in array subquery constructor.
Here is the spark reference docs: https://spark.apache.org/docs/2.4.1/api/sql/index.html#array (though perhaps you mean there is no SparkDialect
in this crate)?
do you feel it's a good idea to remove the GenericDialiect | PostgreSqlDialect check now so later someone can make negated checks for dialects that shouldn't go the subquery route?
Sounds reasonable to me -- can you also possibly link one or two examples from other dialects that support the array subquery constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PostgreSQL array subquery constructor (in fact, all of array constructors): https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
I haven't been able to find info on other dialects present in the crate to support array subquery, but at least CockroachDB and BigQuery support array subquery.
However while searching around I found that ClickHouse (which we do have a dialect for) actually has array function constructor built in: https://clickhouse.com/docs/en/sql-reference/functions/array-functions/#arrayx1--operator-x1-
So we can either make a decision for leaving this on for specific dialects (GenericDialect | PostgreSqlDialect
for instance) or negate this to support ARRAY(x1, x2, …)
for dialects known to support array function constructor (currently ClickHouseDialect
only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's negate it for ClickHouseDialect
and call it a day
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MazterQyou -- the code looks great but as you have identified I am not sure about the array
test change. I left some comments
@@ -2518,7 +2518,7 @@ fn parse_bad_constraint() { | |||
|
|||
#[test] | |||
fn parse_scalar_function_in_projection() { | |||
let names = vec!["sqrt", "array", "foo"]; | |||
let names = vec!["sqrt", "foo"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was added in #432
Can you review the description on that PR and let me know what you think?
@@ -2518,7 +2518,7 @@ fn parse_bad_constraint() { | |||
|
|||
#[test] | |||
fn parse_scalar_function_in_projection() { | |||
let names = vec!["sqrt", "array", "foo"]; | |||
let names = vec!["sqrt", "foo"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, postgres doesn't allow the syntax used in the DataFusion test (and referred to in #432)
alamb=# SELECT array(c1, cast(c2 as varchar)) FROM test;
ERROR: syntax error at or near "c1"
LINE 1: SELECT array(c1, cast(c2 as varchar)) FROM test;
^
https://github.com/apache/arrow-datafusion/blame/834924f274d9e04444030b8eb3e07e1035fcd3cf/datafusion/core/tests/sql/functions.rs#L134 doesn't really help me figure out from who the test originated from as it was moved (maybe some more poking around could find the original author)
@ovr do you remember any of the previous history here?
🤔
13ce6f5
to
01a69b2
Compare
01a69b2
to
76e5030
Compare
@alamb updated the if clause with dialects and added a test for ClickHouse's array function (renaming the file in the process due to a typo 😶). Please re-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great -- thanks for sticking with it @MazterQyou
@@ -121,6 +121,25 @@ fn parse_array_expr() { | |||
) | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌 Very nice @MazterQyou
Can drop this after rebase on commit 31ba001 "Support PostgreSQL array subquery constructor (apache#566)", first released in 0.21.0
Can drop this after rebase on commit 31ba001 "Support PostgreSQL array subquery constructor (apache#566)", first released in 0.21.0
This PR adds support for PostgreSQL array subquery constructor (e.g.
SELECT ARRAY(SELECT 1 UNION SELECT 2)
), as well as a related test.