[SPARK-31563][SQL] Fix failure of InSet.sql for collections of Catalyst's internal types #28343
Looks plausible. This needs to go to 2.4 too right?
Test build #121802 has finished for PR 28343 at commit
…st's internal types

### What changes were proposed in this pull request?
In the PR, I propose to fix the `InSet.sql` method for the cases when input collection contains values of internal Catalyst's types, for instance `UTF8String`. Elements of the input set `hset` are converted to Scala types, and wrapped by `Literal` to properly form SQL view of the input collection.

### Why are the changes needed?
The changes fixed the bug in `InSet.sql` that makes wrong assumption about types of collection elements. See more details in SPARK-31563.

### Does this PR introduce any user-facing change?
Highly likely, not.

### How was this patch tested?
Added a test to `ColumnExpressionSuite`

Closes #28343 from MaxGekk/fix-InSet-sql.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 7d8216a)
Signed-off-by: Dongjoon Hyun <[email protected]>
cc @holdenk since she is the release manager for 2.4.6.
Thanks everyone :)
@@ -519,7 +520,9 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with

   override def sql: String = {
     val valueSQL = child.sql
-    val listSQL = hset.toSeq.map(Literal(_).sql).mkString(", ")
+    val listSQL = hset.toSeq
+      .map(elem => Literal(convertToScala(elem, child.dataType)).sql)
This converts the internal value to an external value, and then `Literal.apply` converts the external value back to an internal value. Can we just do `Literal(elem, child.dataType).sql`?
@cloud-fan No, Literal fails on UTF8String value, see #28328 (comment)
Another problem is #28328 (comment)
So the problem is that `Literal` requires external types, but `elem` has an internal Catalyst type.
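To make the point above concrete, here is a minimal sketch of the single-element case. It assumes Spark's catalyst module is on the classpath; the object name `LiteralConversionSketch` and the sample value are made up for illustration. The direct call is wrapped in `Try` because, per this discussion, `Literal` was reported to fail on a `UTF8String` at the time of the PR; the exact behaviour may differ between Spark versions.

```scala
import scala.util.Try

import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical object name, used only for this illustration.
object LiteralConversionSketch {
  def main(args: Array[String]): Unit = {
    // An element as InSet would hold it: Catalyst's internal string type.
    val elem: Any = UTF8String.fromString("spark")

    // Passing the internal value straight to Literal.apply(v: Any).
    // Wrapped in Try because the thread reports that this fails.
    val direct = Try(Literal(elem).sql)
    println(s"direct: $direct")

    // The approach taken in the patch: convert to the external Scala type
    // first (a java.lang.String for StringType), then build the Literal.
    val converted = Literal(convertToScala(elem, StringType)).sql
    println(s"converted: $converted")
  }
}
```

In the patch itself the data type comes from `child.dataType` rather than a hard-coded `StringType`.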
### What changes were proposed in this pull request?
In the PR, I propose to fix the `InSet.sql` method for the cases when the input collection contains values of internal Catalyst types, for instance `UTF8String`. Elements of the input set `hset` are converted to Scala types and wrapped in `Literal` to properly form the SQL view of the input collection.

### Why are the changes needed?
The changes fix the bug in `InSet.sql`, which makes a wrong assumption about the types of collection elements. See more details in SPARK-31563.

### Does this PR introduce any user-facing change?
Most likely not.

### How was this patch tested?
Added a test to `ColumnExpressionSuite`.
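The actual test added to `ColumnExpressionSuite` is not shown in this conversation. As a rough sketch of the kind of check it could perform, the snippet below builds an `InSet` over internal `UTF8String` values and asks for its SQL form. The column name `name`, the sample values, and the object name `InSetSqlCheck` are assumptions made for illustration, and it presumes a Spark build that includes this fix.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, InSet}
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical object name, used only for this illustration.
object InSetSqlCheck {
  def main(args: Array[String]): Unit = {
    // A string column to act as the child expression (name is an assumption).
    val column = AttributeReference("name", StringType)()

    // The set holds Catalyst's internal representation of strings,
    // which is the case the PR description says used to break InSet.sql.
    val values = Set[Any](
      UTF8String.fromString("a"),
      UTF8String.fromString("b"))

    // With the fix, each element is converted to its Scala type and
    // wrapped in Literal before being rendered as SQL.
    val sql = InSet(column, values).sql
    println(sql)
  }
}
```

A real test would compare the returned string against the expected `IN (...)` clause instead of printing it.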