Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-31563][SQL] Fix failure of InSet.sql for collections of Catalyst's internal types #28343

Closed
wants to merge 2 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Apr 25, 2020

What changes were proposed in this pull request?

In the PR, I propose to fix the InSet.sql method for the cases when input collection contains values of internal Catalyst's types, for instance UTF8String. Elements of the input set hset are converted to Scala types, and wrapped by Literal to properly form SQL view of the input collection.

Why are the changes needed?

The changes fixed the bug in InSet.sql that makes wrong assumption about types of collection elements. See more details in SPARK-31563.

Does this PR introduce any user-facing change?

Highly likely, not.

How was this patch tested?

Added a test to ColumnExpressionSuite

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks plausible. This needs to go to 2.4 too right?

@SparkQA
Copy link

SparkQA commented Apr 25, 2020

Test build #121802 has finished for PR 28343 at commit d46c177.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you for splitting your existing PR, @MaxGekk .
Yes, @srowen . This is a fix for all 2.x. So, I asked to split the original PR.
Merged to master/3.0/2.4.

dongjoon-hyun pushed a commit that referenced this pull request Apr 25, 2020
…st's internal types

### What changes were proposed in this pull request?
In the PR, I propose to fix the `InSet.sql` method for the cases when input collection contains values of internal Catalyst's types, for instance `UTF8String`. Elements of the input set `hset` are converted to Scala types, and wrapped by `Literal` to properly form SQL view of the input collection.

### Why are the changes needed?
The changes fixed the bug in `InSet.sql` that makes wrong assumption about types of collection elements. See more details in SPARK-31563.

### Does this PR introduce any user-facing change?
Highly likely, not.

### How was this patch tested?
Added a test to `ColumnExpressionSuite`

Closes #28343 from MaxGekk/fix-InSet-sql.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 7d8216a)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Apr 25, 2020
…st's internal types

In the PR, I propose to fix the `InSet.sql` method for the cases when input collection contains values of internal Catalyst's types, for instance `UTF8String`. Elements of the input set `hset` are converted to Scala types, and wrapped by `Literal` to properly form SQL view of the input collection.

The changes fixed the bug in `InSet.sql` that makes wrong assumption about types of collection elements. See more details in SPARK-31563.

Highly likely, not.

Added a test to `ColumnExpressionSuite`

Closes #28343 from MaxGekk/fix-InSet-sql.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 7d8216a)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

cc @holdenk since she is the release manager for 2.4.6.

@holdenk
Copy link
Contributor

holdenk commented Apr 27, 2020

Thanks everyone :)

@@ -519,7 +520,9 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with

override def sql: String = {
val valueSQL = child.sql
val listSQL = hset.toSeq.map(Literal(_).sql).mkString(", ")
val listSQL = hset.toSeq
.map(elem => Literal(convertToScala(elem, child.dataType)).sql)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this converts the internal value to external value, and then Literal.apply converts external value to internal value.

Can we just do Literal(elem, child.dataType).sql?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan No, Literal fails on UTF8String value, see #28328 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another problem is #28328 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the problem is Literal requires external types but elem has internal Catalyst's type

@MaxGekk MaxGekk deleted the fix-InSet-sql branch June 5, 2020 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants