[SPARK-6201] [SQL] promote string and do widen types for IN #4945
case i @ In(a, b) if b.exists(_.dataType == StringType)
    && a.dataType.isInstanceOf[NumericType] =>
  i.makeCopy(Array(a, b.map(_.dataType match {
    case StringType => Cast(a, DoubleType)
Causes unmatched exception?
case StringType => Cast(a, DoubleType)
case x => x
Same as above.
@liancheng That's reasonable, not every string could be converted into numeric types.
What is the status here? I haven't looked closely, but this seems reasonable to me. However, I'd like to see more Hive comparison tests that specifically cover edge cases (differently sized numbers, strings that can't be converted to numbers, etc.) to make sure we are compatible with what Hive does.
The thing that makes me hesitant here is whether we should stick to Hive, because Hive's behavior is actually error prone and unintuitive. In Hive, Take Personally I think maybe we should just throw an exception if the left side of |
Any comments on @liancheng's suggestion?
If we don't like what Hive does here, we can do it the MySQL way: convert all expressions in the list to the type of the left-hand side of IN. In fact, the main difference here is that Spark SQL promotes strings to numeric types when doing type coercion, while Hive seems to do the contrary.
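The "MySQL way" described above can be sketched on a toy expression model. This is only an illustration, not the code from this PR: the `DataType`, `Expr`, `Literal`, `Column`, `Cast` and `In` definitions below are hypothetical stand-ins that merely borrow Catalyst's names, and `coerceIn` is an invented helper. The catch-all `case e` branch is what avoids the `MatchError` raised in the review comments.

```scala
// Toy stand-ins for Catalyst types; hypothetical, not Spark's real classes.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case object DoubleType extends DataType

sealed trait Expr { def dataType: DataType }
case class Literal(value: Any, dataType: DataType) extends Expr
case class Column(name: String, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr

// IN predicate: left-hand side plus the list of candidate values.
case class In(value: Expr, list: Seq[Expr])

object InCoercion {
  // Cast every list element to the type of the left-hand side of IN.
  // Elements that already have that type are left untouched.
  def coerceIn(in: In): In = {
    val target = in.value.dataType
    In(in.value, in.list.map {
      case e if e.dataType == target => e
      case e => Cast(e, target) // fallback branch: no input goes unmatched
    })
  }
}
```

For example, with an integer column on the left, `InCoercion.coerceIn(In(Column("a", IntegerType), Seq(Literal("1", StringType), Literal(2, IntegerType))))` wraps only the string literal in a `Cast` to `IntegerType`.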
The MySQL way seems reasonable to me. @liancheng?
@adrian-wang @marmbrus Sorry for the late reply. Yeah, the MySQL way also seems reasonable to me. In both Spark SQL and MySQL, |
Thanks. I'm going to merge this.
huangjs Actually, Spark SQL first goes through the analysis phase, in which we widen types and promote strings, and then through optimization, where a constant IN is converted into INSET. So it turns out that we only need to fix this for IN.
Author: Daoyuan Wang <[email protected]>
Closes #4945 from adrian-wang/inset and squashes the following commits:
71e05cc [Daoyuan Wang] minor fix
581fa1c [Daoyuan Wang] mysql way
f3f7baf [Daoyuan Wang] address comments
5eed4bc [Daoyuan Wang] promote string and do widen types for IN
(cherry picked from commit c3eb441)
Signed-off-by: Yin Huai <[email protected]>
Merged into master and branch 1.4. Thanks!
@huangjs
Actually, Spark SQL first goes through the analysis phase, in which we widen types and promote strings, and then through optimization, where a constant IN is converted into INSET.
So it turns out that we only need to fix this for IN.
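The ordering argument above (coercion during analysis, the IN to INSET rewrite during optimization) can be illustrated with a minimal sketch. Everything here is assumed for illustration only: the `Expr` hierarchy, `CastToInt`, and the `Pipeline.analyze` and `Pipeline.optimize` functions are hypothetical and much simpler than Spark's analyzer and optimizer. In particular, this sketch simply skips the rewrite when a string cannot be parsed, which is not a statement about Spark's actual behavior.

```scala
import scala.util.Try

// Toy expression model; hypothetical names, not Spark's Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr        // assumed to be an integer column
case class IntLit(value: Int) extends Expr
case class StrLit(value: String) extends Expr
case class CastToInt(child: Expr) extends Expr
case class In(value: Expr, list: Seq[Expr]) extends Expr
case class InSet(value: Expr, set: Set[Int]) extends Expr

object Pipeline {
  // Analysis step: wrap string literals in a cast so that every list
  // element has the (integer) type of the left-hand side.
  def analyze(e: Expr): Expr = e match {
    case In(v, list) =>
      In(v, list.map {
        case s: StrLit => CastToInt(s)
        case other => other
      })
    case other => other
  }

  // Evaluate an element if it is a constant the sketch knows how to fold.
  private def constant(e: Expr): Option[Int] = e match {
    case IntLit(i) => Some(i)
    case CastToInt(StrLit(s)) => Try(s.trim.toInt).toOption
    case _ => None
  }

  // Optimization step: an IN whose list is fully constant becomes a set lookup.
  def optimize(e: Expr): Expr = e match {
    case In(v, list) if list.forall(constant(_).isDefined) =>
      InSet(v, list.flatMap(constant(_)).toSet)
    case other => other
  }
}
```

Because `analyze` always runs before `optimize`, the set built for `InSet` already contains coerced values, so in this sketch only the `In` case needs a coercion rule.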