-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPU fallback for Map scalars with key vectors #6288
CPU fallback for Map scalars with key vectors #6288
Conversation
This change fixes the failure of queries where a scalar map is looked up via a vector of keys. Rather than fall back to CPU, such queries used to evade detection during planning, and failed at runtime, as described in NVIDIA#6282: ``` java.lang.IllegalStateException: Map lookup keys must be scalar values at org.apache.spark.sql.rapids.GpuGetMapValue.doColumnar(complexTypeExtractors.scala:231) at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$3(GpuExpressions.scala:258) at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73) at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71) at org.apache.spark.sql.rapids.GpuGetMapValue.withResourceIfAllowed(complexTypeExtractors.scala:190) at com.nvidia.spark.rapids.GpuBinaryExpression.$anonfun$columnarEval$2(GpuExpressions.scala:253) at com.nvidia.spark.rapids.Arm.withResourceIfAllowed(Arm.scala:73) at com.nvidia.spark.rapids.Arm.withResourceIfAllowed$(Arm.scala:71) ... ``` This change allows the query plan to correctly fall back to CPU mode. (This change also includes fixes for incorrect whitespaces and indentation errors in `map_test.py`.) Signed-off-by: MithunR <[email protected]>
Build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Very minor nit: Because there are two unrelated changes (whitespace fixes + bug fix), splitting these into separate commits would have made the review slightly easier.
Point taken. I was conflicted about combining them in the PR. Putting the changes in separate commits would've been a good compromise. I'll bear this in mind next time. Thanks for the review and guidance, @andygrove. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the formatting generally should be in a separate PR or at least a separate commit. I also am a bit curious about why we could not just turn the scalar into a vector and do the lookup. We do that in other places. I know it is not ideal from a memory standpoint, but it works.
I thought about that as well. I didn't think it viable since it would be blowing up an I'd be happy to do that in 22.10, if you think that's wise. It'll save my writing a |
No I just think we should file a follow on issue to look at it. I mostly agree with you that this is a rare enough case that not doing it now is probably best, but it would be interesting to look at how we could do it efficiently in the future, when a customer asks for it, and I know that someone will. One idea I had, was because it is a scalar map, we could always turn the map into an nested set of if/else like operations. But there could be a lot of other better ways to do it. |
Will do. I'll merge this change, to restore behaviour, and file another to track adding support for map scalars. |
I have filed #6296 to add support. We'll fall back to CPU in the 22.08 release. |
This PR is merged. Thank you, @andygrove and @revans2. |
This change fixes the failure of queries where a scalar map is
looked up via a vector of keys. Rather than fall back to CPU,
such queries used to evade detection during planning, and failed
at runtime, as described in #6282:
This change allows the query plan to correctly fall back to
CPU mode.
(This change also includes fixes for incorrect whitespaces
and indentation errors in
map_test.py
.)Signed-off-by: MithunR [email protected]