Support GpuHashJoin on Structs #2173
Signed-off-by: sperlingxx <[email protected]>
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuHashJoin.scala
Looks really good. Just the one nit, and then we can merge this in once the final fix is in place on the cuDF side.
build
build
build
build
build
The fix on the cuDF side is done, and all CI pipeline checks pass.
I had one really minor style nit, but I don't expect you to fix it. It is small enough that I am just going to ignore it.
val unSupportNonEqualCondition = () => if (condition.isDefined) {
  meta.willNotWorkOnGpu(s"$joinType joins currently do not support conditions")
}
val unSupportStructKeys = () => if (keyDataTypes.exists(_.isInstanceOf[StructType])) {
nit:
I personally find it much more readable to do a def instead of a val for functions like this.
def unSupportNonEqualCondition:
...
def unSupportStructKeys:
...
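To illustrate the nit, here is a minimal sketch of the two forms side by side. `condition`, `joinType`, `reasons`, and the local `willNotWorkOnGpu` are hypothetical stand-ins for the values in the diff, not the plugin's actual classes, and the `Val` suffix exists only so both forms can coexist in one object.

```scala
// Hypothetical stand-ins for the values referenced in the diff.
object TaggingSketch {
  val condition: Option[String] = Some("a.x > b.y")
  val joinType: String = "LeftSemi"
  val reasons = scala.collection.mutable.ArrayBuffer.empty[String]

  // Stand-in for meta.willNotWorkOnGpu: it only records the reason.
  def willNotWorkOnGpu(reason: String): Unit = reasons += reason

  // Function value: a Function0[Unit] stored in a val and invoked with ().
  val unSupportNonEqualConditionVal: () => Unit = () => if (condition.isDefined) {
    willNotWorkOnGpu(s"$joinType joins currently do not support conditions")
  }

  // Method: the same body declared with def, as the nit suggests.
  def unSupportNonEqualCondition(): Unit = if (condition.isDefined) {
    willNotWorkOnGpu(s"$joinType joins currently do not support conditions")
  }
}
```

Calling `TaggingSketch.unSupportNonEqualConditionVal()` and `TaggingSketch.unSupportNonEqualCondition()` records the same reason each time, so the change is about readability rather than behavior.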
Done
You had to do the build 4 times to get it to pass. Why exactly did https://blossom.nvidia.com/sw-gpu-spark-jenkins/job/rapids_premerge-github/1454/ fail (with what appears to be a join issue), and what changed to make it pass? I don't see anything that changed from CI, which makes me really nervous.
Removing my approval until I understand what happened with CI
...park300/src/main/scala/com/nvidia/spark/rapids/shims/spark300/GpuBroadcastHashJoinExec.scala
CI failed several times because of this cuDF bug. I wasn't aware of the bug until I saw that CI still failed after multiple attempts. (At first I assumed the cuDF jar hadn't been updated to the latest master, so I reran CI hours later.) Then I looked into the error log and found the problem.
build
Signed-off-by: sperlingxx [email protected]
This PR partially supports GPU equi-join (GpuHashJoin) on struct data (#2126). It has some constraints:
- Left anti and left semi joins on structs are not yet supported in libcudf (#7912).
- Full outer join is not supported because of #7934 and #7947.
- SortExec is not supported with nested structs.
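
The join-type constraints listed above could be expressed as a tagging check along the following lines. This is only a sketch: `JoinKind`, its case objects, and `StructKeyCheck.fallbackReason` are hypothetical names (the plugin works with Spark's own join types), and it assumes that join kinds not named in the constraints are accepted.

```scala
// Hypothetical join-kind model, used only for this illustration.
sealed trait JoinKind
case object Inner extends JoinKind
case object LeftOuter extends JoinKind
case object RightOuter extends JoinKind
case object FullOuter extends JoinKind
case object LeftSemi extends JoinKind
case object LeftAnti extends JoinKind

object StructKeyCheck {
  // Returns a fallback reason when struct keys rule out the GPU join, else None.
  def fallbackReason(kind: JoinKind, hasStructKeys: Boolean): Option[String] =
    if (!hasStructKeys) None
    else kind match {
      case LeftSemi | LeftAnti =>
        Some(s"$kind joins on struct keys are not yet supported in libcudf")
      case FullOuter =>
        Some("FullOuter joins on struct keys are not supported")
      case _ =>
        None // assumption: the remaining join kinds are accepted
    }
}
```

With this sketch, `StructKeyCheck.fallbackReason(LeftSemi, hasStructKeys = true)` yields a reason, while `StructKeyCheck.fallbackReason(Inner, hasStructKeys = true)` yields `None`.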