Added in basic support for broadcast nested loop join #296
Conversation
Took a quick glance, seems fine to me. As you noted, it would be great to explore sharing a lot of the boilerplate build type handling, output distribution, etc. that is common with the existing hash join.
I updated the code. I also filed a follow-on issue #302 to try and fix some of the memory issues and let us turn this on by default.
I would appreciate some reviews on this.
It is a part of #265 but is missing `CartesianExec`, which is the implementation that shows up if one of the tables is too large to be broadcast and it is an inner or cross join with no equality comparison. It adds in support for Cross equality joins, which in those cases are the same as an Inner join.
It also adds in support for BroadcastNestedLoopJoin on Cross and Inner joins. The biggest issue is the amount of memory that a Cross join like this could use.
I plan on trying to use the current in-memory size of each table (left and right) to decide if we should play some games with memory. If the size is too large, then we would break the tables down into smaller pieces and do the join piece by piece, i.e. something along the lines of the sketch below.
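A rough sketch of that chunking idea (this is not the PR's code; the byte estimates, the memory budget, and every name below are assumptions made purely for illustration):

```scala
// Hypothetical sketch of a chunked cross join.  The size estimates, the 1 GiB
// budget, and all names here are placeholders, not the plugin's real code.
object ChunkedCrossJoinSketch {
  type Row = Array[Any]
  final case class Table(rows: Vector[Row], bytesPerRow: Long)

  // Upper bound on how large a single partial cross-join output may get.
  val maxProductBytes: Long = 1L << 30

  // Estimated size of the full cross-join output of two tables.
  def productBytes(left: Table, right: Table): Long =
    left.rows.size.toLong * right.rows.size.toLong *
      (left.bytesPerRow + right.bytesPerRow)

  // Split the streamed side into enough chunks that each chunk, crossed with
  // the broadcast side, stays under the budget.
  def split(streamed: Table, broadcast: Table): Iterator[Table] = {
    val numChunks = (productBytes(streamed, broadcast) / maxProductBytes + 1).toInt
    val rowsPerChunk = math.max(1, streamed.rows.size / numChunks)
    streamed.rows.grouped(rowsPerChunk).map(rows => Table(rows, streamed.bytesPerRow))
  }

  // Emit the cross join one bounded batch at a time instead of all at once.
  def chunkedCrossJoin(streamed: Table, broadcast: Table): Iterator[Vector[(Row, Row)]] =
    split(streamed, broadcast).map { chunk =>
      for {
        l <- chunk.rows
        r <- broadcast.rows
      } yield (l, r)
    }
}
```

The point is just that each partial product stays under a fixed budget, so the full cross-join output never has to be held in memory at once.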
This would still not fix all cases, because we could broadcast something really large and blow up from just trying to hold it in memory.