[BUG] InputFileBlock walks through broadcasts and does not deal with mismatched broadcasts #9473
revans2 added the "bug" and "? - Needs Triage" labels on Oct 18, 2023
jlowe changed the title from "[BUG] InputFileBlock wlaks through broadcasts and does not deal with mismatched broadcasts" to "[BUG] InputFileBlock walks through broadcasts and does not deal with mismatched broadcasts" on Oct 18, 2023
I should also add that it does not deal with joins either. It can hit errors where one half of the join falls back to the CPU and the other half does not, because the two halves are not together in a group. Each branch is treated separately.
Tracking fix here: #9469
winningsix pushed a commit to winningsix/spark-rapids that referenced this issue on Nov 20, 2023:
…s] (NVIDIA#9673)
InputFileBlockRule may change the meta of a broadcast join and its child plans, and this change may break the rule of the broadcast join running on GPU, leading to errors. Because GPU broadcast joins require the build side BroadcastExchangeExec running on GPU, and similarly if BroadcastExchangeExec runs on CPU, the broadcast joins should also run on CPU.
Change made:
- Optimize the InputFileBlockRule by skipping the BroadcastExchangeLike because the file info cannot come from a broadcast. (This idea is from NVIDIA#9473)
- Check the tagging for broadcast joins again after applying the InputFileBlockRule to fix the potential break.
- Some API refactor, moving all input file related methods into the InputFileBlockRule object.
Signed-off-by: Firestarman <[email protected]>
Describe the bug
I am not sure if this is one thing or two things.
#3333 (comment) caused InputFileBlock to be implemented as a rule to prevent a serious bug. But while debugging #9469 I saw that we would walk through a BroadcastExchangeExec and put all of it on the CPU even if it could be on the GPU.
So I decided to fix it.
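The idea of stopping the walk at a broadcast can be pictured with a toy plan model. This is only a sketch: `Node`, `Scan`, `InputFileProject`, `BroadcastExchange`, `BroadcastJoin`, and `chainToScan` are hypothetical stand-ins invented for illustration, not the real Spark or spark-rapids classes. It assumes the rule gathers every node between the plan that reads input file info and the scans feeding it, so that the whole chain can be placed together.

```scala
// Toy plan model. None of these types are real Spark or spark-rapids classes.
object InputFileWalkSketch {
  sealed trait Node { def children: Seq[Node] }
  case class Scan(name: String) extends Node {
    val children: Seq[Node] = Nil
  }
  // Hypothetical node that uses input file info (e.g. input_file_name)
  case class InputFileProject(child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
  }
  case class BroadcastExchange(child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
  }
  case class BroadcastJoin(stream: Node, build: Node) extends Node {
    val children: Seq[Node] = Seq(stream, build)
  }

  // Collect the nodes that must be placed together, starting at `node`.
  // The walk stops at a broadcast exchange: file info cannot come from a
  // broadcast, so nothing at or below it is added to the chain.
  def chainToScan(node: Node): Seq[Node] = node match {
    case _: BroadcastExchange => Nil
    case other => other +: other.children.flatMap(chainToScan)
  }

  // Example plan: a projection over a broadcast join of "fact" and "dim"
  val fact: Scan = Scan("fact")
  val join: BroadcastJoin = BroadcastJoin(fact, BroadcastExchange(Scan("dim")))
  val plan: InputFileProject = InputFileProject(join)
}
```

In this model `chainToScan(plan)` contains the projection, the join, and the "fact" scan, while the broadcast exchange and the "dim" scan under it are left alone. Note that the join itself is still part of the chain.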
But that caused another problem.
This is because the rules we have in place for a join and a broadcast exchange do not run when/if the join is put back on the CPU as a part of the InputFileBlock rule. If I disable the BroadcastHashJoin with
spark.conf.set("spark.rapids.sql.exec.BroadcastHashJoinExec", false)
then it works without errors. So we need a way to make sure that the proper rules run if we remove the BroadcastHashJoin from the GPU. The reason I didn't do it right away is that I don't know all of the AQE situations that might cause this too. If the broadcast became a broadcast because of AQE, I don't understand enough of that to make sure that it is doing what we want/expect.
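The mismatch described above, and the kind of re-check that would repair it, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the plugin's actual tagging code: `Placement`, `forceJoinToCpu`, and `recheck` are hypothetical names, and the only rule modeled is that a broadcast join and its build-side broadcast exchange must end up on the same side. AQE cases are not modeled at all.

```scala
// Simplified model of join/broadcast placement. All names are hypothetical.
object BroadcastJoinRecheckSketch {
  // Where the join and its build-side broadcast exchange are planned to run
  final case class Placement(joinOnGpu: Boolean, buildBroadcastOnGpu: Boolean)

  // Stand-in for an input-file rule that moves only the join to the CPU
  def forceJoinToCpu(p: Placement): Placement = p.copy(joinOnGpu = false)

  // Re-check after other rules have run: the pair stays on the GPU only if
  // both halves are still on the GPU; otherwise both fall back to the CPU.
  def recheck(p: Placement): Placement = {
    val bothOnGpu = p.joinOnGpu && p.buildBroadcastOnGpu
    Placement(joinOnGpu = bothOnGpu, buildBroadcastOnGpu = bothOnGpu)
  }
}
```

Starting from `Placement(true, true)`, `forceJoinToCpu` yields a mismatched pair (join on the CPU, broadcast on the GPU), and `recheck` then moves the broadcast to the CPU as well. A pair that is already consistent passes through `recheck` unchanged.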