[BUG] batch coalesce can fail to appear between columnar shuffle and subsequent columnar operation #698
Comments
cc: @andygrove |
I'm pretty sure I know what is causing this. When planning a new shuffle query stage, we do remove the |
@JustPlay could you confirm which version of the plugin you are using (perhaps the commit hash that you built from) |
I have tested TPC-DS queries 10, 18, and 33. With AQE on (spark.sql.adaptive.enabled=true) the bug appears; with AQE off (spark.sql.adaptive.enabled=false) the plans are fine. I'm using spark-3.0.1-rc3 (the .zip package) for rapids-0.2: what's more, this really hurts performance. |
I have tried the latest version, and the bug still exists. |
@JustPlay Thanks for the information. I have a theory on this and it would be useful if you could send me the query plan so that I confirm. I suspect that although you have AQE enabled, the query isn't using AQE due to some restrictions. For example, if your plan starts with an If this is the case you should see a warning similar to "spark.sql.adaptive.enabled is enabled but is not supported for query". I think that there is a bug in the plugin where we would still remove the coalesce from the exchange in this case but won't insert it around the custom shuffle reader because there is no custom shuffle reader due to AQE being disabled. I will be working to confirm this today and come up with a solution. |
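To make the theory above concrete, here is a toy model in Python. It is not the plugin's code: the `QueryContext` class, the `plan_after_shuffle` function, and the lowercase operator labels are all hypothetical names invented for illustration. It only models the suspected mismatch between "AQE is enabled in the config" and "AQE was actually applied to this query".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical summary of how one query was planned."""
    aqe_config_enabled: bool  # value of spark.sql.adaptive.enabled
    aqe_applied: bool         # whether Spark really planned this query adaptively


def plan_after_shuffle(ctx, fixed):
    """Return toy operator labels from the shuffle up to the next GPU operator.

    fixed=False models the suspected bug: the decision to drop the coalesce
    is taken from the config flag alone.
    fixed=True models one possible correction: the decision is taken from
    whether AQE was actually applied.
    """
    treat_as_adaptive = ctx.aqe_applied if fixed else ctx.aqe_config_enabled
    nodes = ["gpu_columnar_exchange"]
    if not treat_as_adaptive:
        # Non-adaptive path: the coalesce stays directly after the exchange.
        nodes.append("GpuCoalesceBatches")
    if ctx.aqe_applied:
        # A custom shuffle reader only exists in a real adaptive plan,
        # and the coalesce is placed around it.
        nodes += ["custom_shuffle_reader", "GpuCoalesceBatches"]
    nodes.append("gpu_operator")
    return nodes


# AQE enabled in the config, but Spark declined to use it for this query.
ctx = QueryContext(aqe_config_enabled=True, aqe_applied=False)
buggy_plan = plan_after_shuffle(ctx, fixed=False)
fixed_plan = plan_after_shuffle(ctx, fixed=True)
```

In this model `buggy_plan` has no coalesce between the exchange and the next operator, while `fixed_plan` keeps it, which matches the symptom reported in this issue.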
@andygrove would this help? OR what you need is something else? |
Thanks @JustPlay but the plan you posted isn't using AQE at all. I can tell because of this:
If the query was using AQE, the physical plan would start with an Are there any warnings in your driver logs about not using AQE for the query? I think this still confirms my theory though, that AQE is enabled but not being used for this specific query for some reason, and our plugin is treating it as an adaptive query, therefore removing the coalesce from the shuffle exchanges. |
I have been able to reproduce the issue now with a query that uses dynamic partition pruning, and therefore disables AQE for the query. It results in the following physical plan, which is missing a coalesce batches operator between the
|
…IDIA#698) Signed-off-by: spark-rapids automation <[email protected]>
Describe the bug
Normally a GpuCoalesceBatches query plan node should appear after a GPU columnar exchange and before a subsequent GPU operation. In some cases the coalesce node can be missing, as noted at #622 (comment).
Steps/Code to reproduce bug
Run TPC-DS query 18 with adaptive query execution enabled. Other queries as identified in #622 could also be used.
Expected behavior
A GpuCoalesceBatches node must be inserted after every GPU columnar shuffle that is followed by a subsequent GPU operation.
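The expected behavior can be phrased as an invariant over the plan tree. The sketch below checks it on a simplified, hand-built tree in Python. `PlanNode`, `missing_coalesce`, and the `GpuColumnarShuffle` label are hypothetical stand-ins, not Spark or plugin classes, and the sketch covers only the non-adaptive plan shape (it does not model a custom shuffle reader).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SHUFFLE = "GpuColumnarShuffle"   # hypothetical label for a GPU columnar shuffle
COALESCE = "GpuCoalesceBatches"  # node name taken from this issue


@dataclass
class PlanNode:
    """Simplified plan node: a name, a GPU flag, and child nodes."""
    name: str
    gpu: bool = True
    children: List["PlanNode"] = field(default_factory=list)


def missing_coalesce(root: PlanNode) -> List[Tuple[str, str]]:
    """Return (parent, child) name pairs where a GPU operator other than
    the coalesce node reads directly from a GPU columnar shuffle."""
    violations = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.name == SHUFFLE and node.gpu and node.name != COALESCE:
                violations.append((node.name, child.name))
            stack.append(child)
    return violations


# A plan with the coalesce in place, and one where it is missing.
# The aggregate and scan labels are illustrative only.
good_plan = PlanNode("GpuHashAggregate", children=[
    PlanNode(COALESCE, children=[
        PlanNode(SHUFFLE, children=[PlanNode("GpuScan")])])])
bad_plan = PlanNode("GpuHashAggregate", children=[
    PlanNode(SHUFFLE, children=[PlanNode("GpuScan")])])
```

Calling `missing_coalesce(good_plan)` returns an empty list, while `missing_coalesce(bad_plan)` reports the aggregate reading straight from the shuffle. A check of this kind could be used as a regression test once the real plan is mapped onto such a tree.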