[BUG] Multiple scans for the same parquet data source #483
Comments
I just ran a test on a simpler use case using TPC-H data. The query `select * from table1 t1 inner join table1 t2 on t1.o_orderkey = t2.o_orderkey` does not read the data twice. I think it was fixed by #253.
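One way to check a claim like this without the web UI is to count scan and reuse nodes in the physical plan text. The following is a minimal sketch, not part of the plugin. It assumes the plan text (for example, what `df.explain()` prints) has been captured as a string. The default marker strings follow CPU Spark plan naming and are assumptions here, since GPU plans may use different node names. `MOCK_PLAN` is a hand-written illustration, not real output.

```python
def count_markers(plan_text, scan_marker="FileScan parquet",
                  reuse_marker="ReusedExchange"):
    """Count plan lines that mention a scan node or a reused exchange.

    The default marker strings are assumptions based on CPU Spark plan
    text; pass other markers if your plan names its nodes differently.
    """
    scans = 0
    reused = 0
    for line in plan_text.splitlines():
        if scan_marker in line:
            scans += 1
        if reuse_marker in line:
            reused += 1
    return scans, reused


# Hand-written mock of a self-join plan (illustrative only, not real output).
MOCK_PLAN = """\
SortMergeJoin [o_orderkey], [o_orderkey], Inner
:- Sort [o_orderkey ASC]
:  +- Exchange hashpartitioning(o_orderkey, 200)
:     +- FileScan parquet [o_orderkey]
+- Sort [o_orderkey ASC]
   +- ReusedExchange [o_orderkey], Exchange hashpartitioning(o_orderkey, 200)
"""

scans, reused = count_markers(MOCK_PLAN)
```

A single scan together with a reused exchange suggests the table is read once. Two scans and no reuse node would match the behavior reported in this issue.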
I'm going to try to run with TPC-DS as well, but it might take a bit to get everything set up for testing.
@chenrui17 We'd better write a test case to verify the effect of
The issue was around canonicalization. We started out using a copy of Canonicalization of the Be aware that there are still a few places where exchanges that would be deduplicated in a CPU-only query will not be deduplicated in a GPU query. This mostly comes down to what we can and cannot translate to run on the GPU, so we can hit situations like #386, or issues where the plugin does not do a columnar exchange because the downstream part of the operation is not on the GPU too.
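To illustrate why canonicalization matters for de-duplication, here is a toy model. It is not Spark's or the plugin's actual code, and the `Attr`, `Scan`, `canonicalize`, and `dedupe` names are hypothetical. Two scans of the same table carry different instance-specific expression IDs, so they only compare equal after those IDs are normalized.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int  # unique per plan instance, like an expression ID


@dataclass(frozen=True)
class Scan:
    table: str
    output: Tuple[Attr, ...]


def canonicalize(scan):
    # Replace instance-specific IDs with the attribute's position so that
    # semantically identical scans become structurally equal.
    return Scan(scan.table,
                tuple(Attr(a.name, i) for i, a in enumerate(scan.output)))


def dedupe(scans):
    # Keep the first scan seen for each canonical form and reuse it for
    # every later scan with the same canonical form.
    seen = {}
    result = []
    for s in scans:
        key = canonicalize(s)
        if key not in seen:
            seen[key] = s
        result.append(seen[key])
    return result


t1 = Scan("orders", (Attr("o_orderkey", 1),))
t2 = Scan("orders", (Attr("o_orderkey", 7),))
shared = dedupe([t1, t2])
```

In this sketch `t1` and `t2` are unequal as written, but their canonical forms match, so `dedupe` returns the same object for both. If canonicalization does not produce equal forms, as described above, the two scans stay separate and the data is read twice.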
I was able to run q4 but at a smaller scale factor (50), and I verified that the de-duplication is happening properly. @LuciferYang and @chenrui17 is it okay if I close this as fixed in the upcoming 0.2 release?
@chenrui17 please try it with branch-0.2 and give feedback as soon as possible, thanks.
@LuciferYang I tested TPC-DS query 2 with branch-0.2 in local mode. This problem has been fixed.
Describe the bug
I tested TPC-DS query 2 and found two identical operators in the DAG graph. This means one Parquet table is read twice, which leads to poor performance. The web UI is shown in the figure below:
The DAG graph is shown in the figure below:
Steps/Code to reproduce bug
My config is:
tpc-ds.q2.config.txt