Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-4652][VL] Fix min_by/max_by result mismatch when RDD partition num > 1 #5711

Merged
merged 3 commits into from
May 13, 2024

Conversation

zhouyifan279
Copy link
Contributor

@zhouyifan279 zhouyifan279 commented May 12, 2024

What changes were proposed in this pull request?

A follow-up of PR #5544 to fix min_by/max_by result mismatch when RDD partition num > 1.

Run SQL:

set spark.sql.leafNodeDefaultParallelism=2;
select min_by(a, b), max_by(a, b) from values (5, 6), (null, 11), (null, 5) test(a, b);

Current result is

5      5

Expected result is

NULL    NULL

How was this patch tested?

Modified test case VeloxAggregateFunctionsSuite - min_by/max_by

Copy link

#4652

@yma11
Copy link
Contributor

yma11 commented May 13, 2024

I think I used to test the "spark.default.parallelism", "2" in previous fix and it worked. So what's the root cause of the failure as you tested? which part is the corresponding fix?

@zhouyifan279
Copy link
Contributor Author

zhouyifan279 commented May 13, 2024

@yma11 Did you reproduced the problem I mentioned above based on the master gluten branch?
That is what I'm trying to fix.

@zhouyifan279
Copy link
Contributor Author

zhouyifan279 commented May 13, 2024

I think I used to test the "spark.default.parallelism", "2" in previous fix and it worked. So what's the root cause of the failure as you tested? which part is the corresponding fix?

@yma11 I guess you didn't reproduce the problem because there was only 1 parquet file under test table dir.
In that case, only 1 task existed in the Spark aggregation stage and the result was correct.

@yma11
Copy link
Contributor

yma11 commented May 13, 2024

@yma11 Did you reproduced the problem I mentioned above based on the master gluten branch? That is what I'm trying to fix.

I mean which part of your code changes in this PR is the fix? Seems only some code refactor?

@zhouyifan279
Copy link
Contributor Author

zhouyifan279 commented May 13, 2024

@yma11 Did you reproduced the problem I mentioned above based on the master gluten branch? That is what I'm trying to fix.

I mean which part of your code changes in this PR is the fix? Seems only some code refactor?

Before this PR, RowConstructorWithAllNullCallToSpecialForm.constructSpecialForm(the one defined in RowConstructorWithAllNull.h) was not called when function name was row_constructor_with_all_null.
I think it was related to C++ overload/override mechanism.

In this PR, I take function name as a member of RowConstructorWithNullCallToSpecialForm to avoid the overload/override issues and fix the result mismatch.

@yma11
Copy link
Contributor

yma11 commented May 13, 2024

@yma11 Did you reproduced the problem I mentioned above based on the master gluten branch? That is what I'm trying to fix.

I mean which part of your code changes in this PR is the fix? Seems only some code refactor?

Before this PR, RowConstructorWithAllNullCallToSpecialForm.constructSpecialForm(the one defined in RowConstructorWithAllNull.h) was not called when function name was row_constructor_with_all_null. I think it was related to C++ overload/override mechanism.

In this PR, I take function name as a member of RowConstructorWithNullCallToSpecialForm to avoid the overload/override issues and fix the result mismatch.

That's interesting. Actually, I verified the previous code using real workload but with single parquet file so may not trigger your scenario. Thanks for your fix!

@zhouyifan279
Copy link
Contributor Author

@yma11 CI run-spark-test-spark34's failure seems irrelevant with this PR. Can you rerun it?

@ulysses-you ulysses-you merged commit 33f9935 into apache:main May 13, 2024
41 checks passed
@zhouyifan279 zhouyifan279 deleted the min_max_by branch May 13, 2024 14:20
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_master_05_13_2024_time.csv log/native_master_05_11_2024_4899ea5c8_time.csv difference percentage
q1 14.93 14.75 -0.181 98.79%
q2 13.61 17.57 3.956 129.06%
q3 4.67 4.36 -0.310 93.36%
q4 63.56 61.96 -1.597 97.49%
q5 6.45 7.89 1.446 122.43%
q6 4.14 4.31 0.169 104.09%
q7 5.78 5.89 0.115 101.99%
q8 3.23 3.14 -0.089 97.24%
q9 20.14 17.64 -2.506 87.56%
q10 10.40 9.35 -1.052 89.89%
q11 36.53 40.33 3.795 110.39%
q12 1.37 2.81 1.438 205.12%
q13 5.67 5.93 0.258 104.55%
q14a 42.91 43.48 0.572 101.33%
q14b 42.68 42.11 -0.566 98.67%
q15 2.41 2.47 0.054 102.23%
q16 38.87 39.07 0.200 100.51%
q17 5.15 5.98 0.833 116.17%
q18 7.14 6.12 -1.021 85.69%
q19 3.08 2.02 -1.060 65.53%
q20 1.52 1.39 -0.129 91.47%
q21 1.08 1.00 -0.072 93.30%
q22 8.31 11.63 3.319 139.94%
q23a 81.56 79.90 -1.661 97.96%
q23b 101.64 105.25 3.607 103.55%
q24a 78.23 81.05 2.818 103.60%
q24b 72.61 76.38 3.764 105.18%
q25 5.88 5.97 0.093 101.58%
q26 2.55 4.38 1.833 171.90%
q27 2.96 3.02 0.057 101.93%
q28 22.91 22.59 -0.320 98.60%
q29 8.19 6.84 -1.358 83.42%
q30 4.20 4.07 -0.131 96.87%
q31 7.38 5.82 -1.568 78.76%
q32 0.96 1.07 0.109 111.35%
q33 5.93 7.28 1.353 122.81%
q34 4.97 5.28 0.310 106.23%
q35 6.58 9.42 2.842 143.23%
q36 2.97 3.01 0.042 101.40%
q37 3.79 3.93 0.139 103.66%
q38 15.46 12.10 -3.353 78.31%
q39a 3.16 3.35 0.193 106.12%
q39b 2.72 2.70 -0.025 99.10%
q40 3.96 4.06 0.109 102.74%
q41 0.56 0.54 -0.022 96.16%
q42 0.79 0.79 -0.003 99.61%
q43 3.58 3.50 -0.083 97.69%
q44 6.74 6.96 0.216 103.21%
q45 3.40 3.28 -0.115 96.61%
q46 2.93 4.17 1.247 142.61%
q47 13.95 14.50 0.552 103.95%
q48 4.21 4.09 -0.120 97.15%
q49 11.08 7.28 -3.801 65.69%
q50 21.52 23.39 1.866 108.67%
q51 8.29 8.38 0.097 101.17%
q52 0.91 0.96 0.052 105.68%
q53 1.63 1.66 0.027 101.66%
q54 3.03 2.97 -0.065 97.86%
q55 0.89 0.91 0.018 102.02%
q56 5.49 5.34 -0.153 97.22%
q57 8.17 8.27 0.103 101.26%
q58 2.34 2.39 0.056 102.39%
q59 15.67 17.40 1.727 111.02%
q60 5.66 5.81 0.153 102.70%
q61 6.28 6.17 -0.109 98.27%
q62 4.53 3.91 -0.614 86.43%
q63 1.82 1.85 0.029 101.56%
q64 48.88 49.77 0.893 101.83%
q65 14.68 13.73 -0.947 93.55%
q66 2.82 2.86 0.033 101.16%
q67 349.14 351.37 2.236 100.64%
q68 3.65 3.59 -0.063 98.29%
q69 9.48 6.37 -3.110 67.19%
q70 8.49 9.30 0.810 109.55%
q71 2.03 3.06 1.027 150.55%
q72 189.72 186.98 -2.733 98.56%
q73 2.06 2.17 0.109 105.26%
q74 21.11 21.07 -0.035 99.83%
q75 24.75 25.14 0.390 101.58%
q76 10.26 7.57 -2.688 73.80%
q77 1.60 1.78 0.174 110.84%
q78 38.47 39.89 1.417 103.68%
q79 3.36 3.40 0.038 101.14%
q80 11.26 11.10 -0.163 98.55%
q81 5.82 4.27 -1.548 73.40%
q82 6.69 8.91 2.220 133.19%
q83 1.40 1.32 -0.073 94.75%
q84 2.88 3.06 0.185 106.42%
q85 6.77 7.26 0.487 107.19%
q86 3.24 3.24 0.003 100.10%
q87 12.10 12.13 0.031 100.26%
q88 16.59 17.29 0.705 104.25%
q89 2.58 4.61 2.034 179.00%
q90 3.06 3.10 0.039 101.27%
q91 5.66 2.85 -2.814 50.32%
q92 1.23 1.23 0.006 100.53%
q93 30.20 29.83 -0.371 98.77%
q94 21.20 21.35 0.147 100.69%
q9 87.60 84.69 -2.912 96.68%
q5 2.23 2.55 0.318 114.24%
q96 12.16 12.41 0.251 102.07%
q97 1.92 1.87 -0.051 97.33%
q98 9.47 8.29 -1.181 87.53%
q99 9.47 8.29 -1.181 87.53%
total 1890.24 1902.59 12.348 100.65%

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_05_13_2024_time.csv log/native_master_05_11_2024_4899ea5c8_time.csv difference percentage
q1 36.42 33.75 -2.676 92.65%
q2 23.70 23.55 -0.142 99.40%
q3 36.61 38.11 1.498 104.09%
q4 37.83 37.13 -0.695 98.16%
q5 70.62 68.76 -1.853 97.38%
q6 7.47 7.57 0.103 101.38%
q7 85.08 82.51 -2.566 96.98%
q8 84.30 82.99 -1.311 98.45%
q9 121.00 120.65 -0.349 99.71%
q10 45.37 45.36 -0.002 100.00%
q11 19.49 20.15 0.660 103.38%
q12 26.52 30.08 3.562 113.43%
q13 55.08 53.02 -2.058 96.26%
q14 21.53 20.74 -0.789 96.34%
q15 29.02 30.86 1.840 106.34%
q16 14.00 13.92 -0.083 99.41%
q17 103.62 101.26 -2.360 97.72%
q18 146.24 146.46 0.225 100.15%
q19 13.35 15.45 2.101 115.73%
q20 29.06 26.42 -2.643 90.91%
q21 282.54 284.10 1.553 100.55%
q22 14.61 16.40 1.786 112.23%
total 1303.44 1299.24 -4.200 99.68%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants