[BUG] q9 regression between 21.10 and 21.12 #4281

abellina · 2021-12-03T05:58:50Z

We had a baseline of 25s for q9 in 21.10, but q9 is around 1.5 to 2 seconds slower in 21.12 when running in the spark2a environment.

This issue is to investigate the difference and figure out whether we need to raise an issue in cuDF or the plugin, or whether the cause is environmental.

"queryTimes" : [ 26977 ],
"queryTimes" : [ 27164 ],
"queryTimes" : [ 27016 ],
"queryTimes" : [ 26903 ],
"queryTimes" : [ 26542 ],

abellina · 2021-12-03T16:05:11Z

Note that we have identified other issues with q9 that could yield significant performance improvements: #4189, #4186, #4164.

The issue here is somewhat separate from the other issues, but there may be some parallels.

jbrennan333 · 2021-12-04T22:59:14Z

I believe the difference here may be due to nvcomp compression/decompression being enabled by default in cuDF. In branch-22.02, I got times similar to 21.10 when I disabled nvcomp:

spark.executorEnv.LIBCUDF_NVCOMP_POLICY=OFF
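
For anyone repeating the comparison, here is a minimal sketch of turning that setting into `spark-submit` arguments. It relies on Spark's `spark.executorEnv.[Name]` property, which sets an environment variable on the executors. The helper name `nvcomp_policy_conf` is hypothetical, and the allowed values are limited to the two mentioned in this thread (`OFF` and `STABLE`); this is an assumption for illustration, not the benchmark harness actually used here.

```python
from typing import List

# Only the policy values that appear in this thread; cuDF may accept others.
KNOWN_POLICIES = {"OFF", "STABLE"}


def nvcomp_policy_conf(policy: str = "OFF") -> List[str]:
    """Return spark-submit arguments that export LIBCUDF_NVCOMP_POLICY
    to every executor via Spark's spark.executorEnv.* mechanism.

    Hypothetical helper, written only to illustrate the setting above.
    """
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"unexpected policy {policy!r}; expected one of {sorted(KNOWN_POLICIES)}")
    return ["--conf", f"spark.executorEnv.LIBCUDF_NVCOMP_POLICY={policy}"]


if __name__ == "__main__":
    # Prints the two arguments to append to a spark-submit command line.
    print(" ".join(nvcomp_policy_conf("OFF")))
```

Keeping the policy as a per-run argument makes it easy to flip between the two configurations without touching the cluster defaults.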

Results:

q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638656812726.json:  "queryTimes" : [ 24593 ],
q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638656886471.json:  "queryTimes" : [ 25379 ],
q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638657043868.json:  "queryTimes" : [ 25976 ],

When I run branch-22.02 with the default setting (LIBCUDF_NVCOMP_POLICY=STABLE), I get:

q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658371600.json:  "queryTimes" : [ 27203 ],
q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658437224.json:  "queryTimes" : [ 27284 ],
q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658657588.json:  "queryTimes" : [ 27032 ],
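
To put a number on the gap between the two configurations, the following sketch averages the `queryTimes` values listed above. It assumes the values are in milliseconds (consistent with the 25s baseline). The variable and function names are illustrative only, and with three runs per configuration the result is a rough indication rather than a statistically robust measurement.

```python
from statistics import mean

# queryTimes copied from the results above (assumed to be milliseconds).
nvcomp_off_ms = [24593, 25379, 25976]      # LIBCUDF_NVCOMP_POLICY=OFF
nvcomp_stable_ms = [27203, 27284, 27032]   # LIBCUDF_NVCOMP_POLICY=STABLE (default)


def slowdown(baseline_ms, candidate_ms):
    """Return (baseline mean, candidate mean, absolute delta, percent delta)."""
    base = mean(baseline_ms)
    cand = mean(candidate_ms)
    delta = cand - base
    return base, cand, delta, 100.0 * delta / base


base, cand, delta, pct = slowdown(nvcomp_off_ms, nvcomp_stable_ms)

if __name__ == "__main__":
    print(f"nvcomp OFF mean:    {base:.0f} ms")
    print(f"nvcomp STABLE mean: {cand:.0f} ms")
    print(f"difference:         {delta:.0f} ms ({pct:.1f}%)")
```

The means come out to roughly 25.3 s with nvcomp off and 27.2 s with the default, a gap of about 1.9 s, which lines up with the 1.5 to 2 second regression reported at the top of this issue.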

jbrennan333 · 2021-12-04T23:11:37Z

This is the PR that enabled nvcomp in cuDF: rapidsai/cudf#9582

abellina · 2021-12-06T14:22:51Z

Same comment as I had for q88: #4280 (comment)

jbrennan333 · 2021-12-06T22:20:13Z

As noted in #4280, the change to nvcomp as the default snappy compressor is in branch-21.12.

jbrennan333 · 2022-01-04T22:31:35Z

I think the difference reported here is explained by the change to make nvcomp the default snappy compressor in branch-21.12. The nvcomp team is investigating that issue separately. I think we can close this issue - any objections @abellina or @jlowe?

jlowe · 2022-01-04T22:46:15Z

> I think we can close this issue

No objection from me.

abellina · 2022-01-04T22:47:44Z

+1 closing this.

abellina added the bug, ? - Needs Triage, performance, and P0 labels Dec 3, 2021

Salonijain27 removed the ? - Needs Triage label Dec 7, 2021

abellina closed this as completed Jan 4, 2022