[BUG] q9 regression between 21.10 and 21.12 #4281

Closed
abellina opened this issue Dec 3, 2021 · 8 comments
Labels
bug (Something isn't working), P0 (Must have for release), performance (A performance related task/issue)

Comments

@abellina
Collaborator

abellina commented Dec 3, 2021

We had a baseline of about 25 s for q9 in 21.10, but q9 is roughly 1.5 to 2 seconds slower in 21.12 when running in the spark2a environment.

This issue is to investigate the difference and determine whether we need to raise an issue in cuDF or the plugin, or whether the cause is environmental.

"queryTimes" : [ 26977 ],
"queryTimes" : [ 27164 ],
"queryTimes" : [ 27016 ],
"queryTimes" : [ 26903 ],
"queryTimes" : [ 26542 ],
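As a rough check on the size of the regression, the five run times above can be averaged and compared with the 25 s baseline. This is a minimal sketch; it assumes the `queryTimes` values are in milliseconds, and the variable names are illustrative only.

```python
from statistics import mean

# The five q9 run times reported above (assumed to be milliseconds).
query_times_ms = [26977, 27164, 27016, 26903, 26542]

# Baseline from the issue description: about 25 s for q9 on 21.10.
baseline_ms = 25_000

avg_ms = mean(query_times_ms)
slowdown_ms = avg_ms - baseline_ms

print(f"average: {avg_ms:.1f} ms")
print(f"slowdown vs baseline: {slowdown_ms:.1f} ms ({slowdown_ms / baseline_ms:.1%})")
```

Under that assumption the average is about 26.9 s, roughly 1.9 s above the baseline, which matches the 1.5 to 2 second estimate.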

@abellina added the labels bug (Something isn't working), ? - Needs Triage (Need team to review and classify), performance (A performance related task/issue), and P0 (Must have for release) on Dec 3, 2021
@abellina
Collaborator Author

abellina commented Dec 3, 2021

Note that we have identified other issues with q9 that could yield significant performance improvements: #4189, #4186, #4164.

The issue here is somewhat separate from the other issues, but there may be some parallels.

@jbrennan333
Contributor

I believe the difference here may be due to enabling nvcomp compression/decompression by default in cuDF. In branch-22.02, I got times similar to 21.10 when I disabled nvcomp:

spark.executorEnv.LIBCUDF_NVCOMP_POLICY=OFF
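
The setting above uses Spark's `spark.executorEnv.[EnvironmentVariableName]` property, which sets an environment variable on executor processes. One way to pass it is as a `--conf` argument to `spark-submit`; the sketch below only assembles that argument list. The helper name `nvcomp_policy_args` is hypothetical, and the application path is a placeholder.

```python
def nvcomp_policy_args(policy: str) -> list[str]:
    """Build spark-submit arguments that set LIBCUDF_NVCOMP_POLICY on executors.

    Hypothetical helper for illustration; only the OFF and STABLE values
    appear in this thread.
    """
    return ["--conf", f"spark.executorEnv.LIBCUDF_NVCOMP_POLICY={policy}"]


# Placeholder application; substitute the real benchmark job.
command = ["spark-submit", *nvcomp_policy_args("OFF"), "your_benchmark_app.py"]
print(" ".join(command))
```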

Results:

q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638656812726.json:  "queryTimes" : [ 24593 ],
q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638656886471.json:  "queryTimes" : [ 25379 ],
q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638657043868.json:  "queryTimes" : [ 25976 ],

When I run branch-22.02 with the default setting (LIBCUDF_NVCOMP_POLICY=STABLE), I get:

q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658371600.json:  "queryTimes" : [ 27203 ],
q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658437224.json:  "queryTimes" : [ 27284 ],
q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658657588.json:  "queryTimes" : [ 27032 ],
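
To compare the two configurations, the `queryTimes` values can be pulled out of the result lines above and averaged per run directory. The following is a minimal sketch, assuming the times are in milliseconds and that each line has the form `<dir>/<file>.json:  "queryTimes" : [ <n> ],`; the function name `mean_times` is illustrative.

```python
import re
from collections import defaultdict
from statistics import mean

# Result lines copied verbatim from the two listings above.
LINES = '''
q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638656812726.json:  "queryTimes" : [ 24593 ],
q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638656886471.json:  "queryTimes" : [ 25379 ],
q9-22.02-nonvcomp/tpcds-testjtb_no_nvcomp-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638657043868.json:  "queryTimes" : [ 25976 ],
q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658371600.json:  "queryTimes" : [ 27203 ],
q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658437224.json:  "queryTimes" : [ 27284 ],
q9-22.02/tpcds-testjtb-gpu-aqe-on-ucx-off-16-cores-decimals-false-q9-1638658657588.json:  "queryTimes" : [ 27032 ],
'''

# Capture the run directory (text before the first "/") and the single time value.
PATTERN = re.compile(r'^(?P<run>[^/]+)/.*"queryTimes"\s*:\s*\[\s*(?P<ms>\d+)\s*\]')


def mean_times(text):
    """Return the mean query time per run directory."""
    times = defaultdict(list)
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            times[match.group("run")].append(int(match.group("ms")))
    return {run: mean(values) for run, values in times.items()}


means = mean_times(LINES)
off, stable = means["q9-22.02-nonvcomp"], means["q9-22.02"]
print(f"nvcomp OFF:    {off:.0f} ms")
print(f"nvcomp STABLE: {stable:.0f} ms")
print(f"difference:    {stable - off:.0f} ms ({(stable - off) / off:.1%})")
```

With these six runs the averages differ by about 1.9 s, which is in line with the regression reported at the top of the issue.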

@jbrennan333
Contributor

This is the PR for enabling nvcomp in cudf: rapidsai/cudf#9582

@abellina
Collaborator Author

abellina commented Dec 6, 2021

Same comment as I had for q88: #4280 (comment)

@jbrennan333
Contributor

As noted in #4280, the change to nvcomp as the default snappy compressor is in branch-21.12.

@Salonijain27 removed the ? - Needs Triage (Need team to review and classify) label on Dec 7, 2021
@jbrennan333
Contributor

I think the difference reported here is explained by the change to make nvcomp the default snappy compressor in branch-21.12. The nvcomp team is investigating that issue separately. I think we can close this issue - any objections @abellina or @jlowe?

@jlowe
Contributor

jlowe commented Jan 4, 2022

> I think we can close this issue

No objection from me.

@abellina
Collaborator Author

abellina commented Jan 4, 2022

+1 closing this.

@abellina closed this as completed on Jan 4, 2022