
[QST] How to improve performance of read_parquet and url_decode further? #7571

Closed
chenrui17 opened this issue Mar 11, 2021 · 6 comments
Labels
libcudf Affects libcudf (C++/CUDA) code. question Further information is requested

Comments

@chenrui17
Contributor

What is your question?
[screenshot: Nsight trace of the Parquet read + url_decode pipeline]
@jlowe As discussed at the meeting yesterday, this Nsight trace shows that make_strings_column costs almost half of the total time; the rest is spent in url_decode and replace.
By the way, my input Parquet records are long URLs, roughly 500~1500 characters each. So this problem depends on #7545, right?
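For readers unfamiliar with the workload being profiled: url_decode performs percent-decoding on every row of a strings column. A minimal CPU reference in pure Python (using the standard library's `urllib.parse.unquote`; the names `url_decode_column` and `sample` are illustrative, not part of cudf) sketches what the GPU kernel computes per row:

```python
from urllib.parse import unquote

def url_decode_column(urls):
    """CPU reference for the per-row percent-decoding that a GPU
    url_decode kernel performs on a strings column."""
    return [unquote(u) for u in urls]

# Long URLs (~500-1500 chars in the reported workload) dominate the cost:
# each row's output length depends on how many %XX escapes it contains,
# which is why long-string handling matters so much here.
sample = [
    "https://example.com/search?q=hello%20world",
    "https://example.com/a%2Fb%3Fq%3D1",
]
print(url_decode_column(sample))
```

Because output row lengths are data-dependent, a GPU implementation must first compute per-row output sizes, then scan them into offsets, then decode; that size/offsets step is the part tied to make_strings_column.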

@chenrui17 chenrui17 added Needs Triage Need team to review and classify question Further information is requested labels Mar 11, 2021
@jrhemstad
Contributor

I think this is a duplicate of #7545. We know there's room to improve the make_strings_column implementation.

@jlowe
Member

jlowe commented Mar 11, 2021

Agreed, this is mostly a duplicate of #7545, but spending 600+ msec on url_decode isn't ideal either. We may need to see if there are additional optimizations that can be done there.

@chenrui17 what version (or git commit hash) of cudf is this trace based upon?

@chenrui17
Contributor Author

> Agree mostly this is a duplicate of #7545 but spending 600+ msec on a urldecode isn't ideal either. We may need to see if there's additional optimizations that can be done there.
>
> @chenrui17 what version (or git commit hash) of cudf is this trace based upon?

cudf 0.19, commit: c76949e

rapids-bot bot pushed a commit that referenced this issue Mar 18, 2021
Reference #7571 
This improves the performance of `cudf::make_strings_column` for long strings. It uses an approach similar to `cudf::strings::detail::gather` and also uses thresholding, as in the optimized `cudf::strings::replace`.
This may not be the right solution for fully addressing #7571, but it may be helpful in other places where long strings are used to create a strings column in libcudf.
This PR also includes a gbenchmark to help measure the performance of this factory function. The benchmark results show that longer strings (~ >64 bytes on average) see roughly a 10x improvement. I can post benchmark results here if needed. The character-parallel algorithm was slower for shorter strings, so the existing algorithm is retained and selected based on a threshold calculation.
I also added an additional gtest with a mixture of nulls and empty strings to make sure the new algorithm handles these correctly.

Authors:
  - David (@davidwendt)

Approvers:
  - Jason Lowe (@jlowe)
  - Nghia Truong (@ttnghia)
  - Jake Hemstad (@jrhemstad)

URL: #7576
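The thresholding idea described in the PR above can be sketched in a few lines. This is a hypothetical illustration, not libcudf code: the function name `choose_gather_strategy` and the exact 64-byte cutoff are assumptions taken from the "~ >64 bytes on average" figure in the benchmark notes.

```python
# Illustrative cutoff: character-parallel copying only paid off for
# columns averaging more than ~64 bytes per row in the PR's benchmarks.
AVG_BYTES_THRESHOLD = 64

def choose_gather_strategy(offsets):
    """Pick a copy strategy from a strings column's offsets array
    (cumulative byte offsets, length = rows + 1).

    Short strings: one thread per row is enough ("row-parallel").
    Long strings: one thread per character avoids idle threads
    ("char-parallel").
    """
    rows = len(offsets) - 1
    if rows <= 0:
        return "row-parallel"
    avg_bytes = (offsets[-1] - offsets[0]) / rows
    return "char-parallel" if avg_bytes > AVG_BYTES_THRESHOLD else "row-parallel"
```

For the ~500-1500 character URLs in this issue, the average row length sits far above any such threshold, so the character-parallel path is the one that matters for this workload.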
@kkraus14
Collaborator

@jlowe is this addressed now based on the make_strings_column optimizations?

@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Mar 26, 2021
@jlowe
Member

jlowe commented Mar 26, 2021

make_strings_column and url_decode are significantly faster on long strings, but we may need some further optimizations for url_decode in some cases. I'm not sure if the trace at the top of this description was captured before or after the recent url_decode optimizations.

@chenrui17 please verify whether the new performance of Parquet load and url_decode meets your needs. If there's additional work needed for just one, I propose tracking that with updated trace/metrics in a new feature request.

@chenrui17
Contributor Author

I will propose a new feature request to improve url_decode further.
