Optimize URL Decoding #8622

Merged: 24 commits merged into rapidsai:branch-21.10 on Aug 30, 2021

@gaohao95 (Contributor) commented Jun 29, 2021

This PR optimizes URL decoding performance, especially on large URLs. It also adds a test case for large URLs.
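For readers unfamiliar with the operation being optimized, here is a minimal sequential C++ sketch of what URL (percent) decoding computes for one string. It is not the GPU kernel from this PR, and its semantics are assumptions: only `%XY` sequences with two hex digits are decoded, while `+` and malformed `%` sequences are copied through unchanged. The names `hex_value` and `url_decode_reference` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns the value of an ASCII hex digit, or -1 if c is not a hex digit.
inline int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical sequential reference (assumed semantics, not libcudf's code):
// each "%XY" with two hex digits becomes the single byte 0xXY; every other
// byte, including '+' and incomplete or invalid '%' sequences, is copied as-is.
inline std::string url_decode_reference(const std::string& in) {
  std::string out;
  out.reserve(in.size());  // decoded output is never longer than the input
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '%' && i + 2 < in.size()) {
      const int hi = hex_value(in[i + 1]);
      const int lo = hex_value(in[i + 2]);
      if (hi >= 0 && lo >= 0) {
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;  // skip the two hex digits just consumed
        continue;
      }
    }
    out.push_back(in[i]);
  }
  return out;
}
```

For example, `url_decode_reference("a%20b")` returns `"a b"`. Since each decoded escape turns three input bytes into one output byte, the output offset of every character depends on how many escapes precede it in the string.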

Benchmarked on a V100. Baseline performance at 7521c3f:

------------------------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
UrlDecode<10>/url_decode_10pct/100000000/10/manual_time        111 ms          111 ms            6 bytes_per_second=11.7959G/s
UrlDecode<10>/url_decode_10pct/10000000/100/manual_time        107 ms          107 ms            7 bytes_per_second=9.0136G/s
UrlDecode<10>/url_decode_10pct/1000000/1000/manual_time        107 ms          107 ms            7 bytes_per_second=8.76755G/s
UrlDecode<50>/url_decode_50pct/100000000/10/manual_time        129 ms          129 ms            5 bytes_per_second=10.144G/s
UrlDecode<50>/url_decode_50pct/10000000/100/manual_time        126 ms          126 ms            6 bytes_per_second=7.70821G/s
UrlDecode<50>/url_decode_50pct/1000000/1000/manual_time        122 ms          122 ms            6 bytes_per_second=7.66783G/s

With this PR:

------------------------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
UrlDecode<10>/url_decode_10pct/100000000/10/manual_time       97.5 ms         97.6 ms            7 bytes_per_second=13.3669G/s
UrlDecode<10>/url_decode_10pct/10000000/100/manual_time       28.8 ms         28.8 ms           24 bytes_per_second=33.6024G/s
UrlDecode<10>/url_decode_10pct/1000000/1000/manual_time       21.8 ms         21.8 ms           32 bytes_per_second=42.9686G/s
UrlDecode<50>/url_decode_50pct/100000000/10/manual_time        109 ms          109 ms            6 bytes_per_second=11.9786G/s
UrlDecode<50>/url_decode_50pct/10000000/100/manual_time       30.2 ms         30.3 ms           23 bytes_per_second=32.0311G/s
UrlDecode<50>/url_decode_50pct/1000000/1000/manual_time       22.7 ms         22.8 ms           31 bytes_per_second=41.1086G/s
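To make the gains in the two tables easier to compare, the per-row speedup (baseline time divided by the time with this PR) can be computed directly from the reported wall-clock times. The snippet below is only that arithmetic; the array and function names are ours, not part of the benchmark code.

```cpp
#include <array>
#include <cstddef>

// Wall-clock times (ms) copied from the two tables above, in row order:
// url_decode_10pct x {100000000/10, 10000000/100, 1000000/1000},
// then url_decode_50pct x the same three parameter sets.
constexpr std::array<double, 6> kBaselineMs = {111.0, 107.0, 107.0, 129.0, 126.0, 122.0};
constexpr std::array<double, 6> kThisPrMs = {97.5, 28.8, 21.8, 109.0, 30.2, 22.7};

// Speedup for each row = baseline time / time with this PR.
inline std::array<double, 6> speedups() {
  std::array<double, 6> s{};
  for (std::size_t i = 0; i < s.size(); ++i) {
    s[i] = kBaselineMs[i] / kThisPrMs[i];
  }
  return s;
}
```

This gives roughly 1.1x to 1.2x for the `100000000/10` rows, about 3.7x to 4.2x for `10000000/100`, and about 4.9x to 5.4x for `1000000/1000`; the `bytes_per_second` ratios give nearly the same figures. The largest gains sit in the rows with the largest final benchmark parameter, which is presumably the per-row string length, although the benchmark names do not state this explicitly.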

Closes #8030.

@github-actions bot added the libcudf label Jun 29, 2021
@codecov bot commented Jun 29, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@a153493).
The diff coverage is n/a.

❗ Current head 0d64559 differs from pull request most recent head 81af59c. Consider uploading reports for the commit 81af59c to get more accurate results

@@               Coverage Diff               @@
##             branch-21.10    #8622   +/-   ##
===============================================
  Coverage                ?   10.75%           
===============================================
  Files                   ?      114           
  Lines                   ?    18695           
  Branches                ?        0           
===============================================
  Hits                    ?     2010           
  Misses                  ?    16685           
  Partials                ?        0           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a153493...81af59c.

@shwina (Contributor) commented Jul 13, 2021

Hi @gaohao95 -- do you think this PR will be completed in time for the 21.08 code freeze (July 22nd)? Wondering where this should go on the project board. Thanks!

@gaohao95 (Contributor, Author) replied

Hi @shwina, this PR is motivated by Spark's string performance, and I think the PR itself is more or less ready. However, the last time I shared the code with @chenrui17, they reported a unit test failure on their end. Since then, I have found and fixed a bug, but I am not sure whether the fix resolves their test failure. I think we should merge this only after we hear back from them.

Looping in @jlowe and @chenrui17 to see whether they have any feedback on priority.

@gaohao95 changed the base branch from branch-21.08 to branch-21.10 July 20, 2021 18:25
@gaohao95 marked this pull request as ready for review July 22, 2021 22:54
@gaohao95 requested a review from a team as a code owner July 22, 2021 22:54
@gaohao95 requested review from vyasr and ttnghia and removed request for a team July 22, 2021 22:54
@gaohao95 (Contributor, Author) commented

@chenrui17 reported that the unit test now passes on their end. I think this PR is ready for review.

@gaohao95 changed the title from "Draft: Optimize URL Decoding" to "Optimize URL Decoding" Jul 22, 2021
cpp/src/strings/convert/convert_urls.cu: 4 review threads (outdated), resolved

@vyasr (Contributor) left a comment

Nice work, the perf improvements look great. Some minor comments, a few of which are more for my edification than anything else.

cpp/src/strings/convert/convert_urls.cu: 10 review threads (9 outdated), resolved
@vyasr added the 0 - Waiting on Author and improvement labels Jul 28, 2021
@vyasr added the non-breaking and Performance labels Jul 28, 2021
cpp/src/strings/convert/convert_urls.cu: 6 review threads (outdated), resolved
@gaohao95 requested review from vyasr and davidwendt August 9, 2021 22:04
@gaohao95 (Contributor, Author) commented Aug 9, 2021

Hi @vyasr @davidwendt, I think all comments have been addressed. Could you take another look to see whether it's ready to merge?

@davidwendt (Contributor) left a comment

The kernel's handling of null entries needs to be corrected.

cpp/src/strings/convert/convert_urls.cu: 4 review threads (outdated), resolved

@davidwendt (Contributor) left a comment

Nice work.

@vyasr (Contributor) commented Aug 30, 2021

@gpucibot merge

@rapids-bot bot merged commit 4945198 into rapidsai:branch-21.10 Aug 30, 2021
Labels
0 - Waiting on Author: Waiting for author to respond to review
improvement: Improvement / enhancement to an existing function
libcudf: Affects libcudf (C++/CUDA) code
non-breaking: Non-breaking change
Performance: Performance related issue
Development
Issue closed by merging this pull request: [FEA] Improve url_decode performance further
5 participants