[FEA] Improve url_decode performance further #8030
Comments
@chenrui17 it would be good to get a better sense of the input that is triggering the poor behavior.
Hopefully the URL decode benchmarks can be updated accordingly to reproduce the behavior shown in the trace, and then the code can be optimized against that benchmark.
The average length of each string is 800 (max length is 4686), and about 200 escape sequences occur per string on average (max is about 1500). In addition, I inserted some timing code in function
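To make the input shape described above easier to reason about (long strings with a high density of `%XX` escape sequences), here is a minimal CPU-side sketch of percent-decoding plus an escape counter. It is not the library's GPU implementation. The names `hex_value`, `url_decode_ref`, and `count_escapes` are hypothetical, and the sketch assumes that only `%` followed by two hex digits is decoded, with everything else (including `+` and malformed sequences) copied through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if `c` is not one.
inline int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical CPU reference decoder (an assumption for illustration only).
// Replaces each valid "%XX" with the byte it encodes; copies all other bytes.
inline std::string url_decode_ref(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '%' && i + 2 < in.size()) {
      const int hi = hex_value(in[i + 1]);
      const int lo = hex_value(in[i + 2]);
      if (hi >= 0 && lo >= 0) {
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;  // skip the two hex digits just consumed
        continue;
      }
    }
    out.push_back(in[i]);
  }
  // Decoding never grows the string: each escape shrinks 3 bytes to 1.
  assert(out.size() <= in.size());
  return out;
}

// Counts valid "%XX" escape sequences, e.g. to characterize a data set.
inline std::size_t count_escapes(const std::string& in) {
  std::size_t n = 0;
  for (std::size_t i = 0; i + 2 < in.size(); ++i) {
    if (in[i] == '%' && hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
      ++n;
      i += 2;
    }
  }
  return n;
}
```

Running `count_escapes` over a sample of the real input would give the escape density needed to configure a benchmark that resembles it. With about 200 three-byte escapes in an 800-byte string, roughly three quarters of the input bytes belong to escape sequences.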
@jlowe Attached is the url_decode benchmark result. I guess the problem is that the row count of my input Parquet file is too big, about 1,800,000, and there are about 20,000 files. Running ./cpp/build/gbenchmarks/STRINGS_BENCH
This issue has been labeled
Still relevant
This PR is intended to optimize the URL decoding performance, especially on large URLs. Additionally, a test case for large URLs has been added.

When tested on V100, baseline performance at 7521c3f

```
------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time        CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
UrlDecode<10>/url_decode_10pct/100000000/10/manual_time      111 ms     111 ms            6 bytes_per_second=11.7959G/s
UrlDecode<10>/url_decode_10pct/10000000/100/manual_time      107 ms     107 ms            7 bytes_per_second=9.0136G/s
UrlDecode<10>/url_decode_10pct/1000000/1000/manual_time      107 ms     107 ms            7 bytes_per_second=8.76755G/s
UrlDecode<50>/url_decode_50pct/100000000/10/manual_time      129 ms     129 ms            5 bytes_per_second=10.144G/s
UrlDecode<50>/url_decode_50pct/10000000/100/manual_time      126 ms     126 ms            6 bytes_per_second=7.70821G/s
UrlDecode<50>/url_decode_50pct/1000000/1000/manual_time      122 ms     122 ms            6 bytes_per_second=7.66783G/s
```

This PR

```
------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time        CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------
UrlDecode<10>/url_decode_10pct/100000000/10/manual_time     97.5 ms    97.6 ms            7 bytes_per_second=13.3669G/s
UrlDecode<10>/url_decode_10pct/10000000/100/manual_time     28.8 ms    28.8 ms           24 bytes_per_second=33.6024G/s
UrlDecode<10>/url_decode_10pct/1000000/1000/manual_time     21.8 ms    21.8 ms           32 bytes_per_second=42.9686G/s
UrlDecode<50>/url_decode_50pct/100000000/10/manual_time      109 ms     109 ms            6 bytes_per_second=11.9786G/s
UrlDecode<50>/url_decode_50pct/10000000/100/manual_time     30.2 ms    30.3 ms           23 bytes_per_second=32.0311G/s
UrlDecode<50>/url_decode_50pct/1000000/1000/manual_time     22.7 ms    22.8 ms           31 bytes_per_second=41.1086G/s
```

close #8030

Authors:
- https://github.com/gaohao95

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- David Wendt (https://github.com/davidwendt)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #8622
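For readers who want the speedups implied by the two result sets above, the arithmetic can be sketched as below. The `Time` values are copied from the benchmark output; the struct `BenchRow`, the array `kRows`, and the functions `speedup` and `print_speedups` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdio>

// One benchmark case with its wall time before and after the PR (milliseconds).
struct BenchRow {
  const char* name;
  double baseline_ms;  // "Time" column at 7521c3f
  double pr_ms;        // "Time" column with the PR applied
};

// Values transcribed from the benchmark output quoted in this thread.
static const BenchRow kRows[] = {
    {"url_decode_10pct/100000000/10", 111.0, 97.5},
    {"url_decode_10pct/10000000/100", 107.0, 28.8},
    {"url_decode_10pct/1000000/1000", 107.0, 21.8},
    {"url_decode_50pct/100000000/10", 129.0, 109.0},
    {"url_decode_50pct/10000000/100", 126.0, 30.2},
    {"url_decode_50pct/1000000/1000", 122.0, 22.7},
};

// Speedup factor: how many times faster the optimized run is.
inline double speedup(double baseline_ms, double optimized_ms) {
  assert(baseline_ms > 0.0 && optimized_ms > 0.0);
  return baseline_ms / optimized_ms;
}

// Prints one line per benchmark case with its speedup factor.
inline void print_speedups() {
  for (const BenchRow& r : kRows) {
    std::printf("%-32s %.2fx\n", r.name, speedup(r.baseline_ms, r.pr_ms));
  }
}
```

Assuming the two numeric benchmark arguments are the row count and the characters per row (which the output itself does not state), the gain grows with string length: roughly 1.1x to 1.2x at 10, roughly 3.7x to 4.2x at 100, and roughly 4.9x to 5.4x at 1000. That matches the PR's stated focus on large URLs.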
Is your feature request related to a problem? Please describe.
This is a follow-up to #7571. url_decode performance is still not ideal: GPU performance versus CPU performance is basically flat while GPU utilization is very high, so there is still a lot of room for url_decode optimization.
Describe the solution you'd like
Here is the query trace that contains url_decode.
Describe alternatives you've considered
None
Additional context
None