[FEA] Improve performance of copy_if_else for long strings #15014

Closed
revans2 opened this issue Feb 9, 2024 · 0 comments · Fixed by #15017
Labels: feature request, Performance, Spark

revans2 commented Feb 9, 2024

Is your feature request related to a problem? Please describe.
We have a number of use cases where we do a copy_if_else on long strings. This can take a long time. In from_json, for example, when the input strings are large it ends up being the biggest kernel by total time: larger than unsnap, which decompresses the original input string data, and larger than the tokenization kernels that tokenize the JSON data.

[Screenshot: copyu_if_else_string]

That first line is the string copy_if_else kernel that took 36.4% of the total kernel time.

When I look at the strings copy_if_else code I see a single thread per string and it ends up doing a memcpy.

memcpy(d_chars + d_offsets[idx], d_str.data(), d_str.size_bytes());

I am not a CUDA expert, so I could be wrong about all of this, but I think we should be able to detect when the average string size is larger than some threshold and then process one string per warp (or something similar) so that memory reads and writes coalesce better.
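To make the suggested access pattern concrete, here is a minimal host-side C++ model of a "warp per string" copy. It is only a sketch under stated assumptions: it is not a CUDA kernel and not cudf code, and the names `kWarpSize`, `warp_copy_model`, and `copy_all_model` are hypothetical. On a GPU the 32 lanes of the inner loop would run concurrently, so each step would touch 32 adjacent bytes instead of one thread walking the whole string alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical host-side model of a "warp per string" copy; not cudf code.
constexpr std::size_t kWarpSize = 32;

// Copies one string the way a cooperating warp would: in each step,
// lane L handles byte (base + L), so the lanes touch adjacent bytes.
inline void warp_copy_model(char* dst, std::string_view src)
{
  assert(dst != nullptr || src.empty());
  for (std::size_t base = 0; base < src.size(); base += kWarpSize) {
    // On a GPU these lanes would execute in parallel.
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
      std::size_t const i = base + lane;
      if (i < src.size()) { dst[i] = src[i]; }
    }
  }
}

// Writes every string into a flat chars buffer at its offset, mirroring
// memcpy(d_chars + d_offsets[idx], ...) from the issue text.
inline std::string copy_all_model(std::vector<std::string_view> const& strings)
{
  // Offsets are a running sum of the string sizes.
  std::vector<std::size_t> offsets(strings.size() + 1, 0);
  for (std::size_t idx = 0; idx < strings.size(); ++idx) {
    offsets[idx + 1] = offsets[idx] + strings[idx].size();
  }
  std::string chars(offsets.back(), '\0');
  // One modeled "warp" per string.
  for (std::size_t idx = 0; idx < strings.size(); ++idx) {
    warp_copy_model(chars.data() + offsets[idx], strings[idx]);
  }
  return chars;
}
```

The output is byte-for-byte the same as the per-thread memcpy; only the order in which bytes are visited changes, which is what would matter for coalescing on real hardware.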

revans2 added the feature request, Performance, and Spark labels Feb 9, 2024
davidwendt self-assigned this Feb 9, 2024
rapids-bot bot pushed a commit that referenced this issue Feb 14, 2024
Reworks `cudf::strings::detail::copy_if_else()` to improve performance for long strings. The rework builds a vector of rows to pass to the `make_strings_column` factory, which uses the optimized `gather_chars` function.
Also includes a benchmark for copy_if_else specifically for strings columns.
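
The two-step shape described in this commit message (first select a row per element, then gather the characters) can be sketched on the host in plain C++. This is only an illustration under assumptions: it does not call cudf's `make_strings_column` or `gather_chars`, and `select_rows_model`, `gather_rows_model`, and `gathered_model` are hypothetical names, with `std::string_view` standing in for a (pointer, size) row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Step 1 (hypothetical model): per row, record a view of whichever
// input the mask selects. No character data is copied here.
inline std::vector<std::string_view> select_rows_model(
  std::vector<std::string> const& lhs,
  std::vector<std::string> const& rhs,
  std::vector<bool> const& mask)
{
  assert(lhs.size() == rhs.size() && lhs.size() == mask.size());
  std::vector<std::string_view> rows;
  rows.reserve(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    rows.push_back(mask[i] ? std::string_view(lhs[i]) : std::string_view(rhs[i]));
  }
  return rows;
}

// Offsets plus flat character buffer, the usual strings column layout.
struct gathered_model {
  std::vector<std::size_t> offsets;
  std::string chars;
};

// Step 2 (hypothetical model): compute offsets from the row sizes, then
// gather every row's bytes into one contiguous buffer.
inline gathered_model gather_rows_model(std::vector<std::string_view> const& rows)
{
  gathered_model out;
  out.offsets.assign(rows.size() + 1, 0);
  for (std::size_t i = 0; i < rows.size(); ++i) {
    out.offsets[i + 1] = out.offsets[i] + rows[i].size();
  }
  out.chars.resize(out.offsets.back());
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (!rows[i].empty()) {
      std::memcpy(out.chars.data() + out.offsets[i], rows[i].data(), rows[i].size());
    }
  }
  return out;
}
```

Separating selection from copying means the copy step no longer needs to know about the mask, so a single gather routine tuned for long strings can be reused. The row views point into `lhs` and `rhs`, so those inputs must outlive the gather.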

Closes #15014

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #15017