Implement strings::repeat_strings
#8423
I think the name |
Yes, it was very difficult to choose a name for this API. You are welcome to suggest a better one. Basically, this API should be named |
Strings are the only supported type, right? |
Yes, this API only supports strings. Should be |
This is looking good.
@gpucibot merge |
+1 to CMake changes
…rent number of times (#8561)

This work was requested by the Spark team and is a follow-up to #8423, so that cudf's `strings::repeat_strings` fully supports the `StringRepeat` SQL expression in Apache Spark. Note that this API has to implement an explicit overflow check for the size of the output strings column, because the check is not trivial and can't be performed outside of cudf.

This PR also rewrites some existing code, including renaming variables and updating the doxygen.

### Follow-up work depending on this PR:
* Benchmark: #8589
* Java binding: #8572

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Jason Lowe (https://github.com/jlowe)
- Conor Hoekstra (https://github.com/codereport)
- David Wendt (https://github.com/davidwendt)

URL: #8561
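The overflow check mentioned in that commit message can be pictured with a small host-side sketch. This is not the code from #8561: the function name `output_size_fits` is hypothetical, and the sketch assumes the output size limit is the maximum of a signed 32-bit integer. It only illustrates why the check needs the per-row string sizes and repeat counts together, summed in a wider integer type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a cudf API): returns true when repeating strs[i]
// repeat_times[i] times, for every row, produces a total number of bytes that
// still fits in a signed 32-bit size. The 32-bit limit is an assumption here.
bool output_size_fits(std::vector<std::string> const& strs,
                      std::vector<std::int32_t> const& repeat_times)
{
  if (strs.size() != repeat_times.size()) {
    throw std::invalid_argument("strs and repeat_times must have the same length");
  }

  constexpr std::int64_t limit = std::numeric_limits<std::int32_t>::max();
  std::int64_t total = 0;  // accumulate in 64 bits so the sum itself cannot wrap

  for (std::size_t i = 0; i < strs.size(); ++i) {
    // Assumption: a non-positive count contributes an empty string.
    if (repeat_times[i] <= 0) { continue; }

    // A single input string larger than the limit can never fit.
    if (strs[i].size() > static_cast<std::size_t>(limit)) { return false; }

    // Both factors are below 2^31, so the product is below 2^62 and fits in int64.
    std::int64_t const bytes =
      static_cast<std::int64_t>(strs[i].size()) * static_cast<std::int64_t>(repeat_times[i]);

    if (bytes > limit - total) { return false; }  // adding this row would exceed the limit
    total += bytes;
  }
  return true;
}
```

A caller would run a check of this kind before allocating the output and report an error instead of producing a truncated or corrupted column.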
This PR implements `strings::repeat_strings`, which repeats the given string(s) multiple times. In contrast with the existing API `cudf::repeat`, which repeats the rows (copies one row into multiple rows), this new API repeats the string within each row of the given strings column (copies the content of each string multiple times into the output string). For example:

This implements the cudf-side API for NVIDIA/spark-rapids#68.
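The difference between repeating rows and repeating the string inside each row can be sketched on the host with plain C++ containers. This is an illustration only, not the cudf implementation: `strings_column`, `repeat_each_string`, and `repeat_rows` are hypothetical names, and the handling of null rows and non-positive counts is an assumption rather than something stated in the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a nullable strings column.
using strings_column = std::vector<std::optional<std::string>>;

// Repeats the content of each row `repeat_times` times (the behavior described
// for strings::repeat_strings). Assumptions: null rows stay null, and a
// non-positive count yields an empty string.
strings_column repeat_each_string(strings_column const& input, int repeat_times)
{
  strings_column output;
  output.reserve(input.size());
  for (auto const& row : input) {
    if (!row.has_value()) {
      output.push_back(std::nullopt);
      continue;
    }
    std::string repeated;
    if (repeat_times > 0) {
      repeated.reserve(row->size() * static_cast<std::size_t>(repeat_times));
      for (int i = 0; i < repeat_times; ++i) { repeated += *row; }
    }
    output.push_back(std::move(repeated));
  }
  return output;
}

// Copies each row `count` times into consecutive output rows, for contrast
// (the row-wise behavior described for cudf::repeat).
strings_column repeat_rows(strings_column const& input, int count)
{
  strings_column output;
  for (auto const& row : input) {
    for (int i = 0; i < count; ++i) { output.push_back(row); }
  }
  return output;
}
```

With an input of three rows, `repeat_each_string` returns three rows with longer strings, while `repeat_rows` returns more rows with unchanged strings.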