[FEA] Support reverse #6885
Comments
I don't see any existing cudf functionality that will do what we want. @viadea do you want reverse for string, array, or both? We will need separate kernels or APIs for each of these, because simply reversing the bytes of a UTF-8 string will not reverse the string: its characters can be multi-byte. |
@revans2 String is good for now. i can show you a real example offline |
In cudf, a string is just a special case of an array: an array of chars. So we can support both at the same time. |
Some example:
|
I was worried if |
The problem is that strings are stored in UTF-8, which supports multi-byte characters. For ASCII we can probably do it very simply, but for anything that is not ASCII, a byte-wise reverse will potentially corrupt the string. |
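To make the multi-byte concern concrete, here is a minimal CPU-side sketch in Python. It is not the cudf implementation; the helper names `utf8_char_len` and `reverse_utf8` are hypothetical, and the input is assumed to be valid UTF-8. It walks the bytes one character at a time and writes each character, intact, into the mirrored position of the output.

```python
def utf8_char_len(lead: int) -> int:
    """Return the byte length of the UTF-8 character that starts with `lead`."""
    if lead < 0x80:            # 0xxxxxxx: single-byte (ASCII)
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: 2-byte character
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: 3-byte character
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: 4-byte character
        return 4
    raise ValueError("not a UTF-8 lead byte")


def reverse_utf8(data: bytes) -> bytes:
    """Reverse the characters of a valid UTF-8 byte string (hypothetical helper)."""
    out = bytearray(len(data))
    write = len(data)          # output is filled from the back
    read = 0
    while read < len(data):
        n = utf8_char_len(data[read])
        write -= n
        # Copy the whole character so its bytes keep their original order.
        out[write:write + n] = data[read:read + n]
        read += n
    return bytes(out)


text = "héllo"
reversed_text = reverse_utf8(text.encode("utf-8")).decode("utf-8")

# A plain byte-wise reverse breaks the 2-byte "é" and no longer decodes.
try:
    text.encode("utf-8")[::-1].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False
```

For pure ASCII input the two approaches agree, which is why the ASCII case is the simple one.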
I see, so the same implementation can't be used for both, but the implementations should be very similar and straightforward. |
For a list reverse, a gather map would likely work well. It would be even better if you could avoid materializing the gather map, because for some list types the gather map is going to be larger than the list column itself. But that is probably a minor optimization. For strings, https://github.com/rapidsai/cudf/blob/branch-23.02/cpp/include/cudf/strings/detail/utf8.hpp provides APIs to tell whether the current byte is part of a multi-byte character and what the length of that character is. If you can find a good way to do this type of thing in parallel, then great. But from what I have seen, most of the string APIs work by having a single thread per string. Even then we might be able to play some games to coalesce reads and writes to speed things up. |
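The gather-map idea for lists can be sketched as follows. This is an illustrative Python model, not cudf code: it assumes a lists column is represented by an `offsets` array plus a flat `child` array, ignores the validity mask for null rows, and the function names are hypothetical.

```python
def build_reverse_gather_map(offsets):
    """For each list [start, end), emit its child indices in reverse order."""
    gather_map = []
    for start, end in zip(offsets, offsets[1:]):
        gather_map.extend(range(end - 1, start - 1, -1))
    return gather_map


def reverse_lists(offsets, child):
    """Reverse every list; list sizes do not change, so offsets are reused."""
    gather_map = build_reverse_gather_map(offsets)
    return offsets, [child[i] for i in gather_map]


def reversed_index(j, start, end):
    """Source index for output position j, computed without a stored map."""
    return start + end - 1 - j


# Three lists: [1, 2, 3], [], [4, 5, None]
offsets = [0, 3, 3, 6]
child = [1, 2, 3, 4, 5, None]
new_offsets, new_child = reverse_lists(offsets, child)
```

`reversed_index` shows why the map does not strictly need to be materialized: each output position can derive its source index from the bounds of its own list, at the cost of looking up which list it belongs to.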
We also have a duplicate FEA issue: #4375. |
Adds `cudf::strings::reverse` function. This is to support NVIDIA/spark-rapids#6885

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Nghia Truong (https://github.com/ttnghia)
- Christopher Harris (https://github.com/cwharris)
- Jake Awe (https://github.com/AyodeAwe)
- Bradley Dice (https://github.com/bdice)

URL: #12227
Also depends on rapidsai/cudf#12283 |
This implements `lists::reverse`, which outputs a lists column in which each list is generated by reversing the order of the elements in the corresponding input list.

Example:
```
s = [ [1, 2, 3], [], null, [4, 5, null] ]
r = reverse(s)

r is now [ [3, 2, 1], [], null, [null, 5, 4] ]
```

This is to support NVIDIA/spark-rapids#4375 and NVIDIA/spark-rapids#6885.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Jordan Jacobelli (https://github.com/Ethyling)
- Vyas Ramasubramani (https://github.com/vyasr)
- Mike Wilson (https://github.com/hyperbolic2346)
- Bradley Dice (https://github.com/bdice)
- Yunsong Wang (https://github.com/PointKernel)
- Jason Lowe (https://github.com/jlowe)

URL: #12336
The libcudf PR for array input type was merged (rapidsai/cudf#12336). We can continue to complete our plugin work. |
I wish we could support the reverse function.