Commit
Rework logic in cudf::strings::split_record to improve performance (#12729)

Updates the `cudf::strings::split_record` logic to match the more optimized code in `cudf::strings::split`. The optimized code performs much better for longer strings (>64 bytes) by parallelizing over the character bytes to find delimiters before determining split tokens. This led to refactoring the code so that both APIs can share the optimized code.

Also fixes a bug found when using overlapped delimiters. Additional tests were added for multi-byte delimiters, which can overlap and span multiple adjacent strings.

Closes #12694

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Yunsong Wang (https://github.com/PointKernel)
- https://github.com/nvdbaranec

URL: #12729
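
For illustration only, the two-pass idea described in the message (find delimiters per byte first, then determine tokens) can be sketched on the CPU as below. This is not the libcudf implementation: the helper names `find_delimiter_candidates` and `split_tokens` are hypothetical, the loops are serial stand-ins for what would be per-byte GPU threads, and the rule used for overlapped delimiters (the leftmost match wins and later overlapping matches are ignored) is an assumption rather than something the commit states.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Pass 1 (hypothetical helper): test every byte offset independently for a
// delimiter match. Each iteration depends only on its own offset, so this
// loop is the part that could run as one thread per byte.
std::vector<std::size_t> find_delimiter_candidates(std::string_view input,
                                                   std::string_view delim)
{
  assert(!delim.empty());
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i + delim.size() <= input.size(); ++i) {
    if (input.compare(i, delim.size(), delim) == 0) { positions.push_back(i); }
  }
  return positions;
}

// Pass 2 (hypothetical helper): walk the candidates in order, drop any that
// overlap a delimiter already accepted (assumed rule: leftmost wins), and
// emit the tokens between the accepted delimiters.
std::vector<std::string> split_tokens(std::string_view input, std::string_view delim)
{
  auto const candidates = find_delimiter_candidates(input, delim);
  std::vector<std::string> tokens;
  std::size_t token_begin = 0;
  for (auto const pos : candidates) {
    if (pos < token_begin) { continue; }  // overlaps the previous delimiter
    tokens.emplace_back(input.substr(token_begin, pos - token_begin));
    token_begin = pos + delim.size();
  }
  tokens.emplace_back(input.substr(token_begin));  // trailing token
  return tokens;
}
```

Under these assumptions, splitting `"aaaa"` on `"aa"` finds candidates at offsets 0, 1, and 2, but only offsets 0 and 2 are accepted, which is the kind of overlapped-delimiter case the added tests are described as covering.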