[REVIEW] Add python/cython bindings for str.join
API
#8085
Codecov Report
@@            Coverage Diff             @@
##           branch-0.20   #8085   +/-  ##
==========================================
  Coverage      82.88%   82.89%
==========================================
  Files            103      103
  Lines          17668    17849   +181
==========================================
+ Hits           14645    14796   +151
- Misses          3023     3053    +30
# If self._column is not a ListColumn, we will have to
# split each row by character and create a ListColumn out of it.
strings_column = self._split_by_character()
This is really expensive in both computation and memory. We may want to raise an issue for a future optimization so that we don't have to materialize the offsets here.
Opened a FEA: #8094 and added a todo here.
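To make the cost discussed above concrete, here is a pure-Python sketch of what splitting each row by character amounts to. The function name `split_by_character` and the `(offsets, children)` layout are illustrative assumptions modeled on an Arrow-style list column; this is not the cuDF implementation.

```python
def split_by_character(rows):
    """Model a list column as (offsets, children), with one child per character.

    Hypothetical helper for illustration only.
    """
    offsets = [0]
    children = []
    for row in rows:
        # Iterating a str yields its characters, so each character
        # becomes its own single-character child string.
        children.extend(row)
        # Record where this row's list of characters ends.
        offsets.append(len(children))
    return offsets, children


offsets, children = split_by_character(["abc", "", "de"])
```

In this model the intermediate grows with the total number of characters rather than the number of rows, which is why materializing it just to join it back together is wasteful.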
@gpucibot merge
This PR adds a benchmark to the current `tokenize_benchmark.cpp` to measure the `nvtext::character_tokenize` API. PR #8085 added code for using the `nvtext::character_tokenize` function. The benchmark was also useful while investigating #8094. Also found and removed an unused variable in the code logic.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Karthikeyan (https://github.com/karthikeyann)
- Nghia Truong (https://github.com/ttnghia)

URL: #8125
Resolves #8079
This PR adds `concatenate_list_elements` in cython and plumbs it to our python API, `.str.join`.
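As a rough illustration of the semantics being exposed, the sketch below models `str.join` in pure Python: a row that is a list of strings has its elements joined by the separator, and a row that is a plain string is treated as a sequence of characters. The helper name `str_join` and the choice to pass `None` rows through unchanged are assumptions made for this example, not a description of the bindings in this PR.

```python
def str_join(rows, sep):
    """Join each row's elements with `sep` (hypothetical reference model)."""
    out = []
    for row in rows:
        if row is None:
            # Assumption: a null row stays null.
            out.append(None)
        else:
            # Works for both a list of strings and a single string,
            # since iterating a str yields its characters.
            out.append(sep.join(row))
    return out


result = str_join(["abc", ["x", "yz"], None], "-")
```

A model like this can serve as an oracle when writing tests: the expected column is computed row by row on the host and compared against the GPU result.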