Add list output option to character_ngrams() function #9499
# convert the output to a list by just generating the
# offsets for the output list column
s1 = self.len() - (n - 1)
I believe mypy complains here because the return type of self.len() is a Union[Series, BaseIndex], and BaseIndex does not support a __sub__() method.
@vyasr do you have any thoughts on how to resolve? (other than a type: ignore of course :))
Is the typing really accurate? Shouldn't it return Series for Series of strings or Int64Index for any index type? This seems like an opportunity to tighten up our annotations.
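To make the two options in this thread concrete, here is a minimal sketch of why a type checker rejects the subtraction and how either narrowing or a tighter annotation resolves it. The `Series` and `BaseIndex` classes below are hypothetical stand-ins written for this illustration, not cudf's real classes, and `loose_len` / `tight_len` are made-up helper names.

```python
from typing import Union


class BaseIndex:
    """Hypothetical stand-in: deliberately defines no __sub__."""


class Series:
    """Hypothetical stand-in that supports subtracting an int."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, other: int) -> "Series":
        return Series(v - other for v in self.values)


def loose_len(strings) -> Union[Series, BaseIndex]:
    # Loose annotation: a checker must assume BaseIndex is possible,
    # so `loose_len(...) - 1` is flagged as an unsupported operand.
    return Series(len(s) for s in strings)


def tight_len(strings) -> Series:
    # Tightened annotation: subtraction type-checks with no extra work.
    return Series(len(s) for s in strings)


def ngram_counts_narrowed(strings, n: int) -> Series:
    lengths = loose_len(strings)
    # isinstance narrows the Union to Series for the checker.
    if not isinstance(lengths, Series):
        raise TypeError("expected a Series of lengths")
    return lengths - (n - 1)


def ngram_counts_tight(strings, n: int) -> Series:
    return tight_len(strings) - (n - 1)
```

Narrowing keeps the existing annotation but adds a runtime check at each call site; tightening the return type, as suggested above, fixes every caller at once if the annotation can be made accurate.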
Codecov Report
@@ Coverage Diff @@
## branch-21.12 #9499 +/- ##
================================================
- Coverage 10.79% 10.66% -0.13%
================================================
Files 116 117 +1
Lines 18869 19723 +854
================================================
+ Hits 2036 2104 +68
- Misses 16833 17619 +786
@davidwendt Your examples are always amazing
@gpucibot merge
Closes #8190
Depends on #9498
Adds an as_list output option to the cudf character_ngrams() API. The generated ngrams can be grouped within list offsets.
Example 1
Strings too small for the ngram size result in empty list rows.
And as always, null rows result in null rows.
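The row-level behavior described above can be sketched in plain Python, without cudf. This is an illustration only: `character_ngrams_as_list` and `list_offsets` are hypothetical helper names, and counting a null row as zero ngrams when building offsets is an assumption, not something the PR text states.

```python
from itertools import accumulate


def character_ngrams_as_list(strings, n=2):
    """Return one list of character ngrams per input row."""
    rows = []
    for s in strings:
        if s is None:
            # Null rows stay null.
            rows.append(None)
        else:
            # Strings shorter than n give an empty range, so an empty list.
            rows.append([s[i:i + n] for i in range(len(s) - n + 1)])
    return rows


def list_offsets(strings, n=2):
    """Offsets that group a flat ngram output into per-row lists."""
    # Each row holds len(s) - (n - 1) ngrams, clamped at zero.
    # Assumption: a null row contributes zero ngrams.
    counts = [0 if s is None else max(len(s) - (n - 1), 0) for s in strings]
    return [0] + list(accumulate(counts))


print(character_ngrams_as_list(["abcd", "a", None], 2))
# [['ab', 'bc', 'cd'], [], None]
print(list_offsets(["abcd", "a", None], 2))
# [0, 3, 3, 3]
```

The offsets are a running sum of the per-row ngram counts, which is why the list form can be produced from the existing flat output by generating offsets alone.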
Example 2