Add regex flags parameter to python cudf strings split #10185
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #10185 +/- ##
================================================
+ Coverage 10.42% 10.63% +0.20%
================================================
Files 119 122 +3
Lines 20603 20954 +351
================================================
+ Hits 2148 2228 +80
- Misses 18455 18726 +271
Continue to review full report at Codecov.
@shwina I added support for regex in
@shwina Just wanted to check with you on this before merging.
@gpucibot merge
Closes #3584
This depends on libcudf changes in PR #10128
This adds the `regex` parameter to the cudf strings `split()` function, similar to the Pandas 1.4.0 one documented here.

The main difference is that the `pat` parameter will only be interpreted as regex if the `pat` string has more than 1 character and the `regex` parameter is set to `True`. This is to help with consistency and migration from the previous implementation.

The 1.3.x Pandas version does not have a `regex` parameter for `split()` but instead tries to infer the intention of the `pat` parameter without it. This seems a bit dangerous since regex would be much slower for us here. Therefore, the `regex` parameter must be set to `True` in the cudf implementation in order to use the regex logic path.

Pandas does not support regex for its `rsplit`, even though it has been documented, and there is an issue here.
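
To make the dispatch rule above concrete, here is a minimal pure-Python sketch of the behavior as described in this PR. It is not the cudf implementation: `split_value` is a hypothetical scalar helper that models a single string with the standard `re` module, and it ignores the `n` and `expand` arguments.

```python
import re


def split_value(s, pat=None, regex=False):
    """Hypothetical scalar model of the rule described in this PR.

    `pat` is treated as a regular expression only when `regex` is True
    AND `pat` is longer than one character; otherwise it is a literal.
    """
    if pat is None:
        # No pattern given: split on runs of whitespace.
        return s.split()
    if regex and len(pat) > 1:
        # Regex path (the slower one, hence the explicit opt-in).
        return re.split(pat, s)
    # Literal path: also taken for single-character patterns,
    # even when regex=True.
    return s.split(pat)


# A single "." stays a literal delimiter even with regex=True.
print(split_value("a.b.c", ".", regex=True))
# A multi-character pattern is a regex only when regex=True.
print(split_value("a1b22c", r"\d+", regex=True))
print(split_value("a1b22c", r"\d+"))
```

Under this rule a one-character pattern such as `"."` behaves the same before and after the change, which is the migration concern mentioned above, while the regex path is only taken when the caller opts in explicitly.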