Add delimiter parameter to cudf::strings::capitalize() #8620
Can you verify this API is sufficient for #8597 ?
Yes, this should be sufficient to match the behavior of Spark's initcap function. At first I was worried that Spark's toTitleCase() method, which is used by initcap, only changes characters to uppercase and never lowercases any characters in the string. However, the initcap implementation lowercases the string before calling toTitleCase(), so we should be good as far as matching the initcap behavior.
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8620 +/- ##
===============================================
Coverage ? 10.09%
===============================================
Files ? 109
Lines ? 19610
Branches ? 0
===============================================
Hits ? 1980
Misses ? 17630
Partials ? 0
rerun tests
rerun tests
@gpucibot merge
Closes #8597
Add a parameter to the cudf::strings::capitalize() function to support capitalizing characters after a specified delimiter. This should meet the requirements of #8597 by passing a single ' ' character string as the delimiter.
The new parameter defaults to an empty string, which keeps the current behavior, so no updates are required to the Python/Cython layer. Adding the parameter makes this PR a breaking change.
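To illustrate the delimiter semantics described above, here is a minimal CPU-only sketch. It is not the libcudf implementation: `capitalize_sketch` is a hypothetical helper that works on a single ASCII `std::string` rather than a strings column. It assumes the first character and any character following a delimiter are uppercased, and all other characters are lowercased.

```cpp
#include <cctype>
#include <string>

// Hypothetical host-side illustration only; not part of libcudf.
// Uppercases the first character and every character that follows one of
// the characters in `delimiters`; lowercases everything else (ASCII only).
std::string capitalize_sketch(std::string const& input, std::string const& delimiters = "")
{
  std::string output;
  output.reserve(input.size());
  bool capitalize_next = true;  // the first character is always capitalized
  for (char ch : input) {
    auto const uch = static_cast<unsigned char>(ch);
    output.push_back(static_cast<char>(capitalize_next ? std::toupper(uch) : std::tolower(uch)));
    // With an empty delimiter string, find() never matches, so only the
    // first character is ever uppercased (the pre-existing behavior).
    capitalize_next = delimiters.find(ch) != std::string::npos;
  }
  return output;
}
```

Under these assumptions, `capitalize_sketch("a Test")` yields `"A test"`, while `capitalize_sketch("a Test", " ")` yields `"A Test"`, which is the initcap-style result requested in #8597.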
The source code for title() and capitalize() was further refactored using the CRTP pattern since much of the main internal code logic is the same.
This PR also includes additional gtests for this new parameter as well as adding some missing tests for empty columns.
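As a rough picture of how CRTP lets two functions share one loop, consider the host-side sketch below. All names here (`case_converter_base`, `capitalize_sketch_fn`, `title_sketch_fn`, `capitalize_after`) are hypothetical and do not mirror the actual libcudf device functors. The title rule used (uppercase after any non-alphabetic character) is an assumption made for illustration.

```cpp
#include <cctype>
#include <string>
#include <utility>

// Hypothetical CRTP base: owns the shared per-character loop and asks the
// derived type whether the character after `ch` should be uppercased.
template <typename Derived>
struct case_converter_base {
  std::string operator()(std::string const& input) const
  {
    auto const& self = static_cast<Derived const&>(*this);
    std::string output;
    output.reserve(input.size());
    bool upper_next = true;  // first character is always uppercased
    for (char ch : input) {
      auto const uch = static_cast<unsigned char>(ch);
      output.push_back(static_cast<char>(upper_next ? std::toupper(uch) : std::tolower(uch)));
      upper_next = self.capitalize_after(uch);
    }
    return output;
  }
};

// capitalize-style rule: uppercase after any of the given delimiters.
struct capitalize_sketch_fn : case_converter_base<capitalize_sketch_fn> {
  std::string delimiters;
  explicit capitalize_sketch_fn(std::string d) : delimiters(std::move(d)) {}
  bool capitalize_after(unsigned char ch) const
  {
    return delimiters.find(static_cast<char>(ch)) != std::string::npos;
  }
};

// title-style rule (assumed): uppercase after any non-alphabetic character.
struct title_sketch_fn : case_converter_base<title_sketch_fn> {
  bool capitalize_after(unsigned char ch) const { return std::isalpha(ch) == 0; }
};
```

The base class resolves `capitalize_after` at compile time, so the two variants differ only in that one small member while the loop is written once.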