Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add delimiter parameter to cudf::strings::capitalize() #8620

Merged

Conversation

davidwendt
Copy link
Contributor

Closes #8597

Add a parameter to the cudf::strings::capitalize() function to support capitalizing characters after a specified delimiter. This should meet the requirements of #8597 by passing a single ' ' character string as follows:

    auto results = cudf::strings::capitalize(strings_view, std::string(" "));

The new parameter has a default of empty string that keeps the current behavior so no updates are required to python/cython layer. The new parameter requires this PR to be a breaking change.

The source code for title() and capitalize() were further refactored using the CRTP pattern since much of the main internal code logic is the same.

This PR also includes additional gtests for this new parameter as well as adding some missing tests for empty columns.

@davidwendt davidwendt added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) breaking Breaking change labels Jun 28, 2021
@davidwendt davidwendt self-assigned this Jun 28, 2021
@davidwendt davidwendt requested a review from a team as a code owner June 28, 2021 17:24
@davidwendt davidwendt requested review from ttnghia and jrhemstad June 28, 2021 17:24
@davidwendt
Copy link
Contributor Author

@jlowe Can you verify this API is sufficient for #8597 ?

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify this API is sufficient for #8597 ?

Yes, this should be sufficient to match the behavior of Spark's initcap function. At first I was worried that Spark's toTitleCase() method, which is used by initcap, changes characters to uppercase and never lowercase any characters in the string. However the initcap implementation lowercases the string before calling toTitleCase(), so we should be good as far as matching the initcap behavior.

cpp/include/cudf/strings/capitalize.hpp Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jun 28, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@dab8a62). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.08    #8620   +/-   ##
===============================================
  Coverage                ?   10.09%           
===============================================
  Files                   ?      109           
  Lines                   ?    19610           
  Branches                ?        0           
===============================================
  Hits                    ?     1980           
  Misses                  ?    17630           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dab8a62...4c973d6. Read the comment docs.

cpp/include/cudf/strings/capitalize.hpp Show resolved Hide resolved
cpp/src/strings/capitalize.cu Outdated Show resolved Hide resolved
@davidwendt
Copy link
Contributor Author

rerun tests

@davidwendt
Copy link
Contributor Author

rerun tests

@davidwendt
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit a0b0eab into rapidsai:branch-21.08 Jul 1, 2021
@davidwendt davidwendt deleted the capitalize-with-delimiter branch July 1, 2021 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team breaking Breaking change feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Ability to capitalize first letter of words separated by a space in a string
3 participants