Add separator-on-null parameter to strings concatenate APIs #8282
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #8282 +/- ##
===============================================
Coverage ? 82.88%
===============================================
Files ? 105
Lines ? 17874
Branches ? 0
===============================================
Hits ? 14814
Misses ? 3060
Partials ? 0
CMake changes look good to me
cmake / python lgtm
tested this latest patch out (along with pr 8285) and all my Spark tests pass so we have the desired null handling behavior.
@gpucibot merge
to stringConcatenate when using a scalar separator.

Reference: #8282 changed the behavior to throw an exception if only a single column is passed in to `stringConcatenate` with a scalar separator. This updates our Java test for that functionality.

Signed-off-by: Thomas Graves <[email protected]>

Authors:
- Thomas Graves (https://github.com/tgravescs)

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
- Jason Lowe (https://github.com/jlowe)

URL: #8330
This PR implements a new option for `strings::join_list_elements` on top of #8282. In particular, the new option is:

```
/**
 * @brief Setting for specifying what will be output from `join_list_elements` when an input list
 * is empty.
 */
enum class output_if_empty_list {
  EMPTY_STRING,  ///< Empty list will result in empty string
  NULL_ELEMENT   ///< Empty list will result in a null
};
```

This new option is necessary for implementing `concat_ws` in Spark, since the behavior of the output string is required to be different depending on the situation.

Currently blocked from merging by #8282.

Authors:
- Nghia Truong (https://github.com/ttnghia)
- David Wendt (https://github.com/davidwendt)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Keith Kraus (https://github.com/kkraus14)
- Mike Wilson (https://github.com/hyperbolic2346)
- David Wendt (https://github.com/davidwendt)
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Ashwin Srinath (https://github.com/shwina)

URL: #8285
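To make the effect of the `output_if_empty_list` option concrete, here is a minimal host-side sketch in plain C++17. It is not the libcudf implementation: `join_list_row` is a hypothetical helper that models a single list row as a `std::vector<std::string>` and a null result as `std::nullopt`. Null elements inside a list are deliberately left out of the model, since the text above does not describe how they interact with this option.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Mirrors the enum quoted above; redeclared so the sketch is standalone.
enum class output_if_empty_list { EMPTY_STRING, NULL_ELEMENT };

// Hypothetical model of joining ONE list row (not a libcudf function).
// A std::nullopt return value stands for a null output row.
std::optional<std::string> join_list_row(const std::vector<std::string>& elements,
                                         const std::string& separator,
                                         output_if_empty_list empty_policy)
{
  if (elements.empty()) {
    // The option only matters here: an empty list yields "" or null.
    if (empty_policy == output_if_empty_list::NULL_ELEMENT) { return std::nullopt; }
    return std::string{};
  }

  std::string out;
  bool first = true;
  for (const auto& element : elements) {
    if (!first) { out += separator; }  // separator goes between elements only
    out += element;
    first = false;
  }
  return out;
}
```

In this model a non-empty list joins the same way under either setting; only the empty-list case differs, which is the distinction the commit message says Spark's `concat_ws` needs.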
Add Java APIs to be able to call string concatenate with separators.

DO NOT MERGE until #8285 and #8282 are merged. We need those changes for the API used in this PR as well as functionality to match what Spark needs.

New arguments were added to the existing concatenate API that takes a scalar for the separator, so I extended that. I added new APIs for the concatenate API that takes a column as the separator. I also added new APIs for both `join_list_elements` APIs, one with a scalar separator and one with a column separator.

Authors:
- Thomas Graves (https://github.com/tgravescs)

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
- Jason Lowe (https://github.com/jlowe)

URL: #8289
Closes #4728
This PR adds a new parameter to the `cudf::strings::concatenate` APIs to specify whether separators should be added between null entries when the null-replacement (`narep`) parameter is valid. If the `narep` scalar is invalid (i.e. null itself), then the entire output row becomes null. If not, separators are added between each element.

The new parameter is an enum `separator_on_nulls` which has `YES` or `NO` settings. The default parameter value will be `YES` to keep the current behavior as expected by Python cudf and for consistency with Pandas behavior.

Specifying `NO` here will suppress the separator with null elements (when `narep` is valid).

This PR also renames the `cudf::strings::concatenate_list_elements` API to `cudf::strings::join_list_elements`. The API pattern and behavior mimic `cudf::strings::join_strings` more closely than the concatenate functions. Also, these are called by the Python `join` functions, so the rename makes it more consistent with cudf.

This is a breaking change in order to make these APIs more consistent. Previously, the separators column version was returning nulls only for an all-null row. This has been changed to honor the `separator_on_nulls` parameter instead. Currently, no Python cudf API calls this version. Only the rename required minor changes to the Cython layer.

The gtests were updated to reflect the new behavior. None of the pytests required any changes since the default parameter value matches the original behavior for those APIs that cudf actually calls.
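The null-handling rules described in this PR can be modeled for a single row in plain C++17. This is a sketch under stated assumptions, not the libcudf kernel: `concat_row` and `Row` are hypothetical names, a null element or null result is modeled as `std::nullopt`, and the separator is a scalar. One detail is an assumption on my part: with `separator_on_nulls::NO`, the sketch still writes the `narep` string for a null element and drops only the separator. The description above only says the separator is suppressed.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Same name and settings as the enum described in the PR; redeclared for a standalone sketch.
enum class separator_on_nulls { YES, NO };

// One row across the input columns; std::nullopt models a null entry.
using Row = std::vector<std::optional<std::string>>;

// Hypothetical single-row model of concatenate with a scalar separator.
std::optional<std::string> concat_row(const Row& row,
                                      const std::string& separator,
                                      const std::optional<std::string>& narep,
                                      separator_on_nulls sep_nulls = separator_on_nulls::YES)
{
  std::string out;
  bool write_separator = false;  // becomes true once a separator may precede the next element
  for (const auto& element : row) {
    const bool is_null = !element.has_value();
    // An invalid (null) narep makes the whole output row null as soon as a null is seen.
    if (is_null && !narep.has_value()) { return std::nullopt; }
    if (write_separator && (sep_nulls == separator_on_nulls::YES || !is_null)) {
      out += separator;
    }
    write_separator = write_separator || sep_nulls == separator_on_nulls::YES || !is_null;
    // ASSUMPTION: narep is written for a null element under both settings.
    out += is_null ? *narep : *element;
  }
  return out;
}
```

For example, with separator `":"`, the row `{"dd", null}` gives `"dd:_"` for `narep = "_"` under the default `YES`, `"dd"` for `narep = ""` under `NO`, and a null row when `narep` itself is null.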