strings::join_list_elements options for empty list inputs #8285
FWIW, this does not match Pandas' behaviour. Pandas will return:

In [36]: s
Out[36]:
0 [a, b, c]
1 [a, b, None]
2 [None, None, None]
3 None
dtype: object

In [37]: s.str.join(' ')
Out[37]:
0 a b c
1 NaN
2 NaN
3 None
dtype: object

It feels like this might be a pretty significant edge case. For example, users may want to call:

In [38]: s.str.join(' ').dropna()
Out[38]:
0 a b c
dtype: object
# whereas we will return:
# 0 a b c
# 2
# dtype: object

Is there any way to preserve the original behaviour from the Python side? Perhaps make the original behaviour opt-in in libcudf?
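To make the difference concrete, here is a minimal pure-Python sketch of the two behaviours contrasted in the session above. It is only a model of the outputs shown in this comment, not the libcudf or cuDF implementation; the function name join_row and the flag empty_on_all_null are hypothetical and exist only for illustration.

```python
from typing import List, Optional


def join_row(row: Optional[List[Optional[str]]],
             sep: str,
             empty_on_all_null: bool = False) -> Optional[str]:
    """Join one list row, modelling the outputs shown in the comment above.

    `empty_on_all_null` is a hypothetical flag (an assumption, not a real
    cuDF or libcudf parameter): when True, a list whose elements are all
    null yields "" instead of a null result.
    """
    if row is None:
        # A null row stays null in both behaviours.
        return None
    if row and all(x is None for x in row):
        # The all-null list is the edge case under discussion.
        return "" if empty_on_all_null else None
    if any(x is None for x in row):
        # A list containing some nulls yields a null result.
        return None
    # Note: an empty list falls through here and joins to "".
    return sep.join(row)


rows = [["a", "b", "c"], ["a", "b", None], [None, None, None], None]

# Pandas-like behaviour: only the fully valid row survives a dropna.
pandas_like = [join_row(r, " ") for r in rows]
pandas_dropna = [v for v in pandas_like if v is not None]

# Behaviour described as "whereas we will return": the all-null row
# becomes an empty string, so it is not dropped.
ours = [join_row(r, " ", empty_on_all_null=True) for r in rows]
ours_dropna = [v for v in ours if v is not None]
```

With the flag off, dropping nulls leaves a single row; with it on, the empty string from the all-null list survives, which is the extra row 2 in the output above.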
The C++ API allows that. I will need to change the Python interface (it was set up to call the C++ API with default parameters, which return an empty string for all-null lists).
LGTM!
@gpucibot merge
Rerun tests.
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #8285 +/- ##
===============================================
Coverage ? 82.88%
===============================================
Files ? 105
Lines ? 17872
Branches ? 0
===============================================
Hits ? 14813
Misses ? 3059
Partials ? 0
Add Java APIs to be able to call string concatenate with separators.

DO NOT MERGE until #8285 and #8282 are merged. We need those changes for the API used in this PR, as well as the functionality to match what Spark needs.

New arguments were added to the existing concatenate API that takes a scalar for the separator, so I extended that. I added new APIs for the concatenate API that takes a column as the separator. I also added new APIs for both join_list_elements APIs, one with a scalar separator and one with a column separator.

Authors:
- Thomas Graves (https://github.com/tgravescs)

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
- Jason Lowe (https://github.com/jlowe)

URL: #8289
This PR implements a new option for strings::join_list_elements on top of #8282. In particular, the new option is:

This new option is necessary for implementing concat_ws in Spark, since the behavior of the output string is required to be different depending on the situation.

Currently blocked from merging by #8282.