Deprecate cudf::strings::repeat_strings_output_sizes in favor of cudf::repeat_strings throwing size limit error #12542

Closed
davidwendt opened this issue Jan 12, 2023 · 2 comments · Fixed by #12609
Labels: feature request, libcudf, strings

Comments

@davidwendt

The cudf::strings::repeat_strings_output_sizes API was created so that the output sizes of the cudf::strings::repeat_strings API could be precomputed and checked for possible overflow. cudf::strings::repeat_strings can now throw an exception if the output size would exceed the column size limit, so this API is no longer needed.
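
To make the precompute-and-check pattern concrete, here is a minimal host-side sketch in plain C++. It is not libcudf code: `repeat_output_sizes` and `fits_in_column` are hypothetical names, the input is a `std::vector<std::string>` rather than a strings column, and the sketch assumes both vectors have the same length, that a non-positive repeat count yields an empty string, and that the column size limit is the maximum 32-bit signed integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side analogue of a "precompute output sizes" step.
// Returns the per-row output sizes (in bytes) and their 64-bit total.
// Assumes input.size() == repeat_times.size().
std::pair<std::vector<std::int64_t>, std::int64_t> repeat_output_sizes(
  std::vector<std::string> const& input, std::vector<std::int32_t> const& repeat_times)
{
  std::vector<std::int64_t> sizes(input.size());
  std::int64_t total = 0;
  for (std::size_t i = 0; i < input.size(); ++i) {
    // Assumption: a non-positive repeat count produces an empty string.
    std::int64_t const reps = repeat_times[i] > 0 ? repeat_times[i] : 0;
    sizes[i] = static_cast<std::int64_t>(input[i].size()) * reps;
    total += sizes[i];
  }
  return {std::move(sizes), total};
}

// Assumption: the column size limit is the largest 32-bit signed integer.
bool fits_in_column(std::int64_t total_bytes)
{
  return total_bytes <= std::numeric_limits<std::int32_t>::max();
}
```

A caller following this pattern would compute the total first and only run the repeat operation when `fits_in_column` returns true. Once the repeat operation reports the overflow itself, this extra pass is redundant.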

The following API can be removed: https://docs.rapids.ai/api/libcudf/stable/group__strings__copy.html#gad137eea9f5189b3e8c6384c6ab8def40
The optional parameter for this API can be removed: https://docs.rapids.ai/api/libcudf/stable/group__strings__copy.html#ga89b08a3e8f941f7760087cb9e5bf9758

@revans2

davidwendt added the feature request, libcudf, and strings labels Jan 12, 2023
davidwendt self-assigned this Jan 12, 2023

ttnghia commented Jan 13, 2023

I'll work on the Spark plugin to adapt it to the changes described in this issue.


ttnghia commented Jan 13, 2023

@davidwendt When you are working on this, please merge it after #12546 to avoid breaking the Java build. Thanks.

rapids-bot bot pushed a commit that referenced this issue Jan 18, 2023
Rework `cudf::strings::repeat_strings` to use the internal `sizes_to_offsets` utility. This allows the operation to throw an error if the output would exceed the size limit of a column. It will also allow deprecating and removing the `repeat_strings_output_sizes` function per issue #12542
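
The idea of converting sizes to offsets while detecting overflow can be illustrated with a small host-side sketch. This is not the internal libcudf utility: `sizes_to_offsets_checked` is a hypothetical name, the exception type and message are assumptions, the sizes are assumed to be non-negative, and the limit is assumed to be the maximum 32-bit signed integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical host-side sketch; not libcudf's internal sizes_to_offsets.
// Builds offsets (an exclusive prefix sum with the total appended) from
// non-negative per-row sizes, and throws if the running total would not
// fit in a 32-bit signed offset.
std::vector<std::int32_t> sizes_to_offsets_checked(std::vector<std::int64_t> const& sizes)
{
  constexpr std::int64_t limit = std::numeric_limits<std::int32_t>::max();
  std::vector<std::int32_t> offsets(sizes.size() + 1);
  std::int64_t running = 0;
  offsets[0] = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    // Compare against the remaining headroom so the 64-bit sum cannot overflow.
    if (sizes[i] > limit - running) {
      throw std::overflow_error("output size exceeds column size limit");
    }
    running += sizes[i];
    offsets[i + 1] = static_cast<std::int32_t>(running);
  }
  return offsets;
}
```

Because the check happens while the offsets are built, a caller can wrap the operation in a `try`/`catch` instead of running a separate size-computation pass beforehand.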

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - MithunR (https://github.com/mythrocks)

URL: #12543
rapids-bot bot pushed a commit that referenced this issue Jan 20, 2023
…#12546)

As described in #12542, the function is no longer needed. Thus, this PR removes all its related callers.

Depends on (so that `strings::repeat_strings` will have the ability to detect output overflow):
 * #12543
 * NVIDIA/spark-rapids#7513

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)

URL: #12546
rapids-bot bot pushed a commit that referenced this issue Feb 8, 2023
…ter from cudf::strings::repeat_strings (#12609)

Removes `cudf::strings::repeat_strings_output_sizes` and the optional sizes parameter from `cudf::strings::repeat_strings`.
This function (and corresponding optional parameter) is no longer needed now that the internal utilities will throw an error if the column output size exceeds the maximum.
Closes #12542

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Bradley Dice (https://github.com/bdice)

URL: #12609