Update calls to make_strings_children to support large strings #15579
Labels
improvement
Improvement / enhancement to an existing function
libcudf
Affects libcudf (C++/CUDA) code.
Comments
davidwendt added the libcudf and improvement labels on Apr 22, 2024
This was referenced Apr 23, 2024
rapids-bot bot pushed a commit that referenced this issue on Apr 30, 2024
…nslate (#15586)
Updates strings replace functions to use the new experimental `make_strings_children` which supports building large strings.
Reference #15579
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Karthikeyan (https://github.com/karthikeyann)
URL: #15586
rapids-bot bot pushed a commit that referenced this issue on Apr 30, 2024
…ice (#15598)
Updates strings APIs to use the new experimental `make_strings_children` which supports building large strings.
- `cudf::strings::join_strings`
- `cudf::strings::join_list_elements`
- `cudf::strings::slice_strings`
- `cudf::strings::format_list_column`
- `cudf::strings::url_encode`
Reference #15579
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Mike Wilson (https://github.com/hyperbolic2346)
URL: #15598
rapids-bot bot pushed a commit that referenced this issue on Apr 30, 2024
…ons (#15587)
Updates strings case conversion and pad functions to use the new experimental `make_strings_children` which supports building large strings.
Reference #15579
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Bradley Dice (https://github.com/bdice)
URL: #15587
rapids-bot bot pushed a commit that referenced this issue on May 1, 2024
Updates the JSON and CSV writer functions to use the new experimental `make_strings_children`. Also included is an update to the JSON_BENCH benchmark for `get_json_object`.
Reference #15579
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Karthikeyan (https://github.com/karthikeyann)
- Bradley Dice (https://github.com/bdice)
URL: #15599
rapids-bot bot pushed a commit that referenced this issue on May 1, 2024
Updates nvtext replace, ngram, normalize, and detokenize functions to replace the existing calls to `make_strings_children` with the new experimental `make_strings_children` which supports building large strings.
Reference #15579
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)
URL: #15595
rapids-bot bot pushed a commit that referenced this issue on May 3, 2024
Updates strings convert functions to use the new experimental `make_strings_children` which supports building large strings.
Reference #15579
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Paul Mattione (https://github.com/pmattione-nvidia)
- Bradley Dice (https://github.com/bdice)
URL: #15629
rapids-bot bot pushed a commit that referenced this issue on May 8, 2024
Updates multi-pattern version of `cudf::strings::replace_re` to use the new experimental `make_strings_children` which supports building large strings.
Reference #15579
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Karthikeyan (https://github.com/karthikeyann)
- MithunR (https://github.com/mythrocks)
URL: #15667
This issue helps keep track of the work needed to move existing calls to `cudf::strings::detail::make_strings_children` to the new `cudf::strings::detail::experimental::make_strings_children`, and then ultimately to replace the non-experimental one. Available once #15363 is merged.

The changes involve updating the functor used by the utility: the `size_type* d_offsets` member is replaced with a `size_type* d_sizes` member, since the functors currently set the output row sizes there on the first-pass call. A new `input_offsetalator d_offsets` member is then added to address each output row's data in the existing `char* d_chars` device memory. This allows `d_chars` to point to more than 2GB of device memory and the offsets to be either INT32 or INT64. The effort should be minimal for each functor: only the output size and memory writes need to use the new members correctly, so no significant logic changes should be needed.

Right now, no additional updates are required, including to benchmarks or gtests, though this may change for individual APIs in the future.
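As a rough illustration of the functor change described above, here is a minimal host-only C++ sketch of the two-pass pattern. It is not libcudf code: `repeat_twice_fn`, `build_chars`, and `needs_int64_offsets` are hypothetical names, `std::int32_t` stands in for `size_type`, a plain `std::int64_t const*` stands in for the new `d_offsets` member, and ordinary loops stand in for the device kernels. The 32-bit limit used in `needs_int64_offsets` is an assumption for illustration only; the threshold libcudf actually applies is not stated in this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical functor: each output row is the input row written twice.
struct repeat_twice_fn {
  std::vector<std::string> const* input;  // stand-in for the input strings column
  std::int32_t* d_sizes{};                // pass 1 output: byte count of each row
  char* d_chars{};                        // pass 2 output: all rows' bytes, back to back
  std::int64_t const* d_offsets{};        // stand-in for the offsets accessor (64-bit here)

  void operator()(std::size_t idx) const
  {
    std::string const& s = (*input)[idx];
    if (d_chars == nullptr) {  // pass 1: report the row size only
      d_sizes[idx] = static_cast<std::int32_t>(2 * s.size());
      return;
    }
    assert(d_offsets != nullptr);
    char* out = d_chars + d_offsets[idx];  // pass 2: locate the row through d_offsets
    out       = std::copy(s.begin(), s.end(), out);
    std::copy(s.begin(), s.end(), out);
  }
};

// Assumption for illustration: offsets must be 64-bit once the total byte
// count no longer fits in a signed 32-bit integer.
bool needs_int64_offsets(std::int64_t total_bytes)
{
  return total_bytes > std::numeric_limits<std::int32_t>::max();
}

// Hypothetical driver mirroring the flow: sizes, then offsets, then chars.
std::string build_chars(std::vector<std::string> const& input,
                        std::vector<std::int64_t>& offsets)
{
  auto const n = input.size();
  std::vector<std::int32_t> sizes(n);
  repeat_twice_fn fn{&input, sizes.data(), nullptr, nullptr};
  for (std::size_t i = 0; i < n; ++i) { fn(i); }  // pass 1 fills sizes

  // Exclusive scan of the sizes, accumulated in 64 bits.
  offsets.assign(n + 1, 0);
  for (std::size_t i = 0; i < n; ++i) { offsets[i + 1] = offsets[i] + sizes[i]; }

  std::string chars(static_cast<std::size_t>(offsets[n]), '\0');
  fn.d_chars   = &chars[0];  // non-null even when the output is empty
  fn.d_offsets = offsets.data();
  for (std::size_t i = 0; i < n; ++i) { fn(i); }  // pass 2 writes the bytes
  return chars;
}
```

Calling `build_chars` on the rows `"ab"`, `""`, `"xyz"` runs the functor once per row for sizes and once per row for bytes. The point of the sketch is that the per-row size stays 32-bit while the row's position in the character buffer is read through a separate offsets member that can be wider.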
APIs that use `make_strings_children` and need to be reworked to use the experimental `make_strings_children`:

- Convert non-string to string - PR #15629
  - `from_booleans`
  - `from_timestamps`
  - `from_durations`
  - `from_fixed_point`
  - `from_floats`
  - `integers_to_hex`
  - `from_integers`
  - `integers_to_ipv4`
- Other conversions - PR #15598
  - `format_list_column`
  - `url_encode`
  - `join_strings`
  - `join_list_elements`
  - `slice_strings`
- Replace/Filter - PR #15586
  - `replace` (string parallel)
  - `replace` (multiple targets)
  - `replace_slice`
  - `filter_characters_of_type`
  - `filter_characters`
  - `translate`
- Others - PR #15587
  - `pad`
  - `zfill`
  - `to_lower`/`to_upper`/`swapcase`
  - `capitalize`
- I/O - PR #15599
  - `get_escaped_strings`
  - `build_json_string_column`
- nvtext - PR #15595
  - `filter_tokens`
  - `replace_tokens`
  - `normalize_characters`
  - `normalize_spaces`
  - `generate_character_ngrams`
  - `generate_ngrams`
  - `detokenize`