Update calls to make_strings_children to support large strings #15579

Closed
33 tasks done
davidwendt opened this issue Apr 22, 2024 · 0 comments
Labels: improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code.)

davidwendt commented Apr 22, 2024

This issue helps keep track of the work needed to move existing calls to cudf::strings::detail::make_strings_children to the new cudf::strings::detail::experimental::make_strings_children, and then ultimately to replace the non-experimental one.
Available once #15363 is merged.

The changes involve updating the functor used by the utility to replace the size_type* d_offsets member with a size_type* d_sizes member, since the functors currently set the output row sizes there on the first-pass call. A new input_offsetalator d_offsets member is then added to address each output row's data in the existing char* d_chars device memory. This allows d_chars to point to more than 2GB of device memory and the offsets to be either INT32 or INT64. The effort for each functor should be minimal: only the output size and memory writes need to use the new members correctly, so no significant logic changes should be needed.
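The two-pass pattern described above can be sketched on the host as follows. This is only an illustration under stated assumptions, not the libcudf implementation: `repeat_twice_fn` and `build_children` are hypothetical names, `std::int64_t const*` stands in for the offsets accessor member, and plain loops stand in for the device kernels. The functor writes a row size into `d_sizes` when `d_chars` is null, and on the second pass uses `d_offsets` to locate its row in `d_chars`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

using size_type = std::int32_t;

// Hypothetical functor in the updated style: sizes go to d_sizes on the
// first pass; 64-bit offsets address d_chars on the second pass.
struct repeat_twice_fn {
  std::vector<std::string> const* input;
  size_type* d_sizes{};             // pass 1 output: bytes needed per row
  char* d_chars{};                  // pass 2 output: character buffer
  std::int64_t const* d_offsets{};  // pass 2 input: start of each row in d_chars

  void operator()(size_type idx) const
  {
    std::string const& s = (*input)[idx];
    if (d_chars == nullptr) {  // first pass: only report the row size
      d_sizes[idx] = static_cast<size_type>(2 * s.size());
      return;
    }
    char* out = d_chars + d_offsets[idx];  // second pass: write the row
    std::memcpy(out, s.data(), s.size());
    std::memcpy(out + s.size(), s.data(), s.size());
  }
};

// Hypothetical host-side driver mimicking the utility: size pass,
// offsets from a running sum of sizes, then the write pass.
std::pair<std::vector<std::int64_t>, std::string> build_children(repeat_twice_fn fn,
                                                                 size_type rows)
{
  assert(rows == static_cast<size_type>(fn.input->size()));

  std::vector<size_type> sizes(rows);
  fn.d_sizes = sizes.data();
  fn.d_chars = nullptr;
  for (size_type i = 0; i < rows; ++i) { fn(i); }

  // Offsets are accumulated in 64 bits so the total may exceed the int32 range.
  std::vector<std::int64_t> offsets(rows + 1, 0);
  for (size_type i = 0; i < rows; ++i) { offsets[i + 1] = offsets[i] + sizes[i]; }

  std::string chars(static_cast<std::size_t>(offsets.back()), '\0');
  fn.d_chars   = &chars[0];
  fn.d_offsets = offsets.data();
  for (size_type i = 0; i < rows; ++i) { fn(i); }

  return {offsets, chars};
}
```

Each row size still fits in `size_type`; only the accumulated offsets need the wider type, which is why the functor's own logic barely changes.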

Right now, no additional updates are required, including to benchmarks or gtests, though this may change for individual APIs in the future.

APIs that use make_strings_children and need to be reworked to use the experimental make_strings_children:

Convert non-string to string - PR #15629

  • from_booleans
  • from_timestamps
  • from_durations
  • from_fixed_point
  • from_floats
  • integers_to_hex
  • from_integers
  • integers_to_ipv4

Other conversions - PR #15598

  • format_list_column
  • url_encode
  • join_strings
  • join_list_elements
  • slice_strings

Replace/Filter - PR #15586

  • replace (string parallel)
  • replace (multiple targets)
  • replace_slice
  • filter_characters_of_type
  • filter_characters
  • translate

Others - PR #15587

  • pad
  • zfill
  • to_lower/to_upper/swapcase
  • capitalize

I/O - PR #15599

  • JSON writer (get_escaped_strings)
  • JSON benchmark (build_json_string_column)
  • CSV writer (escaping characters)

nvtext - PR #15595

  • filter_tokens
  • replace_tokens
  • normalize_characters
  • normalize_spaces
  • generate_character_ngrams
  • generate_ngrams
  • detokenize
@davidwendt davidwendt added libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function labels Apr 22, 2024
@davidwendt davidwendt self-assigned this Apr 22, 2024
rapids-bot bot pushed a commit that referenced this issue Apr 30, 2024
…nslate (#15586)

Updates strings replace functions to use the new experimental `make_strings_children` which supports building large strings.

Reference #15579

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #15586
rapids-bot bot pushed a commit that referenced this issue Apr 30, 2024
…ice (#15598)

Updates strings APIs to use the new experimental `make_strings_children` which supports building large strings.
- `cudf::strings::join_strings`
- `cudf::strings::join_list_elements`
- `cudf::strings::slice_strings`
- `cudf::strings::format_list_column`
- `cudf::strings::url_encode`

Reference #15579

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mike Wilson (https://github.com/hyperbolic2346)

URL: #15598
rapids-bot bot pushed a commit that referenced this issue Apr 30, 2024
…ons (#15587)

Updates strings case conversion and pad functions to use the new experimental `make_strings_children` which supports building large strings.

Reference #15579

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Bradley Dice (https://github.com/bdice)

URL: #15587
rapids-bot bot pushed a commit that referenced this issue May 1, 2024
Updates the JSON and CSV writer functions to use the new experimental make_strings_children.
Also included is an update to the JSON_BENCH benchmark for get_json_object.

Reference #15579

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Bradley Dice (https://github.com/bdice)

URL: #15599
rapids-bot bot pushed a commit that referenced this issue May 1, 2024
Updates nvtext replace, ngram, normalize, and detokenize functions to replace the existing calls to `make_strings_children` with the new experimental `make_strings_children` which supports building large strings.

Reference #15579

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Bradley Dice (https://github.com/bdice)

URL: #15595
rapids-bot bot pushed a commit that referenced this issue May 3, 2024
Updates strings convert functions to use the new experimental `make_strings_children` which supports building large strings.

Reference #15579

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Paul Mattione (https://github.com/pmattione-nvidia)
  - Bradley Dice (https://github.com/bdice)

URL: #15629
rapids-bot bot pushed a commit that referenced this issue May 8, 2024
Updates multi-pattern version of `cudf::strings::replace_re` to use the new experimental `make_strings_children` which supports building large strings.

Reference #15579

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - MithunR (https://github.com/mythrocks)

URL: #15667