-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework specific cudf::strings and nvtext APIs to use make_strings_children utility #12167
Closed
13 tasks done
Labels
improvement
Improvement / enhancement to an existing function
libcudf
Affects libcudf (C++/CUDA) code.
Comments
davidwendt
added
libcudf
Affects libcudf (C++/CUDA) code.
tech debt
improvement
Improvement / enhancement to an existing function
labels
Nov 16, 2022
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Nov 23, 2022
…h to new header (#12185) Create new `cudf/strings/detail/strings_children.cuh` header by moving strings children functions from `utilities.cuh`. This will help with refactoring needed for #12167 and #12180 No code has changed. The `make_strings_children` and `make_offsets_child_column` have moved requiring updates to include statements in source files that included `cudf/strings/detail/utilities.cuh`. Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Robert Maynard (https://github.com/robertmaynard) URL: #12185
This was referenced Nov 23, 2022
rapids-bot bot
pushed a commit
that referenced
this issue
Dec 1, 2022
) Rework `cudf::strings::pad` and `cudf::strings::zfill` to use the internal `make_strings_children`. This simplifies the code and allows the operation to throw an error if it the output would exceed the size limit of a column. No function has been added, removed, or changed. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - MithunR (https://github.com/mythrocks) URL: #12238
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Dec 7, 2022
Rework `nvtext::detokenize` to use the `cudf::detail::make_strings_children` and the `cudf::detail::indexalator`. This allows the operation to throw an error if it the output would exceed the size limit of a column. The indexalator usage removes the need for a type-dispatcher call for the row-indices. No function has been added, removed, or changed. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) - Karthikeyan (https://github.com/karthikeyann) URL: #12267
3 tasks
3 tasks
3 tasks
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Dec 15, 2022
…12317) Rework `cudf::strings::from_timstamps` to use the internal `make_strings_children`. This simplifies the code and allows the operation to throw an error if it the output would exceed the size limit of a column. No function has been added, removed, or changed. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #12317
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Dec 20, 2022
) Issue #12167 Use `make_strings_children` in `cudf::strings::from_booleans` to simplify code and make the function more robust. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - David Wendt (https://github.com/davidwendt) - Divye Gala (https://github.com/divyegala) URL: #12365
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 3, 2023
…tility (#12401) Rework `cudf::strings::integers_to_ipv4` to use the internal `make_strings_children`, `count_digits`, and `integer_to_string` functions. This simplifies the code and allows the operation to throw an error if it the output would exceed the size limit of a column. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) - Nghia Truong (https://github.com/ttnghia) URL: #12401
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 4, 2023
…#12385) Rework `cudf::strings::url_encode` to use the internal `make_strings_children`. This simplifies the code and allows the operation to throw an error if it the output would exceed the size limit of a column. Also fixed a bug found by inspection in the capitalize functor. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) - Nghia Truong (https://github.com/ttnghia) Approvers: - Nghia Truong (https://github.com/ttnghia) - Bradley Dice (https://github.com/bdice) URL: #12385
3 tasks
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 6, 2023
…#12480) Rework `nvtext::generate_character_ngrams` to use the internal `make_strings_children`. This simplifies the code and allows the operation to throw an error if it the output would exceed the size limit of a column. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Divye Gala (https://github.com/divyegala) - Nghia Truong (https://github.com/ttnghia) URL: #12480
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 7, 2023
…s_children (#12434) Issue #12167 Use `make_strings_children` in `cudf::strings::from_fixed_point`, `cudf::strings::from_floats` and `cudf::strings::from_integers` to simplify code and make the functions more robust. Authors: - Vukasin Milovanovic (https://github.com/vuule) Approvers: - David Wendt (https://github.com/davidwendt) - MithunR (https://github.com/mythrocks) - Nghia Truong (https://github.com/ttnghia) URL: #12434
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 9, 2023
Adds the `cudf::sizes_to_offsets` utility to the `cudf::strings::detail::make_offsets_child_column` utility. Strings APIs use this when `make_strings_children` is not practical. Also lists APIs use this for building offsets too. Reference: #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #12345
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 10, 2023
Use `make_strings_children` utility in parse_data nested json reader Addresses part of #12167 Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - David Wendt (https://github.com/davidwendt) - Vukasin Milovanovic (https://github.com/vuule) URL: #12382
This was referenced Jan 12, 2023
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 17, 2023
#12541) Rework `cudf::lists::sequences` to use the internal `sizes_to_offsets` utility instead of `thrust::exclusive_scan` for computing offsets. This allows the operation to throw an error if it the output would exceed the size limit of a column. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Nghia Truong (https://github.com/ttnghia) - Yunsong Wang (https://github.com/PointKernel) URL: #12541
rapids-bot bot
pushed a commit
that referenced
this issue
Jan 17, 2023
Rework `nvtext::ngrams_tokenize` to use the internal `sizes_to_offsets` utility. This simplifies the code and allows the operation to throw an error if it the output would exceed the size limit of a column. Reference #12167 Authors: - David Wendt (https://github.com/davidwendt) Approvers: - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #12540
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
improvement
Improvement / enhancement to an existing function
libcudf
Affects libcudf (C++/CUDA) code.
The fix for #12087 is expected to go into the
cudf::strings::detail::make_strings_children
utility which will throw an error if it detects the output column will be too large.There are currently 86
cudf::strings
APIs with 55 of those creating new strings though only 30 of those can possibly create larger strings than its input. 18 of those 30 use themake_strings_children
utility and will be automatically covered by the fix for #12087. And 2 of the 30 are already have builtin logic to handle any possible overflow. Similarly,nvtext
has 3 APIs not covered.This means there are 13 APIs that should either use the
make_strings_children
utility or apply the same fix. This issue is to keep track of the work for fixing these APIs.cudf::strings::from_booleans
Rework cudf::strings::from_booleans to use make_strings_children #12365cudf::strings::from_timestamps
Rework cudf::strings::from_timestamps to use make_strings_children #12317cudf::strings::from_durations
Rework remaining cudf::strings::from_xyz functions to use make_strings_children #12434cudf::strings::from_fixed_point
Rework remaining cudf::strings::from_xyz functions to use make_strings_children #12434cudf::strings::from_floats
Rework remaining cudf::strings::from_xyz functions to use make_strings_children #12434cudf::strings::from_integers
Rework remaining cudf::strings::from_xyz functions to use make_strings_children #12434cudf::strings::integers_to_ipv4
Rework cudf::strings::integers_to_ipv4 to use make_strings_children utility #12401cudf::strings::url_encode
Rework cudf::strings::url_encode to use make_strings_children utility #12385cudf::strings::pad
cudf::strings::zfill
nvtext::generate_character_ngrams
Rework nvtext::generate_character_ngrams to use make_strings_children #12480nvtext::ngrams_tokenize
Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility #12540nvtext::detokenize
The text was updated successfully, but these errors were encountered: