Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for string udfs in libcudf #10686

Closed

Conversation

davidwendt
Copy link
Contributor

Reference #9639

Add basic functions and a dynamic device string class in libcudf to support user-defined functions through numba. The device string class dstring manages local device memory to manipulate UTF-8 encoded string data in device code. The column functions help create and destroy an array of dstring objects and string_view objects for use within a numba created device kernel.

A follow-on PR will include Cython interfaces for the column functions.

Depends on #10684

@davidwendt davidwendt added feature request New feature or request 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) non-breaking Non-breaking change labels Apr 19, 2022
@davidwendt davidwendt self-assigned this Apr 19, 2022
@github-actions github-actions bot added the CMake CMake build issue label Apr 19, 2022
Copy link
Contributor

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm uncomfortable with putting this class in libcudf. I had thought this was going to live somewhere else given that is solely for a numba feature, especially when it is a non-standard dstring type thing that is kinda-sorta like std::string but not all the way. imo, doing this "right" would be figuring out how to implement cuda::std::string or cuda::string in libcu++.

Requesting changes on this for now until we've had a chance to talk about this more.

@codecov
Copy link

codecov bot commented Apr 19, 2022

Codecov Report

Merging #10686 (2f658e9) into branch-22.06 (01d08af) will increase coverage by 0.03%.
The diff coverage is n/a.

❗ Current head 2f658e9 differs from pull request most recent head 4695697. Consider uploading reports for the commit 4695697 to get more accurate results

@@               Coverage Diff                @@
##           branch-22.06   #10686      +/-   ##
================================================
+ Coverage         86.35%   86.38%   +0.03%     
================================================
  Files               142      142              
  Lines             22335    22335              
================================================
+ Hits              19287    19294       +7     
+ Misses             3048     3041       -7     
Impacted Files Coverage Δ
python/cudf/cudf/core/column/string.py 89.22% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.64% <0.00%> (+0.23%) ⬆️
python/cudf/cudf/core/column/numerical.py 96.17% <0.00%> (+0.29%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 92.79% <0.00%> (+1.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5e6941...4695697. Read the comment docs.

@davidwendt
Copy link
Contributor Author

@brandon-b-miller and I discussed this a bit last week. We need the dstring class for Python strings UDF work but I think it is a matter of where it should be placed in the repo. It intended to be used only by the Numba internal implementation and we do not expect it to be used directly right now. That said, it does not seem right for the public namespace but also not for the detail namespace.

Anyway, we are reconsidering moving this into cudf right now in favor of trying to build a solution in the string_udf repo instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants