[FEA] Don't copy data in to/from_dlpack when unnecessary #10874

Open
shwina opened this issue May 17, 2022 · 5 comments
Labels
feature request: New feature or request · libcudf: Affects libcudf (C++/CUDA) code · Python: Affects Python cuDF API

Comments

@shwina
Contributor

shwina commented May 17, 2022

The .to_dlpack() and .from_dlpack() methods in cuDF currently always perform a copy to/from the DLTensor. This is reasonable for DataFrames, as the columns of a DataFrame in cuDF are not contiguous in memory, nor are they always of the same data type.

For a Series, however, I believe we should be able to zero-copy to and from DLPack. That is not the case today; the data pointers below differ, which shows that a copy was made:

>>> import cudf
>>> import cupy as cp
>>> s = cudf.Series([1, 2, 3])
>>> arr = cp.from_dlpack(s.to_dlpack())
>>> s._column.data.ptr
139742968545280
>>> arr.data.ptr
139742968545792
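
For comparison, here is a minimal CPU-only sketch of what a zero-copy DLPack exchange looks like when checked the same way. It uses NumPy's `np.from_dlpack` as a stand-in, since the cuDF/CuPy session above needs a GPU; the `data_ptr` helper is a hypothetical name introduced only for this illustration, and nothing here reflects how cuDF itself is implemented.

```python
import numpy as np


def data_ptr(a: np.ndarray) -> int:
    """Return the address of the first element of a NumPy array's buffer."""
    return a.__array_interface__["data"][0]


src = np.array([1, 2, 3], dtype=np.int64)

# Exchange through the DLPack protocol: the consumer wraps the producer's memory.
shared = np.from_dlpack(src)

# An explicit copy, analogous to what the cuDF session above observes.
copied = src.copy()

print("zero-copy pointers match:", data_ptr(src) == data_ptr(shared))
print("copied pointers match:   ", data_ptr(src) == data_ptr(copied))

# A write through the producer is visible to the zero-copy consumer only.
src[0] = 42
print("shared[0] =", shared[0], "copied[0] =", copied[0])
```

Comparing pointers, as the original session does, is the most direct test; `np.shares_memory` gives the same answer for NumPy arrays without relying on raw addresses.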
shwina added the feature request and Needs Triage labels on May 17, 2022
@jrhemstad
Contributor

jrhemstad commented May 17, 2022

This wouldn't be possible with the existing from_dlpack function in libcudf as it always returns a unique_ptr<column> that expects to own the data.

That said, we could add a view_dlpack function that returns a column_view/table_view.

I think a non-copying to_dlpack could also be possible by adding an overload of to_dlpack that takes a column&& or table&& and therefore takes ownership of those objects and gives it to the returned DLManagedTensor object.
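
To make the ownership argument concrete, the following is a toy Python model, not libcudf code. It mirrors the three parts of a DLPack `DLManagedTensor` (the tensor data, a `manager_ctx` that keeps the owner alive, and a `deleter`), and contrasts a copying export with one that takes ownership of the source, which is what an rvalue-reference overload would do in C++. `Column`, `ManagedTensor`, `to_dlpack_copy`, and `to_dlpack_move` are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class Column:
    """Hypothetical stand-in for an owning column: it owns one byte buffer."""

    def __init__(self, values):
        self.buffer: Optional[bytearray] = bytearray(values)


@dataclass
class ManagedTensor:
    """Toy model of DLManagedTensor: data, owner context, and a deleter."""

    data: memoryview
    manager_ctx: object
    deleter: Callable[["ManagedTensor"], None]


def _release(tensor: ManagedTensor) -> None:
    # Called by the consumer when it is done: drop the view, then the owner.
    tensor.data.release()
    tensor.manager_ctx = None


def to_dlpack_copy(col: Column) -> ManagedTensor:
    # Copying export: the tensor owns a fresh buffer, the column is untouched.
    buf = bytearray(col.buffer)
    return ManagedTensor(memoryview(buf), buf, _release)


def to_dlpack_move(col: Column) -> ManagedTensor:
    # Ownership-taking export: steal the column's buffer, leaving it empty.
    buf, col.buffer = col.buffer, None
    return ManagedTensor(memoryview(buf), buf, _release)


copied_col = Column([1, 2, 3])
copied = to_dlpack_copy(copied_col)
copied_col.buffer[0] = 9  # does not affect the copied tensor

moved_col = Column([1, 2, 3])
moved = to_dlpack_move(moved_col)  # moved_col no longer owns any data

print(copied.data[0], moved.data[0], moved_col.buffer)
moved.deleter(moved)
```

The move variant avoids the copy only because the caller gives up its column, which is why the approach depends on the caller actually owning the column in the first place.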

@shwina
Contributor Author

shwina commented May 17, 2022

> I think a non-copying to_dlpack could also be possible by adding an overload of to_dlpack that takes a column&& or table&& and therefore takes ownership of those objects and gives it to the returned DLManagedTensor object.

I don't think this would work for Python since we never own any column objects ourselves.


GregoryKimball removed the Needs Triage label on Jun 29, 2022

GregoryKimball added the libcudf and Python labels and removed the inactive-30d label on Nov 21, 2022
@vyasr
Contributor

vyasr commented May 14, 2024

This work is adjacent to #10849
