Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure objects with __interface__ are converted to cupy/numpy arrays #16436

Merged
merged 8 commits into from
Aug 1, 2024

Conversation

mroeschke
Copy link
Contributor

@mroeschke mroeschke commented Jul 30, 2024

Description

#16277 removed a universal cast to a cupy.array in _from_array. Although the typing suggested this method should only accept np.ndarray or cupy.ndarray, this method is called on any object implementing the __cuda_array_inferface__ or __array_interface__ (e.g. numba.DeviceArray) which caused a performance regression in cuspatial rapidsai/cuspatial#1413

closes #16434

In [1]: import cupy, numba.cuda

In [2]: import cudf

In [3]: cupy_array = cupy.ones((10_000, 100))

In [4]: %timeit cudf.DataFrame(cupy_array)
3.88 ms ± 52 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [5]: %timeit cudf.DataFrame(numba.cuda.to_device(cupy_array))
3.99 ms ± 35.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 30, 2024
@mroeschke mroeschke requested a review from a team as a code owner July 30, 2024 18:40
@mroeschke mroeschke requested review from bdice and galipremsagar July 30, 2024 18:40
python/cudf/cudf/core/dataframe.py Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor

bdice commented Jul 30, 2024

@mroeschke Is there a way we can test this effectively?

@mroeschke
Copy link
Contributor Author

@mroeschke Is there a way we can test this effectively?

I am having trouble thinking how we could unit test this exactly, but I added a benchmark and a mypy type annotation in _from_array so hopefully those two can "test" this by proxy

@GregoryKimball
Copy link
Contributor

@mroeschke Do you think this should go into 24.08?

@mroeschke
Copy link
Contributor Author

@mroeschke Do you think this should go into 24.08?

I'm not sure how often users call cudf.DataFrame with a __cuda_array_interface__ that is not a cupy.ndarray. I suppose to be on the safer side to avoid performance regressions we should target for 24.08

@bdice
Copy link
Contributor

bdice commented Jul 31, 2024

I am okay with retargeting this to 24.08 to avoid perf regressions.

@mroeschke mroeschke changed the base branch from branch-24.10 to branch-24.08 July 31, 2024 17:55
@mroeschke mroeschke requested review from a team as code owners July 31, 2024 17:55
@mroeschke mroeschke requested review from AyodeAwe, vyasr and vuule and removed request for a team July 31, 2024 17:55
@bdice
Copy link
Contributor

bdice commented Jul 31, 2024

@mroeschke You'll need to rebase to remove the branch-24.10 changes.

@jakirkham
Copy link
Member

Think we need an admin merge

@raydouglass raydouglass merged commit 445a75f into rapidsai:branch-24.08 Aug 1, 2024
78 of 80 checks passed
@mroeschke mroeschke deleted the perf/interface branch August 1, 2024 15:51
@jakirkham
Copy link
Member

Thanks all! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Slowdown in constructing a cudf dataframe from a numba device array
8 participants