[DOC] Document limitation using cudf.pandas proxy arrays #16955

Merged
15 changes: 15 additions & 0 deletions docs/cudf/source/cudf_pandas/faq.md
@@ -181,6 +181,21 @@ There are a few known limitations that you should be aware of:
```
- `cudf.pandas` (and cuDF in general) is only compatible with pandas 2. Version
24.02 of cudf was the last to support pandas 1.5.x.
- For `cudf.pandas` to produce a proxy array that duck-types as a `np.ndarray`, the proxy must wrap a valid `np.ndarray`; it cannot keep the data on device in a `cupy` array. As a result, creating a proxy array incurs the overhead of an initial device-to-host (DtoH) transfer. For example,

```python
# Assumes cudf.pandas is enabled (e.g. the script is run with `python -m cudf.pandas`)
import pandas as pd
import numpy as np

arr = pd.DataFrame({"a": range(10)}).values  # implicit DtoH transfer
isinstance(arr, np.ndarray)  # returns True
```
We transfer the data from device to host to ensure that the [data buffer](https://numpy.org/doc/stable/dev/internals.html#internal-organization-of-numpy-arrays) is set correctly. With the data buffer set, the proxy array can be passed to other functions that require a valid data buffer.

```python
import torch
x = torch.from_numpy(arr)
```
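
As a rough illustration of what a valid data buffer means here, the sketch below uses a plain NumPy array rather than a `cudf.pandas` proxy (an assumption made so that the snippet runs without a GPU). It shows that a host-resident `np.ndarray` exposes the address of its data buffer and that a consumer of the buffer protocol can share that memory without copying, which is the kind of access functions such as `torch.from_numpy` depend on.

```python
import numpy as np

# A plain host-resident array stands in for the array wrapped by the proxy
# (assumption: no cudf.pandas or GPU is involved in this sketch).
arr = np.arange(10, dtype=np.int64)

# The array interface exposes the address of the host data buffer.
ptr, read_only = arr.__array_interface__["data"]

# Consumers of the buffer protocol get a zero-copy view of that memory.
view = memoryview(arr)
shared = np.frombuffer(view, dtype=arr.dtype)

# A write through the original array is visible through the shared view.
arr[0] = 42
same_memory = np.shares_memory(arr, shared)
first_value = int(shared[0])
```

An array whose data lives only on the device has no host address to expose in this way, which is why the proxy wraps a real `np.ndarray`.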

## Can I force running on the CPU?
