Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Raise informative error when interchanging non-cudf dataframes #11245

Closed
honno opened this issue Jul 12, 2022 · 1 comment · Fixed by #11392
Closed

[FEA] Raise informative error when interchanging non-cudf dataframes #11245

honno opened this issue Jul 12, 2022 · 1 comment · Fixed by #11392
Assignees
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@honno
Copy link

honno commented Jul 12, 2022

If one were to interchange with non-cudf dataframes via cudf.core.df_protocol.from_dataframe(), we'd get an error which doesn't quite relate to the problem at hand—that cross-device interchange isn't supported/isn't fully specified.

>>> import numpy as np
>>> import vaex
>>> df = vaex.from_items(("foo", np.asarray([7, 42, 3])))
>>> df
  #    foo
  0      7
  1     42
  2      3
>>> from cudf.core.df_protocol import from_dataframe
>>> from_dataframe(df)
AttributeError: '_VaexBuffer' object has no attribute '_allow_copy'
Full traceback
AttributeError                            Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 cudf_df = from_dataframe(df)

.../cudf/core/df_protocol.py:658, in from_dataframe(df, allow_copy)
    655 if not hasattr(df, "__dataframe__"):
    656     raise ValueError("`df` does not support __dataframe__")
--> 658 return _from_dataframe(df.__dataframe__(allow_copy=allow_copy))

.../cudf/core/df_protocol.py:681, in _from_dataframe(df)
    673 col = df.get_column_by_name(name)
    675 if col.dtype[0] in (
    676     _DtypeKind.INT,
    677     _DtypeKind.UINT,
    678     _DtypeKind.FLOAT,
    679     _DtypeKind.BOOL,
    680 ):
--> 681     columns[name], _buf = _protocol_to_cudf_column_numeric(col)
    683 elif col.dtype[0] == _DtypeKind.CATEGORICAL:
    684     columns[name], _buf = _protocol_to_cudf_column_categorical(col)

.../cudf/core/df_protocol.py:717, in _protocol_to_cudf_column_numeric(col)
    715 assert buffers["data"] is not None, "data buffer should not be None"
    716 _dbuffer, _ddtype = buffers["data"]
--> 717 _check_buffer_is_on_gpu(_dbuffer)
    718 cudfcol_num = build_column(
    719     Buffer(_dbuffer.ptr, _dbuffer.bufsize),
    720     protocol_dtype_to_cupy_dtype(_ddtype),
    721 )
    722 return _set_missing_values(col, cudfcol_num), buffers

.../cudf/core/df_protocol.py:728, in _check_buffer_is_on_gpu(buffer)
    725 def _check_buffer_is_on_gpu(buffer: _CuDFBuffer) -> None:
    726     if (
    727         buffer.__dlpack_device__()[0] != _Device.CUDA
--> 728         and not buffer._allow_copy
    729     ):
    730         raise TypeError(
    731             "This operation must copy data from CPU to GPU. "
    732             "Set `allow_copy=True` to allow it."
    733         )
    735     elif buffer.__dlpack_device__()[0] != _Device.CUDA and buffer._allow_copy:

AttributeError: '_VaexBuffer' object has no attribute '_allow_copy'

We get this AttributeError because _check_buffer_is_on_gpu() wants to raise an error when non-CuDF dataframes are being interchanged, but ends up assuming behaviour from CuDF dataframes (specifically buffers) first by using the non-specified attribute _allow_copy

def _check_buffer_is_on_gpu(buffer: _CuDFBuffer) -> None:
if (
buffer.__dlpack_device__()[0] != _Device.CUDA
and not buffer._allow_copy
):
raise TypeError(
"This operation must copy data from CPU to GPU. "
"Set `allow_copy=True` to allow it."
)

The method does raise an informative error which I think fits the bill here

elif buffer.__dlpack_device__()[0] != _Device.CUDA and buffer._allow_copy:
raise NotImplementedError(
"Only cuDF/GPU dataframes are supported for now. "
"CPU (like `Pandas`) dataframes will be supported shortly."
)

so I imagine desired behaviour here would be to just raise this if buffer doesn't say instance check to _CuDFBuffer.

Assumedly 1) data apis consortium will settle on behaviour 2) cudf adopts it, and thus this is a temporary issue 😅 A fix could be nice in the meantime still, but I namely submit this issue to keep track of things.

@honno honno added Needs Triage Need team to review and classify bug Something isn't working labels Jul 12, 2022
@shwina shwina self-assigned this Jul 15, 2022
@shwina shwina added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Jul 15, 2022
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Apr 28, 2023
Closes #11245. This PR fixes a bug in our code that consumes a protocol DataFrame that is backed by CPU memory.

This enables using the `from_dataframe()` function to construct DataFrames from other libraries:

```python
In [12]: pdf = pd.DataFrame({'x': [1, 2, 3], 'y': [1.0, 2.0, np.nan], 'z': ['a', 'b', 'c']})

In [13]: vdf = vaex.from_dict({'x': [1, 2, 3], 'y': [1.0, 2.0, np.nan]})

In [14]: cudf.from_dataframe(pdf, allow_copy=True)
Out[14]: 
   x    y  z
0  1  1.0  a
1  2  2.0  b
2  3  NaN  c

In [15]: cudf.from_dataframe(vdf, allow_copy=True)
Out[15]: 
   x    y
0  1  1.0
1  2  2.0
2  3  NaN
```

Authors:
  - Ashwin Srinath (https://github.com/shwina)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #11392
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants