[BUG] Unable to create an IntervalIndex from a list of Interval scalars #13952

Closed
galipremsagar opened this issue Aug 25, 2023 · 0 comments · Fixed by #13956
Labels
bug Something isn't working Python Affects Python cuDF API.

Comments

@galipremsagar
Contributor

Steps/Code to reproduce bug

In [2]: import numpy as np

In [3]: from pandas._libs.interval import Interval

In [4]: x = [np.nan, Interval(2.0, 3.0, closed='right'), Interval(3.0, 4.0, closed='right')]

In [5]: import pandas as pd

In [6]: pd.Index(x)
Out[6]: IntervalIndex([nan, (2.0, 3.0], (3.0, 4.0]], dtype='interval[float64, right]')

In [7]: import cudf
In [8]: cudf.Index(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/column.py:2268, in as_column(arbitrary, nan_as_null, dtype, length)
   2266 try:
   2267     data = as_column(
-> 2268         memoryview(arbitrary), dtype=dtype, nan_as_null=nan_as_null
   2269     )
   2270 except TypeError:

TypeError: memoryview: a bytes-like object is required, not 'list'

During handling of the above exception, another exception occurred:

ArrowInvalid                              Traceback (most recent call last)
File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/column.py:2396, in as_column(arbitrary, nan_as_null, dtype, length)
   2392         pa_type = np_to_pa_dtype(
   2393             _maybe_convert_to_default_type("float")
   2394         )
-> 2396 pyarrow_array = pa.array(
   2397     arbitrary,
   2398     type=pa_type,
   2399     from_pandas=True if nan_as_null is None else nan_as_null,
   2400 )
   2402 if (
   2403     isinstance(arbitrary, pd.Index)
   2404     and arbitrary.dtype == cudf.dtype("object")
   (...)
   2407     )
   2408 ):

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/array.pxi:327, in pyarrow.lib.array()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/array.pxi:39, in pyarrow.lib._sequence_to_array()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/error.pxi:100, in pyarrow.lib.check_status()

ArrowInvalid: Could not convert Interval(2.0, 3.0, closed='right') with type pandas._libs.interval.Interval: did not recognize Python value type when inferring an Arrow data type

During handling of the above exception, another exception occurred:

ArrowInvalid                              Traceback (most recent call last)
Cell In[8], line 1
----> 1 cudf.Index(x)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/nvtx/nvtx.py:101, in annotate.__call__.<locals>.inner(*args, **kwargs)
     98 @wraps(func)
     99 def inner(*args, **kwargs):
    100     libnvtx_push_range(self.attributes, self.domain.handle)
--> 101     result = func(*args, **kwargs)
    102     libnvtx_pop_range(self.domain.handle)
    103     return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/index.py:3441, in Index.__new__(cls, data, dtype, copy, name, tupleize_cols, nan_as_null, **kwargs)
   3436 if tupleize_cols is not True:
   3437     raise NotImplementedError(
   3438         "tupleize_cols != True is not yet supported"
   3439     )
-> 3441 return as_index(
   3442     data,
   3443     copy=copy,
   3444     dtype=dtype,
   3445     name=name,
   3446     nan_as_null=nan_as_null,
   3447     **kwargs,
   3448 )

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/nvtx/nvtx.py:101, in annotate.__call__.<locals>.inner(*args, **kwargs)
     98 @wraps(func)
     99 def inner(*args, **kwargs):
    100     libnvtx_push_range(self.attributes, self.domain.handle)
--> 101     result = func(*args, **kwargs)
    102     libnvtx_pop_range(self.domain.handle)
    103     return result

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/index.py:3340, in as_index(arbitrary, nan_as_null, **kwargs)
   3337 elif isinstance(arbitrary, cudf.DataFrame):
   3338     return cudf.MultiIndex.from_frame(arbitrary)
   3339 return as_index(
-> 3340     column.as_column(
   3341         arbitrary, dtype=kwargs.get("dtype", None), nan_as_null=nan_as_null
   3342     ),
   3343     **kwargs,
   3344 )

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/column.py:2443, in as_column(arbitrary, nan_as_null, dtype, length)
   2437     return cudf.core.column.ListColumn.from_sequences(
   2438         arbitrary
   2439     )
   2440 elif isinstance(arbitrary, abc.Iterable) or isinstance(
   2441     arbitrary, abc.Sequence
   2442 ):
-> 2443     data = as_column(
   2444         _construct_array(arbitrary, dtype),
   2445         dtype=dtype,
   2446         nan_as_null=nan_as_null,
   2447     )
   2448 else:
   2449     raise e

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/cudf/core/column/column.py:2171, in as_column(arbitrary, nan_as_null, dtype, length)
   2169         data = data.astype(dtype)
   2170 elif arb_dtype.kind in ("O", "U"):
-> 2171     data = as_column(pa.array(arbitrary), dtype=arbitrary.dtype)
   2172     # There is no cast operation available for pa.Array from int to
   2173     # str, Hence instead of handling in pa.Array block, we
   2174     # will have to type-cast here.
   2175     if dtype is not None:

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/array.pxi:323, in pyarrow.lib.array()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/array.pxi:83, in pyarrow.lib._ndarray_to_array()

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pyarrow/error.pxi:100, in pyarrow.lib.check_status()

ArrowInvalid: Could not convert Interval(2.0, 3.0, closed='right') with type pandas._libs.interval.Interval: tried to convert to double

Expected behavior

In [6]: cudf.Index(x)
Out[6]: IntervalIndex([nan, (2.0, 3.0], (3.0, 4.0]], dtype='interval[float64, right]')

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]
@galipremsagar galipremsagar added bug Something isn't working Python Affects Python cuDF API. labels Aug 25, 2023
@galipremsagar galipremsagar self-assigned this Aug 25, 2023
rapids-bot bot pushed a commit that referenced this issue Aug 25, 2023
…dex` (#13956)

closes #13952 

This PR fixes `IntervalColumn` construction from a list of `Interval` scalars. pyarrow is unable to read this kind of input correctly, so we instead use the existing pandas type inference to create a pandas Series and then construct an `IntervalColumn` out of it.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #13956