[BUG] Unable to create a struct column from an arrow dictionary array #9179
Cudf supports constructing a struct column from an arrow struct array:
In [14]: pa_struct = pa.StructArray.from_arrays([pa.array([1, 2, 3]), pa.array(['a', 'b', 'c'])], names=['num', 'str'])
In [16]: cudf.Series.from_arrow(pa_struct)
Out[16]:
0 {'num': 1, 'str': 'a'}
1 {'num': 2, 'str': 'b'}
2 {'num': 3, 'str': 'c'}
dtype: struct
I believe the error you see here has to do with having
In [5]: r = cudf.Series.from_arrow(pa.StructArray.from_arrays([s.to_arrow(), f], names=['a', 'd']))
In [6]: type(r)
Out[6]: cudf.core.series.Series
In [8]: r._column.children
Out[8]:
(<cudf.core.column.categorical.CategoricalColumn object at 0x7f6459ceda70>
-- dictionary:
[
1,
2,
3
]
-- indices:
[
0,
1,
2
]
dtype: category,
<cudf.core.column.numerical.NumericalColumn object at 0x7f645a5a53b0>
[
1,
2,
3
]
dtype: int64)
The result is well formed, but the column cannot be converted back to arrow:
In [11]: r.to_arrow()
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-11-77c2df0309ec> in <module>
----> 1 r.to_arrow()
~/cudf/python/cudf/cudf/core/single_column_frame.py in to_arrow(self)
213 ]
214 """
--> 215 return self._column.to_arrow()
216
217 @property
~/cudf/python/cudf/cudf/core/column/struct.py in to_arrow(self)
68 {
69 field: child.type
---> 70 for field, child in zip(self.dtype.fields, children)
71 }
72 )
~/cudf/python/cudf/cudf/core/dtypes.py in fields(self)
289 return {
290 field.name: cudf.utils.dtypes.cudf_dtype_from_pa_type(field.type)
--> 291 for field in self._typ
292 }
293
~/cudf/python/cudf/cudf/core/dtypes.py in <dictcomp>(.0)
289 return {
290 field.name: cudf.utils.dtypes.cudf_dtype_from_pa_type(field.type)
--> 291 for field in self._typ
292 }
293
~/cudf/python/cudf/cudf/utils/dtypes.py in cudf_dtype_from_pa_type(typ)
215 return cudf.core.dtypes.Decimal64Dtype.from_arrow(typ)
216 else:
--> 217 return cudf.api.types.pandas_dtype(typ.to_pandas_dtype())
218
219
~/compose/etc/conda/cuda_11.2/envs/rapids/lib/python3.7/site-packages/pyarrow/types.pxi in pyarrow.lib.DataType.to_pandas_dtype()
NotImplementedError: dictionary<values=int64, indices=int8, ordered=0>
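The traceback above ends in the generic fallback branch of `cudf_dtype_from_pa_type`, which calls `to_pandas_dtype()` on a dictionary type nested inside a struct. The following is only a library-free sketch of the general idea of handling dictionary-typed fields explicitly before any fallback is reached. `PrimitiveType`, `DictionaryType`, `StructType`, and `dtype_from_type` are hypothetical stand-ins invented for illustration; they are not pyarrow or cudf APIs, and this is not a description of how cudf resolves the issue.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical stand-ins for Arrow-like types (NOT pyarrow classes).
@dataclass
class PrimitiveType:
    name: str  # e.g. "int64" or "string"


@dataclass
class DictionaryType:
    index_type: PrimitiveType
    value_type: PrimitiveType
    ordered: bool = False


@dataclass
class StructType:
    fields: Dict[str, "ArrowLikeType"]


ArrowLikeType = Union[PrimitiveType, DictionaryType, StructType]


def dtype_from_type(typ):
    """Map a stand-in type to a simple dtype description (hypothetical helper)."""
    if isinstance(typ, StructType):
        # Recurse into every child so nested dictionary fields are covered.
        return {name: dtype_from_type(child) for name, child in typ.fields.items()}
    if isinstance(typ, DictionaryType):
        # Handle dictionary types explicitly instead of reaching the fallback.
        return f"category[{typ.value_type.name}]"
    if isinstance(typ, PrimitiveType):
        return typ.name
    # Anything else is unsupported in this sketch.
    raise NotImplementedError(repr(typ))


# Mirrors the shape of the struct in the session above: a dictionary-typed
# field 'a' next to an int64 field 'd'.
struct_type = StructType(
    {
        "a": DictionaryType(PrimitiveType("int8"), PrimitiveType("int64")),
        "d": PrimitiveType("int64"),
    }
)
result = dtype_from_type(struct_type)
print(result)
```

The point of the sketch is the ordering of the checks: the dictionary case is tested before any catch-all branch, and the struct case recurses so that a dictionary field at any depth is seen.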
Still getting error with:
with dask.config.set({"dataframe.backend": "cudf"}):
    dcdf = dd.read_parquet('data_sample', index='IDNUM')
ERROR:
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
--> 435 typ = cudf_dtype_from_pa_type(schema.field(col_name).type)
436 if (
437 col_name in schema.names
438 and not isinstance(typ, (cudf.ListDtype, cudf.StructDtype))
439 and isinstance(col, cudf.core.column.StringColumn)
440 ):
441 df._data[col_name] = col.astype(typ)
File [~/share/.../conda_envs/.../lib/python3.10/site-packages/cudf/utils/dtypes.py:190](...), in cudf_dtype_from_pa_type(typ)
188 return cudf.dtype("str")
189 else:
--> 190 return cudf.api.types.pandas_dtype(typ.to_pandas_dtype())
File [~/share/.../conda_envs/.../lib/python3.10/site-packages/pyarrow/types.pxi:378](...), in pyarrow.lib.DataType.to_pandas_dtype()
File [~/share/.../conda_envs/.../lib/python3.10/site-packages/pyarrow/types.pxi:183](...), in pyarrow.lib._to_pandas_dtype()
NotImplementedError: dictionary<values=string, indices=int32, ordered=0>
Error is reproducible with the following:
Describe the bug
We currently support converting a category column in cudf to a DictionaryArray in pyarrow.
But we don't seem to support the construction of StructDtype & StructColumn from our constructors.
Steps/Code to reproduce bug