import cudf
import pandas as pd
s = cudf.Series([[["abc", "def"]]])
expect = pd.Series([[["abc", "def"]]])
got = s.to_pandas()
I would expect that the entries of got.values are the same as expect.values. However, type(got.values[0]) is numpy.ndarray whereas type(expect.values[0]) is list.
In and of itself this doesn't seem to be a big problem, since cudf.from_pandas(expect) produces the same result as cudf.from_pandas(got). But IO of the two pandas objects (specifically CSV IO) produces different results, since printing a numpy.ndarray produces something different from printing a list.
Consequently, I now can't round-trip through to_csv/read_csv.
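The difference can be sketched without a GPU. This is only an illustration: the `got` series below is a hand-built stand-in (nested object-dtype numpy arrays) for what to_pandas() is described as returning above, and the helper name `object_array` is hypothetical, not part of cudf, pandas, or arrow.

```python
import numpy as np
import pandas as pd


def object_array(items):
    """Hypothetical helper: 1-D object ndarray holding `items` unchanged."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


# What pandas builds from nested Python lists.
expect = pd.Series([[["abc", "def"]]])

# Assumed shape of the to_pandas() result: ndarrays at every nesting level.
inner = object_array(["abc", "def"])
got = pd.Series(object_array([object_array([inner])]))

print(type(expect.values[0]), type(got.values[0]))

# CSV output stringifies each entry, so the two series serialize differently.
expect_csv = expect.to_csv(index=False, header=False)
got_csv = got.to_csv(index=False, header=False)
print(expect_csv)
print(got_csv)
```

Because the two CSV texts differ, reading them back cannot yield the same values for both series.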
The way to_pandas() is implemented, we just go via arrow and then call to_pandas() on the arrow array. Plausibly the conversion of a ListArray to a series of numpy array objects is therefore a bug in arrow, but I am not sure. I suspect that, at least in the short term, we'll need to patch up ListColumn.to_pandas() to post-process the result from arrow.
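One possible shape for such post-processing is sketched below. It is not the cudf implementation: `ndarrays_to_lists` and `listify_series` are hypothetical names, and the input is a hand-built stand-in for the arrow result, so the sketch runs with only numpy and pandas.

```python
import numpy as np
import pandas as pd


def ndarrays_to_lists(value):
    """Recursively replace numpy arrays with plain Python lists."""
    if isinstance(value, np.ndarray):
        # tolist() unwraps numpy scalars; nested arrays are handled by recursion.
        return [ndarrays_to_lists(v) for v in value.tolist()]
    if isinstance(value, list):
        return [ndarrays_to_lists(v) for v in value]
    return value


def listify_series(series):
    """Return a copy of `series` whose entries are lists instead of ndarrays."""
    # Fill an object array element by element so each list stays one entry.
    out = np.empty(len(series), dtype=object)
    for i, value in enumerate(series):
        out[i] = ndarrays_to_lists(value)
    return pd.Series(out, index=series.index, name=series.name)


# Stand-in for the arrow result: ndarrays at every nesting level.
inner = np.empty(2, dtype=object)
inner[0], inner[1] = "abc", "def"
middle = np.empty(1, dtype=object)
middle[0] = inner
outer = np.empty(1, dtype=object)
outer[0] = middle

fixed = listify_series(pd.Series(outer))
print(type(fixed.values[0]), fixed.values[0])
```

A per-element Python loop like this is slow for large columns, so it is best read as a statement of the intended result rather than a performance-minded fix.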
Expected behavior
I think this should work.