[BUG] cuDF cannot create list of struct dataframe using dict or from pandas #7561
Comments
RCA: during the round trip to libcudf, field name information is lost, and the column index is used to construct the field name instead. (Note that a non-nested struct column preserves its field names because it goes through a different code path.) See cudf/python/cudf/cudf/_lib/types.pyx, line 189 in 667b9bc.
To resolve this, cudf needs to pass the field names along from the pyarrow column to the reconstructed cudf column.
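The failure mode described in the RCA can be illustrated with a toy model in plain Python. This is not cuDF or libcudf code: `ListType`, `StructType`, and `lossy_roundtrip` are hypothetical names made up for this sketch. It only mimics a round trip that keeps the nesting and child order but rebuilds struct field names from the child index, and it does not model the separate code path that non-nested struct columns take.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ListType:
    # Hypothetical stand-in for a list dtype; `element` is the child dtype.
    element: object


@dataclass
class StructType:
    # Hypothetical stand-in for a struct dtype; maps field name -> child dtype.
    fields: Dict[str, object]


def lossy_roundtrip(dtype):
    """Mimic a round trip that keeps the structure but drops field names."""
    if isinstance(dtype, ListType):
        return ListType(lossy_roundtrip(dtype.element))
    if isinstance(dtype, StructType):
        # Field names are rebuilt from the child position, as in the RCA.
        return StructType(
            {
                str(index): lossy_roundtrip(child)
                for index, child in enumerate(dtype.fields.values())
            }
        )
    return dtype  # leaf dtypes (plain strings here) pass through unchanged


original = ListType(StructType({"a": "int64", "b": "string"}))
after = lossy_roundtrip(original)

print(list(original.element.fields))  # ['a', 'b']
print(list(after.element.fields))     # ['0', '1']
```

The child types survive the round trip, so only the names differ between `original` and `after`, which matches the symptom reported in this issue.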
RCA Update: synced with @shwina offline, inside cudf type system there is
Closes #7561

This PR makes sure that, when a cudf object is constructed, the nested types of the pyarrow array are copied to the cudf object. This should handle arbitrary nesting of `Lists` and `Structs`. For decimal types, the precision is copied from the array.

Authors:
- Michael Wang (https://github.com/isVoid)
- Keith Kraus (https://github.com/kkraus14)

Approvers:
- Keith Kraus (https://github.com/kkraus14)

URL: #8244
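The idea in the PR description, copying nested type information from the source array onto the reconstructed object, can be sketched as a recursive walk over two type trees. This is a minimal illustration under stated assumptions, not the PR's implementation: `ListType`, `StructType`, `Decimal64Type`, and `copy_nested_metadata` are hypothetical names, and the sketch assumes both trees have the same shape and child order.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Decimal64Type:
    # Hypothetical decimal dtype carrying precision and scale.
    precision: int
    scale: int


@dataclass
class ListType:
    # Hypothetical list dtype; `element` is the child dtype.
    element: object


@dataclass
class StructType:
    # Hypothetical struct dtype; maps field name -> child dtype.
    fields: Dict[str, object]


def copy_nested_metadata(source, target):
    """Return `target` with field names and decimal precision taken from `source`."""
    if isinstance(source, ListType) and isinstance(target, ListType):
        return ListType(copy_nested_metadata(source.element, target.element))
    if isinstance(source, StructType) and isinstance(target, StructType):
        # Pair children by position and reuse the source field names.
        return StructType(
            {
                name: copy_nested_metadata(src_child, tgt_child)
                for (name, src_child), tgt_child in zip(
                    source.fields.items(), target.fields.values()
                )
            }
        )
    if isinstance(source, Decimal64Type) and isinstance(target, Decimal64Type):
        # Precision comes from the source; the target keeps its own scale.
        return Decimal64Type(precision=source.precision, scale=target.scale)
    return target  # other leaves are left untouched


# Source type as described by the original array (names and precision intact).
source = ListType(
    StructType(
        {
            "a": "int64",
            "b": ListType(StructType({"price": Decimal64Type(9, 2)})),
        }
    )
)
# Reconstructed type with index-based names and a default precision.
target = ListType(
    StructType(
        {
            "0": "int64",
            "1": ListType(StructType({"0": Decimal64Type(18, 2)})),
        }
    )
)

restored = copy_nested_metadata(source, target)
print(restored == source)  # True
```

Because the walk recurses through both lists and structs, the same function handles a struct inside a list inside a struct without special cases.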
cuDF loses the field names of the struct inside a list when creating from a dict:
or from pandas:
The expected dataframe has field names: