[BUG] Unable to retrieve Decimal types correctly when read via pandas.read_parquet
#10172
This is a metadata problem. The pandas JSON metadata for the column says […]. Need to figure out why.
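One way to find where the metadata diverges is to diff the per-column pandas metadata written by pyarrow against the metadata written by cudf. Below is a minimal, stdlib-only sketch. `diff_pandas_metadata` is a hypothetical helper, and both metadata strings are made-up stand-ins that mimic the `object` vs `string` mismatch described in the fix. They were not captured from real files. With pyarrow installed, the raw JSON for a real file should be available as `pyarrow.parquet.read_schema(path).metadata[b"pandas"]`, and `json.loads` accepts those bytes directly.

```python
import json


def diff_pandas_metadata(meta_a, meta_b):
    """Return {column_name: {field: (value_a, value_b)}} for every field that differs.

    meta_a / meta_b: pandas metadata as a JSON str or bytes (the b"pandas" schema entry).
    Hypothetical helper for illustration; not part of pandas, pyarrow, or cudf.
    """
    cols_a = {c["name"]: c for c in json.loads(meta_a)["columns"]}
    cols_b = {c["name"]: c for c in json.loads(meta_b)["columns"]}
    diffs = {}
    # Only compare columns present in both files.
    for name in cols_a.keys() & cols_b.keys():
        a, b = cols_a[name], cols_b[name]
        changed = {
            field: (a.get(field), b.get(field))
            for field in sorted(a.keys() | b.keys())
            if a.get(field) != b.get(field)
        }
        if changed:
            diffs[name] = changed
    return diffs


# Illustrative metadata only (assumed values, not read from real parquet files).
pyarrow_meta = json.dumps({"columns": [
    {"name": "dec", "pandas_type": "decimal", "numpy_type": "object",
     "metadata": {"precision": 5, "scale": 2}}]})
cudf_meta = json.dumps({"columns": [
    {"name": "dec", "pandas_type": "decimal", "numpy_type": "string",
     "metadata": {"precision": 5, "scale": 2}}]})

print(diff_pandas_metadata(pyarrow_meta, cudf_meta))
# {'dec': {'numpy_type': ('object', 'string')}}
```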
I think we need to add https://github.com/rapidsai/cudf/blob/branch-22.04/python/cudf/cudf/_lib/utils.pyx#L197-L198
cudf/python/cudf/cudf/_lib/utils.pyx Lines 189 to 196 in 05dd541
cudf/python/cudf/cudf/utils/dtypes.py Lines 40 to 51 in 8d7330f
A better fix would be to correct that map, but I didn't do that for "list" either, so we can apply the quick fix here too for now.
Fixes: #10172

`pa.pandas_compat.construct_metadata` constructs the correct metadata, but it is then overridden by the special `list` and `struct` handling logic, which records the type as `string` instead of retaining it as `object`. This PR fixes the issue and modifies existing tests to validate the fix.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- https://github.com/brandon-b-miller
- Devavret Makkar (https://github.com/devavret)

URL: #10224
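The idea behind the fix can be illustrated with a hypothetical, simplified model of the post-processing pass. After pyarrow builds the pandas metadata, columns that pandas stores as Python objects (lists, structs, decimals) keep `numpy_type` as `object` instead of being rewritten. `correct_numpy_types`, `OBJECT_BACKED_KINDS`, and the `column_kinds` mapping are invented for this sketch. It is not cudf's actual code in `utils.pyx`.

```python
# Hypothetical, simplified model of the metadata correction; NOT cudf's implementation.
# Assumption: pandas materializes list, struct, and decimal columns as object dtype.
OBJECT_BACKED_KINDS = {"list", "struct", "decimal"}


def correct_numpy_types(pandas_md, column_kinds):
    """Force numpy_type to "object" for object-backed columns, in place.

    pandas_md: parsed pandas metadata (a dict with a "columns" list).
    column_kinds: hypothetical mapping of column name -> kind string.
    Returns the same dict for convenience.
    """
    for col_meta in pandas_md["columns"]:
        if column_kinds.get(col_meta["name"]) in OBJECT_BACKED_KINDS:
            col_meta["numpy_type"] = "object"
    return pandas_md


md = {"columns": [
    {"name": "dec", "numpy_type": "string"},  # decimal column wrongly recorded as string
    {"name": "lst", "numpy_type": "string"},  # list column wrongly recorded as string
    {"name": "i", "numpy_type": "int64"},     # ordinary numeric column, left alone
]}
kinds = {"dec": "decimal", "lst": "list", "i": "int"}
correct_numpy_types(md, kinds)
print([c["numpy_type"] for c in md["columns"]])
```

In this model the decimal case is handled by the same branch that already covers `list` and `struct`, which matches the "quick fix" approach discussed in the thread rather than correcting the underlying type map.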
Describe the bug
It first appears to be a pandas bug, but after a detailed comparison with the pyarrow parquet writer, it appears to be a cudf parquet writer issue.
Steps/Code to reproduce bug
Expected behavior
Environment overview (please complete the following information)
Environment details
Additional context