[FEA] Do not convert decimal32/64 cols to decimal128 in to_arrow API and PQ writer when arrow schema is in use #17080

Open
mhaseeb123 opened this issue Oct 14, 2024 · 1 comment · May be fixed by #17869
Labels
1 - On Deck To be worked on next feature request New feature or request

Comments

@mhaseeb123
Member

mhaseeb123 commented Oct 14, 2024

Is your feature request related to a problem? Please describe.
We currently convert decimal32 and decimal64 columns to decimal128 (see #16236) whenever converting to an Arrow table via to_arrow or writing via to_parquet with store_schema=True. Once apache/arrow#43956 is complete, Arrow will support the decimal32 and decimal64 types as well, making the conversion on the libcudf side unnecessary, so it should be removed.
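
To illustrate what this widening costs, here is a minimal sketch in plain Python (standard library only, not libcudf code). It assumes each decimal value is stored as a little-endian two's-complement unscaled integer, 4 bytes wide for decimal32 and 16 bytes wide for decimal128, with the scale kept separately. The function name `widen_decimal32_to_decimal128` is hypothetical and is not a libcudf or Arrow API.

```python
import struct


def widen_decimal32_to_decimal128(buf: bytes) -> bytes:
    """Sign-extend each 4-byte unscaled value in `buf` to 16 bytes.

    Hypothetical helper for illustration; it copies every element.
    """
    count = len(buf) // 4
    values = struct.unpack(f"<{count}i", buf)
    return b"".join(v.to_bytes(16, "little", signed=True) for v in values)


# Unscaled integers for 1.23, -4.56 and 0.07 with two fractional digits.
unscaled = [123, -456, 7]
d32_buffer = struct.pack("<3i", *unscaled)
d128_buffer = widen_decimal32_to_decimal128(d32_buffer)

# Read the second value back from the widened buffer.
second = int.from_bytes(d128_buffer[16:32], "little", signed=True)
```

The values are unchanged, but the widened buffer is four times larger and had to be produced by a full copy, which is the overhead this request aims to remove.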

Describe the solution you'd like
Remove the conversion of decimal32 and decimal64 columns to decimal128, and instead write Parquet or convert via to_arrow directly.

Describe alternatives you've considered
Keep converting to decimal128, although this will soon be unnecessary.

Additional context
Blocked on the completion of apache/arrow#43956

@mhaseeb123 mhaseeb123 added feature request New feature or request 0 - Blocked Cannot progress due to external reasons labels Oct 14, 2024
@zeroshade
Contributor

With the release of Arrow v18, I'll add implementations of Decimal32/Decimal64 to nanoarrow, which will then allow avoiding the conversions.

@mhaseeb123 mhaseeb123 added 1 - On Deck To be worked on next and removed 0 - Blocked Cannot progress due to external reasons labels Jan 22, 2025
rapids-bot bot pushed a commit that referenced this issue Jan 29, 2025
…PIs (#17422)

Now that the Arrow format includes `Decimal32` and `Decimal64` data types, CUDF no longer needs to convert them to decimal128 when importing/exporting values via the `to_arrow` and `from_arrow` APIs. Instead we can just treat them like any other fixed-width data type and use the buffers directly.
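
As a rough illustration of "using the buffers directly", the sketch below uses only the Python standard library, not the cuDF or Arrow APIs. It assumes a decimal32 column is a contiguous buffer of 32-bit unscaled integers plus a scale (given here as a count of fractional digits), and that the platform's C `int` is 32 bits wide. The helper `decode_decimal32` is hypothetical.

```python
from array import array
from decimal import Decimal


def decode_decimal32(view, scale):
    """Interpret unscaled integers as decimals with `scale` fractional digits."""
    return [Decimal(v).scaleb(-scale) for v in view]


# Backing storage for three unscaled values (1.23, -4.56, 0.07 at scale 2).
storage = array("i", [123, -456, 7])

# A memoryview exposes the same bytes without copying or widening them.
view = memoryview(storage)
decoded = decode_decimal32(view, scale=2)

# Writes to the storage are visible through the view, showing no copy was made.
storage[0] = 999
shares_memory = view[0] == 999
```

Because the view shares memory with the original storage, no per-element conversion or extra allocation is needed, in contrast to widening every value to 128 bits.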

This doesn't fully address #17080, as it doesn't make any changes to the Parquet side of things.

This also incorporates the changes from #17405 which are needed for debug tests. That should get merged first, and then I can rebase this.

Authors:
  - Matt Topol (https://github.com/zeroshade)
  - David Wendt (https://github.com/davidwendt)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Paul Mattione (https://github.com/pmattione-nvidia)
  - Bradley Dice (https://github.com/bdice)
  - Lawrence Mitchell (https://github.com/wence-)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Robert (Bobby) Evans (https://github.com/revans2)
  - David Wendt (https://github.com/davidwendt)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #17422
@mhaseeb123 mhaseeb123 linked a pull request Jan 30, 2025 that will close this issue
3 tasks