-
Notifications
You must be signed in to change notification settings - Fork 933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
decimal128
Support for to/from_arrow
#9986
decimal128
Support for to/from_arrow
#9986
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #9986 +/- ##
================================================
- Coverage 10.49% 10.39% -0.10%
================================================
Files 119 119
Lines 20305 20064 -241
================================================
- Hits 2130 2086 -44
+ Misses 18175 17978 -197
Continue to review full report at Codecov.
|
@galipremsagar Are you able to work on the Python changes now? Looks like cuDF Python doesn't currently have support for Decimal128DType, and that will most likely need to be added. |
Yup, this PR unblocks #9533.
Update: Python changes ready: #9533 |
|
These are unrelated failures: #7314 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Just some copyrights and a question.
9000a26
to
f74414d
Compare
[[nodiscard]] auto make_decimal128_arrow_array(std::vector<T> const& data, | ||
std::optional<std::vector<int>> const& validity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly a nit, but I'd prefer to see these be iterators, or at least spans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the .data()
I am not sure I can use iterators here. And cudf::host_span
doesn't work with std::vector
.
Resolves: #10031 Depends on #9483, #9986 Note: The CI for this PR is not going to pass until #9986 is admin-merged(Admin merge needed since #9986 requires this PR changes too). - [x] Introduced `Decimal128Dtype` and `Decimal128Column`. - [x] Enabled python side support for the above both. - [x] Enables complete support for `Decimal32Column` which is currently lacking. - [x] Enabled orc writer to use decimal128. - [x] Enabled parquet to read a decimal128 type. - [x] Enabled Scalar support for `Decimal128Dtype`. - [x] Covered all decimal types in `string` <-> `decimal` conversions. - [x] **Made `Decimal128Dtype` the default type while reading in a Decimal Series or Scalar. User can specify to choose a specific decimal type by passing a `dtype`.** (Breaking) - [x] **Fixed issues in the binop precision & scale calculation logic to correctly choose a decimal type.** (Breaking) - [x] Fixed type metadata handling issues seen across APIs while making changes. - [x] Added parametrizations for all missing `decimal32` tests. - [x] Added parametrizations for `decimal128` along with existing decimal type-specific tests. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) - Robert (Bobby) Evans (https://github.com/revans2) - Conor Hoekstra (https://github.com/codereport) Approvers: - Devavret Makkar (https://github.com/devavret) - Vyas Ramasubramani (https://github.com/vyasr) URL: #9533
After #9986 reading Arrow in libcudf now returns DECIMAL128 instead of DECIMAL64. This updates the Java tests to expect DECIMAL128 instead of DECIMAL64 by upcasting the decimal columns in the original table being round-tripped through Arrow before comparing the result. Authors: - Jason Lowe (https://github.com/jlowe) Approvers: - Rong Ou (https://github.com/rongou) - MithunR (https://github.com/mythrocks) - Nghia Truong (https://github.com/ttnghia) URL: #10073
Resolves C++ side of #9980.
The reason this PR is breaking is because Arrow only has a notion of
decimal128
(seearrow::Type::DECIMAL
). We can still support bothdecimal64
anddecimal128
forto_arrow
but forfrom_arrow
it only makes sense to support one of them, anddecimal128
(now that we have it) is the logical choice. Therfore, the switching of the return type of a column comingfrom_arrow
fromdecimal64
todecimal128
is a breaking change.Requires:
decimal128
in cudf python #9533