-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Add support for decimal128
in cudf python
#9533
[REVIEW] Add support for decimal128
in cudf python
#9533
Conversation
Co-authored-by: Vyas Ramasubramani <[email protected]>
…cudf into python_decimal128
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional suggestion
Co-authored-by: Vukasin Milovanovic <[email protected]>
…cudf into python_decimal128
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple final small suggestions but I'm going ahead and approving right now. It looks great!
python/cudf/cudf/core/dtypes.py
Outdated
name = "decimal128" | ||
MAX_PRECISION = 38 | ||
itemsize = 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we have a convention for class variables being upper or lower case, but I would suggest being consistent at least within these classes in lieu of a broader rule. It is odd to see MAX_PRECISION capitalized while itemsize is lowercase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In column.pyx
we use itemsize
attribute to calculate a columns base_size
:
@property
def base_size(self):
return int(self.base_data.size / self.dtype.itemsize)
Should we capitalize class constant to ITEMSIZE
and introduce a property in parent DecimalType
class that just returns the class constant?
@property
def itemsize(self):
return self.ITEMSIZE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the above changes, if you think it was unnecessary let me know I'll revert it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless it needs to be evaluated per instance per access, there's no need for a property
IMO. A lowercase attribute (for consistency with np.dtype
) is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take that back, a property
does effectively make this a read-only attribute..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I missed that numpy.dtype
has an itemsize attribute and that we needed to match that. @shwina IMO in general we shouldn't use properties over class attributes just to enforce immutability. If something really belongs to the class, not to instances, I think it is more Pythonic to put it in the class and allow users to shoot themselves in the foot by modifying it (as long as we document appropriately). That said, in this case we should match whatever numpy is doing, so @galipremsagar's solution looks fine to me.
Resolves C++ side of #9980. The reason this PR is breaking is because Arrow only has a notion of `decimal128` (see `arrow::Type::DECIMAL`). We can still support both `decimal64` **and** `decimal128` for `to_arrow` but for `from_arrow` it only makes sense to support one of them, and `decimal128` (now that we have it) is the logical choice. Therfore, the switching of the return type of a column coming `from_arrow` from `decimal64` to `decimal128` is a breaking change. Requires: * #7314 * #9533 Authors: - Conor Hoekstra (https://github.com/codereport) Approvers: - Devavret Makkar (https://github.com/devavret) - Mike Wilson (https://github.com/hyperbolic2346)
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #9533 +/- ##
================================================
- Coverage 10.49% 10.41% -0.08%
================================================
Files 119 119
Lines 20305 20541 +236
================================================
+ Hits 2130 2139 +9
- Misses 18175 18402 +227
Continue to review full report at Codecov.
|
@gpucibot merge |
rerun tests |
Resolves: #10031
Depends on #9483, #9986
Note: The CI for this PR is not going to pass until #9986 is admin-merged(Admin merge needed since #9986 requires this PR changes too).
Decimal128Dtype
andDecimal128Column
.Decimal32Column
which is currently lacking.Decimal128Dtype
.string
<->decimal
conversions.Decimal128Dtype
the default type while reading in a Decimal Series or Scalar. User can specify to choose a specific decimal type by passing adtype
. (Breaking)decimal32
tests.decimal128
along with existing decimal type-specific tests.