[FEA] Fixed-point Decimal type support #3556

jrhemstad · 2019-12-08T21:01:46Z

Is your feature request related to a problem? Please describe.
cuDF should support fixed-point decimal types.

Describe the solution you'd like
Add a new DECIMAL type for columns.

The scale information will be stored per-column in the data_type.

This also requires a new scaled_integer type that encapsulates an integral value and a scale factor, and that provides arithmetic operators for operating on the fixed-point values. E.g.,

template <typename Rep>
class scaled_integer {
 public:
   // Note that the scale must be runtime info
   scaled_integer(Rep value, int32_t scale);
};

While Arrow only supports 128-bit fixed-point decimals, this design will allow us to have 32-bit, 64-bit, etc.
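As a rough illustration of the proposal above, here is a minimal sketch of a `scaled_integer` whose scale is a runtime value. It assumes the represented number is `value * 10^scale` (so a negative scale means fractional digits); that convention, the accessor names, and `to_double` are assumptions made for this example and are not taken from the issue. Overflow is ignored.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical sketch: represented number == value * 10^scale.
template <typename Rep>
class scaled_integer {
 public:
  // The scale is an ordinary runtime argument, as the proposal requires.
  scaled_integer(Rep value, std::int32_t scale) : value_(value), scale_(scale) {}

  Rep value() const { return value_; }
  std::int32_t scale() const { return scale_; }

  // Multiplying two fixed-point numbers multiplies the stored integers
  // and adds the scales: (a * 10^s) * (b * 10^t) == (a * b) * 10^(s + t).
  friend scaled_integer operator*(scaled_integer a, scaled_integer b) {
    return scaled_integer(static_cast<Rep>(a.value_ * b.value_),
                          a.scale_ + b.scale_);
  }

  // Lossy conversion, only meant for inspecting results.
  double to_double() const {
    return static_cast<double>(value_) * std::pow(10.0, scale_);
  }

 private:
  Rep value_;
  std::int32_t scale_;
};
```

With this sketch, `scaled_integer<std::int32_t>(123, -2)` stands for 1.23, and the same template instantiated with `std::int64_t` gives a 64-bit variant without any other change.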

Describe alternatives you've considered
The scaled_integer type from CNL won't work because it requires the scale information to be compile time info.

Additional context
In order to support Decimal types with 128 bits of precision (like Arrow), we'll also need a 128-bit integer type.

List of PR Breakdowns:

Major PRs:

Minor PRs:

codereport · 2019-12-19T01:38:15Z

Summarising some of the design choices that were made offline with @jrhemstad and @harrism:

* Radix type should be a compile-time parameter
* Radix type should be programmed generically, but cuDF is mostly concerned with base 10
* Constructors should support integer and float to start
* Constructors shouldn't allow implicit conversions (mark explicit)
* Arithmetic operators should only support scaled_integer types
* Arithmetic operations should support operands with different exponents (scales), unless that proves too difficult
* Use Arrow for reference: https://arrow.apache.org/docs/cpp/classarrow_1_1_decimal128.html

Updated class might look something like:

using scale_type = int32_t;

// Rep = representation type
template <typename Rep, int Radix>
class scaled_integer {
public:
    template <typename T,
              typename std::enable_if_t<std::is_same<T, float>::value>* = nullptr>
    scaled_integer(T value, scale_type scale) {
        // implementation for float
    }

    template <typename T,
              typename std::enable_if_t<std::is_same<T, int>::value>* = nullptr>
    scaled_integer(T value, scale_type scale) {
        // implementation for int
    }
};

and then:

using decimal32 = scaled_integer<int32_t, 10>;
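To make the design points above more concrete, here is a minimal sketch with the radix as a compile-time parameter, an explicit constructor, and an `operator+` that accepts operands with different scales. The `Radix` enum, the accessor names, the `rescaled` helper, and the convention that the represented number is `rep * Rad^scale` are assumptions made for this example; overflow is not handled.

```cpp
#include <cassert>
#include <cstdint>

using scale_type = std::int32_t;

// Hypothetical enum: the radix is fixed at compile time.
enum class Radix : std::int32_t { BASE_2 = 2, BASE_10 = 10 };

// Hypothetical sketch: represented number == rep * Rad^scale.
template <typename Rep, Radix Rad>
class scaled_integer {
 public:
  // explicit: a braced (rep, scale) pair does not convert implicitly.
  explicit scaled_integer(Rep rep, scale_type scale) : rep_(rep), scale_(scale) {}

  Rep rep() const { return rep_; }
  scale_type scale() const { return scale_; }

  // Re-express the same number at a finer (smaller) scale.
  scaled_integer rescaled(scale_type new_scale) const {
    assert(new_scale <= scale_);  // this sketch only rescales losslessly
    Rep r = rep_;
    for (scale_type s = scale_; s > new_scale; --s) r *= static_cast<Rep>(Rad);
    return scaled_integer(r, new_scale);
  }

  // Only operands of the same scaled_integer type are accepted.
  // Operands with different scales are aligned to the finer scale first.
  friend scaled_integer operator+(scaled_integer const& a, scaled_integer const& b) {
    scale_type const s = a.scale_ < b.scale_ ? a.scale_ : b.scale_;
    return scaled_integer(a.rescaled(s).rep_ + b.rescaled(s).rep_, s);
  }

 private:
  Rep rep_;
  scale_type scale_;
};

using decimal32 = scaled_integer<std::int32_t, Radix::BASE_10>;
using decimal64 = scaled_integer<std::int64_t, Radix::BASE_10>;
```

Under these assumptions, `decimal32(123, -2) + decimal32(5, -1)` adds 1.23 and 0.5 at scale -2. An expression such as `decimal32(123, -2) + 1` does not compile, because there is no implicit conversion from `int`.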

harrism · 2019-12-19T02:10:02Z

Do you need the SFINAE shown here, or can you just specialize the template constructors (since the specializations are for specific single types and they are not partial specializations)?

jrhemstad · 2019-12-19T14:34:09Z

> Do you need the SFINAE shown here, or can you just specialize the template constructors (since the specializations are for specific single types and they are not partial specializations)?

Yeah, if you're only interested in int/float then just use overloads:

    scaled_integer(int value, scale_type scale) {
        // implementation for int
    }

    scaled_integer(float value, scale_type scale) {
        // implementation for float
    }

That would require 6 overloads (int8, int16, int32, int64, float32, float64).

I'd do SFINAE on traits like std::is_integral/std::is_floating_point.
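A sketch of what trait-based SFINAE could look like, using the standard `std::is_integral` and `std::is_floating_point` traits so that two constructor templates cover all six types. The conversion logic, the accessor names, and the convention that the represented number is `rep * 10^scale` are assumptions made for this example, not the actual implementation; rounding and overflow are not handled.

```cpp
#include <cmath>
#include <cstdint>
#include <type_traits>

using scale_type = std::int32_t;

// Hypothetical sketch: represented number == rep * 10^scale.
template <typename Rep>
class scaled_integer {
 public:
  // Selected for every integral T (int8_t through int64_t).
  template <typename T,
            std::enable_if_t<std::is_integral<T>::value>* = nullptr>
  explicit scaled_integer(T value, scale_type scale)
      : rep_(shift_integral(value, scale)), scale_(scale) {}

  // Selected for float and double. Truncates toward zero.
  template <typename T,
            std::enable_if_t<std::is_floating_point<T>::value>* = nullptr>
  explicit scaled_integer(T value, scale_type scale)
      : rep_(static_cast<Rep>(value * std::pow(10.0, -scale))), scale_(scale) {}

  Rep rep() const { return rep_; }
  scale_type scale() const { return scale_; }

 private:
  // Divide (scale > 0) or multiply (scale < 0) by powers of ten.
  template <typename T>
  static Rep shift_integral(T value, scale_type scale) {
    Rep r = static_cast<Rep>(value);
    for (; scale > 0; --scale) r /= 10;
    for (; scale < 0; ++scale) r *= 10;
    return r;
  }

  Rep rep_;
  scale_type scale_;
};
```

Here `scaled_integer<std::int32_t>(7, -2)` picks the integral constructor and `scaled_integer<std::int32_t>(1.25, -2)` picks the floating-point one, with no per-type overloads.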

This PR resolves a part of #3556.

Supporting `cudf::reduce`:
1. Part 1 (`MIN`, `MAX`, `SUM` & `PRODUCT` & `NUNIQUE`) #6814
2. Part 2 (the rest) ◀️

**Reduction Ops:**

**Done in Previous PR**
* ✔️ `SUM, ///< sum reduction`
* ✔️ `PRODUCT, ///< product reduction`
* ✔️ `MIN, ///< min reduction`
* ✔️ `MAX, ///< max reduction`
* ✔️ `NUNIQUE, ///< count number of unique elements`

**Not supported by `cudf::reduce`:**
* [x] `COUNT_VALID, ///< count number of valid elements`
* [x] `COUNT_ALL, ///< count number of elements`
* [x] `COLLECT, ///< collect values into a list`
* [x] `LEAD, ///< window function, accesses row at specified offset following current row`
* [x] `LAG, ///< window function, accesses row at specified offset preceding current row`
* [x] `PTX, ///< PTX UDF based reduction`
* [x] `CUDA ///< CUDA UDF based reduction`
* [x] `ARGMAX, ///< Index of max element`
* [x] `ARGMIN, ///< Index of min element`
* [x] `ROW_NUMBER, ///< get row-number of element`

**Won't be supported:**
* [x] `ANY, ///< any reduction`
* [x] `ALL, ///< all reduction`

**To Do / Investigate:**
* [x] `SUM_OF_SQUARES, ///< sum of squares reduction`
* [x] `MEDIAN, ///< median reduction`
* [x] `QUANTILE, ///< compute specified quantile(s)`
* [x] `NTH_ELEMENT, ///< get the nth element`

**Deferred until requested**
* [x] `MEAN, ///< arithmetic mean reduction`
* [x] `VARIANCE, ///< groupwise variance`
* [x] `STD, ///< groupwise standard deviation`

Authors:
- Conor Hoekstra

Approvers:
- null
- Karthikeyan
- David

URL: #6980

This PR resolves a part of #3556.

Aggregation ops supported:
* `MIN`
* `MAX`
* `COUNT` (both `null_policy` - `EX/INCLUDE`)
* `LEAD`
* `LAG`

**To Do List:**
* [x] Basic unit tests
* [x] Comprehensive unit tests
* [x] Implementation
* [x] Figure out which rolling ops to support

Authors:
- Conor Hoekstra

Approvers:
- Vukasin Milovanovic
- Ram (Ramakrishna Prabhu)

URL: #7037

Adding support for `cudf::scan` for `decimal32` and `decimal64`. `cudf::scan` only supports 4 operations (sum, product, min and max) but the decimal types will only support `SUM`, `MAX` and `MIN`. This PR resolves a part of #3556.

Authors:
- Conor Hoekstra

Approvers:
- Jake Hemstad
- Mark Harris

URL: #7063

This PR resolves a part of #3556. I decided to push the changes for sort `cudf::group_by` and hash `group_by` in different PRs.

Authors:
- Conor Hoekstra (@codereport)

Approvers:
- Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
- Karthikeyan (@karthikeyann)

URL: #7169

Follow-up PR to #7169. This PR resolves a part of #3556.

Authors:
- Conor Hoekstra (@codereport)

Approvers:
- David (@davidwendt)
- Jake Hemstad (@jrhemstad)
- Devavret Makkar (@devavret)

URL: #7190

kkraus14 · 2021-05-04T23:50:38Z

Looks like the only thing missing here is the mod / pmod operators and the cudf::grouped_rolling_window support. We already have a separate issue for the mod / pmod operators, so I think we should raise an issue for cudf::grouped_rolling_window and then we can close this out and follow up with any gaps in subsequent issues.

kkraus14 · 2021-05-04T23:53:29Z

Closing as I've opened #8161 for tracking cudf::grouped_rolling_window libcudf support.

Fixes #9597
Fixes #9565

Previously, `fixed_point` along with `decimal32` and `decimal64` were added to support DecimalType (see #3556 for a list of major and minor PRs). With [support for `__int128_t` now in CUDA 11.5](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-general-new-features), we can support `decimal128`. This PR enables `decimal128`.

Authors:
- Conor Hoekstra (https://github.com/codereport)

Approvers:
- Robert (Bobby) Evans (https://github.com/revans2)
- Mark Harris (https://github.com/harrism)
- AJ Schmidt (https://github.com/ajschmidt8)
- Jake Hemstad (https://github.com/jrhemstad)
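To show why the representation width matters for `decimal32`, `decimal64` and `decimal128`, the following host-side sketch counts how many decimal digits each integer type can always hold. The helper name `max_decimal_digits` and the `int128_max` constant are made up for this example, and `__int128_t` is used here as the GCC/Clang compiler extension, not through any cuDF API.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the largest d such that every d-digit decimal
// number is <= max_value. One digit is counted per division by ten
// until a single digit remains.
template <typename Rep>
constexpr int max_decimal_digits(Rep max_value) {
  int digits = 0;
  while (max_value >= 10) {
    max_value /= 10;
    ++digits;
  }
  return digits;
}

// Largest signed 128-bit value, built without std::numeric_limits because
// its specialization for __int128_t is not guaranteed on every toolchain.
constexpr __int128_t int128_max =
    static_cast<__int128_t>(~static_cast<__uint128_t>(0) >> 1);

constexpr int digits32  = max_decimal_digits(std::numeric_limits<std::int32_t>::max());
constexpr int digits64  = max_decimal_digits(std::numeric_limits<std::int64_t>::max());
constexpr int digits128 = max_decimal_digits(int128_max);

static_assert(digits32 == 9, "any 9-digit decimal fits in 32 bits");
static_assert(digits64 == 18, "any 18-digit decimal fits in 64 bits");
static_assert(digits128 == 38, "any 38-digit decimal fits in 128 bits");
```

So a 128-bit representation roughly doubles the usable precision of a 64-bit one, from 18 to 38 decimal digits.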

jrhemstad added feature request New feature or request Needs Triage Need team to review and classify labels Dec 8, 2019

jrhemstad mentioned this issue Dec 8, 2019

[FEA] Decimal datatype support for cuDF and from_arrow support #2888

Closed

jrhemstad added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Dec 8, 2019

harrism mentioned this issue Dec 9, 2019

[FEA] signed 128-bit Integer Type #3558

Closed

sameerz added the Spark Functionality that helps Spark RAPIDS label Dec 9, 2019

harrism assigned jrhemstad Dec 11, 2019

randerzander mentioned this issue Dec 12, 2019

[FEA] Allow user to specify whether Decimal types in ORC files are read as float64 or int64 types #3306

Closed

codereport self-assigned this Dec 16, 2019

codereport mentioned this issue Jan 14, 2020

[REVIEW] Add fixed_point class to support DecimalType #3782

Merged

17 tasks

revans2 mentioned this issue Jan 15, 2020

[FEA] Support HALF_UP and HALF_EVEN rounding to a set number of decimal places #3790

Closed

codereport mentioned this issue Feb 24, 2020

[FEA] Hook up cudf::data_type with numeric::fixed_point #4238

Closed

revans2 mentioned this issue May 29, 2020

[FEA] Support decimal type NVIDIA/spark-rapids#42

Closed

27 tasks

This was referenced Jul 31, 2020

[REVIEW] Enable more fixed_point unit tests by introducing "scale-less" constructor [skip ci] #5817

Merged

[REVIEW] fixed_point Column Optimization (store scale in data_type) [skip ci] #5861

Merged

This was referenced Oct 10, 2020

[FEA] type casting between fixed-point decimal and other numeric types #6486

Closed

[FEA] Java bindings for fixed-point Decimal type #6515

Closed

codereport mentioned this issue Oct 14, 2020

[REVIEW] Enable fixed_point binary operations #6528

Merged

4 tasks

harrism unassigned jrhemstad Dec 7, 2020

This was referenced Dec 10, 2020

Implement cudf::reduce for decimal32 and decimal64 (part 2) #6966

Closed

Implement cudf::reduce for decimal32 and decimal64 (part 2) #6980

Merged

codereport mentioned this issue Dec 18, 2020

Implement cudf::rolling for decimal32 and decimal64 #7037

Merged

4 tasks

codereport mentioned this issue Jan 4, 2021

cudf::scan support for decimal32 and decimal64 #7063

Merged

This was referenced Jan 11, 2021

[FEA] Provide fixed-point support for NULL_MIN and NULL_MAX #7115

Closed

[FEA] Fixed-point support for to_arrow and from_arrow #7116

Closed

[FEA] Fixed-point precision support for SUM in a window operation #7117

Closed

nartal1 mentioned this issue Jan 12, 2021

[FEA] Provide fixed-point support for MOD, PMOD and TRUE_DIV #7132

Closed

codereport mentioned this issue Jan 19, 2021

Implement cudf::group_by (sort) for decimal32 and decimal64 #7169

Merged

codereport mentioned this issue Jan 22, 2021

Implement cudf::group_by (hash) for decimal32 and decimal64 #7190

Merged

codereport mentioned this issue Mar 25, 2021

[REVIEW] Support groupby operations for decimal dtypes #7731

Merged

kkraus14 mentioned this issue May 4, 2021

[FEA] Fixed-point decimal support for cudf::grouped_rolling_window #8161

Closed

kkraus14 closed this as completed May 4, 2021

codereport mentioned this issue Oct 20, 2021

Add support for decimal128 #9483

Merged
