Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Fixed-point Decimal type support #3556

Closed
44 of 45 tasks
jrhemstad opened this issue Dec 8, 2019 · 5 comments
Closed
44 of 45 tasks

[FEA] Fixed-point Decimal type support #3556

jrhemstad opened this issue Dec 8, 2019 · 5 comments
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@jrhemstad
Copy link
Contributor

jrhemstad commented Dec 8, 2019

Is your feature request related to a problem? Please describe.
cuDF should support fixed-point decimal types.

Describe the solution you'd like
Add a new DECIMAL type for columns.

The scale information will be stored per-column in the data_type.

Also requires a new scaled_integer type for encapsulating an integral value and a scale factor that provides arithmetic operators to allow operating on the fixed point values. E.g.,

template <typename Rep>
class scaled_integer{
   // Note that the scale must be runtime info
   scaled_integer(Rep value, int32_t scale);
};

While Arrow only supports 128 bit fixed-point decimals, this design will allow us to have 32bit, 64bit, etc.

Describe alternatives you've considered
The scaled_integer type from CNL won't work because it requires the scale information to be compile time info.

Additional context
In order to support Decimal types with 128 bits of precision (like Arrow), we'll also need a 128 bit integer type.

List of PR Breakdowns:

Major PRS:

Minor PRs:

@jrhemstad jrhemstad added feature request New feature or request Needs Triage Need team to review and classify labels Dec 8, 2019
@jrhemstad jrhemstad added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Dec 8, 2019
@sameerz sameerz added the Spark Functionality that helps Spark RAPIDS label Dec 9, 2019
@codereport codereport self-assigned this Dec 16, 2019
@codereport
Copy link
Contributor

Summarising some of the design choices that were made offline with @jrhemstad and @harrism:

  • Radix type should be a compile time variable
  • Radix type should be programmed generically, but cuDF is mostly concerned with base 10
  • Constructors should support integer and float to start
  • Constructors shouldn't allow implicit conversions (mark explicit)
  • Arithmetic operators should only support scaled_integer types
  • Want to support exponent types with different values for arithmetic operations unless it proves to be too difficult
  • Use arrow for a reference: https://arrow.apache.org/docs/cpp/classarrow_1_1_decimal128.html

Updated class might look something like:

using scale_type = int32_t;

// Rep = representative type
template <typename Rep, typename Radix>
class scaled_integer {
    template <typename T,
              typename std::enable_if_t<std::is_same<T, float>::value>* = nullptr>
    scaled_integer(T value, scale_type scale) {
        // implementation for int
    }

    template <typename T,
              typename std::enable_if_t<std::is_same<T, int>::value>* = nullptr>
    scaled_integer(T value, scale_type scale) {
        // implementation for float
    }
};

and then:

using decimal32 = scaled_integer<int32_t, 10>;

@harrism
Copy link
Member

harrism commented Dec 19, 2019

Do you need the SFINAE shown here, or can you just specialize the template constructors (since the specifializations are for specific single types and they are not partial specializations)?

@jrhemstad
Copy link
Contributor Author

Do you need the SFINAE shown here, or can you just specialize the template constructors (since the specifializations are for specific single types and they are not partial specializations)?

Yeah, if you're only interested in int/float then just use overloads:

   scaled_integer(int value, scale_type scale) {
        // implementation for int
    }

    scaled_integer(float value, scale_type scale) {
        // implementation for float
    }

That would require 6 overloads (int8,16,32,64,float32,64).

I'd do SFINAE on traits like is_integer/is_floating_point

rapids-bot bot pushed a commit that referenced this issue Dec 17, 2020
This PR resolves a part of #3556.

Supporting `cudf::reduce`:
1. Part 1 (`MIN`, `MAX`, `SUM` & `PRODUCT` & `NUNIQUE`) #6814
2. Part 2 (the rest) ◀️ 

**Reduction Ops:**

**Done in Previous PR**
✔️  `SUM,             ///< sum reduction`
✔️ `PRODUCT,         ///< product reduction`
✔️ `MIN,             ///< min reduction`
✔️ `MAX,             ///< max reduction`
✔️ `NUNIQUE,         ///< count number of unique elements`

**Not supported by `cudf::reduce`:**
* [x] `COUNT_VALID,     ///< count number of valid elements`
* [x] `COUNT_ALL,       ///< count number of elements`
* [x] `COLLECT,         ///< collect values into a list`
* [x] `LEAD,            ///< window function, accesses row at specified offset following current row`
* [x] `LAG,             ///< window function, accesses row at specified offset preceding current row`
* [x] `PTX,             ///< PTX UDF based reduction`
* [x] `CUDA             ///< CUDA UDf based reduction`
* [x] `ARGMAX,          ///< Index of max element`
* [x] `ARGMIN,          ///< Index of min element`
* [x] `ROW_NUMBER,      ///< get row-number of element`

**Won't be supported:**
* [x] `ANY,             ///< any reduction`
* [x] `ALL,             ///< all reduction`

**To Do / Investigate:**
* [x] `SUM_OF_SQUARES,  ///< sum of squares reduction`
* [x] `MEDIAN,          ///< median reduction`
* [x] `QUANTILE,        ///< compute specified quantile(s)`
* [x] `NTH_ELEMENT,     ///< get the nth element`

**Deferred until requested**
* [x] `MEAN,            ///< arithmetic mean reduction`
* [x] `VARIANCE,        ///< groupwise variance`
* [x] `STD,             ///< groupwise standard deviation`

Authors:
  - Conor Hoekstra <[email protected]>

Approvers:
  - null
  - Karthikeyan
  - David

URL: #6980
rapids-bot bot pushed a commit that referenced this issue Dec 31, 2020
This PR resolves a part of #3556.

Aggregation ops supported:
* `MIN`
* `MAX`
* `COUNT` (both `null_policy` - `EX/INCLUDE`)
* `LEAD`
* `LAG`

**To Do List:**
* [x] Basic unit tests
* [x] Comprehensive unit tests
* [x] Implementation
* [x] Figure out which rolling ops to suppport

Authors:
  - Conor Hoekstra <[email protected]>

Approvers:
  - Vukasin Milovanovic
  - Ram (Ramakrishna Prabhu)

URL: #7037
rapids-bot bot pushed a commit that referenced this issue Jan 4, 2021
Adding support for `cudf::scan` for `decimal32` and `decimal64`. `cudf::scan` only supports 4 operations (sum, product, min and max) but the decimal types will only support `SUM`, `MAX` and `MIN`.

This PR resolves a part of #3556.

Authors:
  - Conor Hoekstra <[email protected]>

Approvers:
  - Jake Hemstad
  - Mark Harris

URL: #7063
rapids-bot bot pushed a commit that referenced this issue Jan 20, 2021
)

This PR resolves a part of #3556.

I decided to push the changes for sort `cudf::group_by` and hash `group_by` in different PRs.

Authors:
  - Conor Hoekstra (@codereport)

Approvers:
  - Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
  - Karthikeyan (@karthikeyann)

URL: #7169
rapids-bot bot pushed a commit that referenced this issue Feb 5, 2021
)

Follow up PR to #7169

This PR resolves a part of #3556.

Authors:
  - Conor Hoekstra (@codereport)

Approvers:
  - David (@davidwendt)
  - Jake Hemstad (@jrhemstad)
  - Devavret Makkar (@devavret)

URL: #7190
@kkraus14
Copy link
Collaborator

kkraus14 commented May 4, 2021

Looks like the only thing missing here is the mod / pmod operators and the cudf::grouped_rolling_window support. We already have a separate issue for the mod / pmod operators, so I think we should raise an issue for cudf::grouped_rolling_window and then we can close this out and follow up with any gaps in subsequent issues.

@kkraus14
Copy link
Collaborator

kkraus14 commented May 4, 2021

Closing as I've opened #8161 for tracking cudf::grouped_rolling_window libcudf support.

@kkraus14 kkraus14 closed this as completed May 4, 2021
ajschmidt8 pushed a commit that referenced this issue Nov 16, 2021
Fixes #9597
Fixes #9565

Previously, `fixed_point` along with `decimal32` and `decimal64` were added to support DecimalType (see #3556 for a list of major and minor PRs). With [support for `__int128_t` now in CUDA 11.5](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-general-new-features), we can support `decimal128.` This PR enables `decimal128`.

Authors:
  - Conor Hoekstra (https://github.com/codereport)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Mark Harris (https://github.com/harrism)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Jake Hemstad (https://github.com/jrhemstad)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

No branches or pull requests

5 participants