-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Format] Add Decimal32 and Decimal64 to Arrow #43956
Comments
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
Sep 4, 2024
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
Sep 5, 2024
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
Sep 5, 2024
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
Sep 10, 2024
zeroshade
added a commit
that referenced
this issue
Sep 10, 2024
### Rationale for this change Widening the Decimal128/256 type to allow for bitwidths of 32 and 64 allows for more interoperability with other libraries and utilities which already support these types. This provides even more opportunities for zero-copy interactions between things such as libcudf and various databases. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ### What changes are included in this PR? Updating the documentation in Schema.fbs to explicitly state that 32-bit and 64-bit is now allowed for bitwidths of Decimal types. This is the only area in the the spec that mentions the allowed decimal bitwidths. * GitHub Issue: #43956 --------- Co-authored-by: Antoine Pitrou <[email protected]>
khwilson
pushed a commit
to khwilson/arrow
that referenced
this issue
Sep 14, 2024
…43976) ### Rationale for this change Widening the Decimal128/256 type to allow for bitwidths of 32 and 64 allows for more interoperability with other libraries and utilities which already support these types. This provides even more opportunities for zero-copy interactions between things such as libcudf and various databases. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ### What changes are included in this PR? Updating the documentation in Schema.fbs to explicitly state that 32-bit and 64-bit is now allowed for bitwidths of Decimal types. This is the only area in the the spec that mentions the allowed decimal bitwidths. * GitHub Issue: apache#43956 --------- Co-authored-by: Antoine Pitrou <[email protected]>
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
Sep 30, 2024
…43976) Widening the Decimal128/256 type to allow for bitwidths of 32 and 64 allows for more interoperability with other libraries and utilities which already support these types. This provides even more opportunities for zero-copy interactions between things such as libcudf and various databases. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Updating the documentation in Schema.fbs to explicitly state that 32-bit and 64-bit is now allowed for bitwidths of Decimal types. This is the only area in the the spec that mentions the allowed decimal bitwidths. * GitHub Issue: apache#43956 --------- Co-authored-by: Antoine Pitrou <[email protected]>
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
Sep 30, 2024
zeroshade
added a commit
that referenced
this issue
Sep 30, 2024
…ns (#43957) <!-- Thanks for opening a pull request! If this is your first pull request you can find detailed information on how to contribute here: * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request) * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html) If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project. Then could you also rename the pull request title in the following format? GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY} or MINOR: [${COMPONENT}] ${SUMMARY} In the case of PARQUET issues on JIRA the title also supports: PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY} --> ### Rationale for this change Widening the Decimal128/256 type to allow for bitwidths of 32 and 64 allows for more interoperability with other libraries and utilities which already support these types. This provides even more opportunities for zero-copy interactions between things such as libcudf and various databases. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ### What changes are included in this PR? This PR contains the basic C++ implementations for Decimal32/Decimal64 types, arrays, builders and scalars. It also includes the minimum necessary to get everything compiling and tests passing without also extending the acero kernels and parquet handling (both of which will be handled in follow-up PRs). <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ### Are these changes tested? Yes, tests were extended where applicable to add decimal32/decimal64 cases. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ### Are there any user-facing changes? Currently if a user is using `decimal(precision, scale)` rather than `decimal128(precision, scale)` they will get a `Decimal128Type` if the precision is <= 38 (max precision for Decimal128) and `Decimal256Type` if the precision is higher. Following the same pattern, this change means that using `decimal(precision, scale)` instead of the specific `decimal32`/`decimal64`/`decimal128`/`decimal256` functions results in the following functionality: - for precisions [1 : 9] => `Decimal32Type` - for precisions [10 : 18] => `Decimal64Type` - for precisions [19 : 38] => `Decimal128Type` - for precisions [39 : 76] => `Decimal256Type` While many of our tests currently make the assumption that `decimal` with a low precision would be `Decimal128` and had to be updated, this may cause an initial surprise if users are making the same assumptions. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking. --> <!-- **This PR includes breaking changes to public APIs.** --> <!-- Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious. --> <!-- **This PR contains a "Critical Fix".** --> * GitHub Issue: #43956 --------- Co-authored-by: Felipe Oliveira Carvalho <[email protected]> Co-authored-by: Benjamin Kietzman <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]>
zeroshade
added a commit
to zeroshade/arrow
that referenced
this issue
Dec 12, 2024
zeroshade
added a commit
that referenced
this issue
Dec 13, 2024
<!-- Thanks for opening a pull request! If this is your first pull request you can find detailed information on how to contribute here: * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request) * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html) If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project. Then could you also rename the pull request title in the following format? GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY} or MINOR: [${COMPONENT}] ${SUMMARY} --> ### Rationale for this change Furthering the support for Decimal32/Decimal64 among Acero and casting functionality. This is also necessary for #44882 to add support for Decimal32/64 to PyArrow <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ### What changes are included in this PR? Adding kernels for casting to and from Decimal32/Decimal64 between numeric, floating point, string and other decimal types. <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ### Are these changes tested? Yes, unit tests are added accordingly. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> * GitHub Issue: #43956
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
Instead of adding entirely new types, we should widen the existing Decimal128/256 types to allow bit widths of 32 and 64.
This will require:
Component(s)
Format
The text was updated successfully, but these errors were encountered: